Using SIMPL as a tool to bootstrap unusual processors

I have written a lot about SIMPL over the years, but it is my conviction that it has uses as a tool to help bootstrap various novel processors – and to ease the early stages of processor code development.  It’s not a fully fledged interpreted language like Forth or BASIC but just enough to make writing code easier on unfamiliar processors.

SIMPL may be written in about 100 lines of C code, or ported directly to the native assembly language of the  target processor for even more compactness and speed, which is the approach I have used with the MSP430.

With this in mind, I will attempt to use SIMPL to provide an interactive coding environment for the 1949 EDSAC,  – or at least a simulated or an FPGA softcore version of it – as part of the Wuthering Bytes EDSAC Challenge  – to be held i Hebden Bridge in early September.

This is a work in progress – so don’t expect too much just yet.

 
SIMPL
SIMPL is a character interpreter running within a loop and as such, models a virtual machine:
NEXT:              Fetch next Character from buffer
                         Increment Program Counter
                         Decode to form unique Jump Address
                         Jump to code body, execute code – end with Jump back to NEXT
This forms the heart of a very simple virtual machine which is capable of executing unique code functions based on the character found in the memory buffer.  This elementary execution model is equivalent to most modern processor systems, so with a little bit of work, SIMPL can form the basis of a simulator for most basic processor architectures.
The character may be fetched from a serial terminal input buffer and executed immediately in an interactive manner, or the Instruction Pointer can point to a general area of program memory, and execute the character instructions found there.
It is this combination of using SIMPL either interactively at the terminal or as a means of entering and editing text into memory and subsequently running it from RAM, that makes SIMPL a powerful tool – to have in the programmer’s tool kit.
As a kernel, capable of executing the basic inner kernel loop,  SIMPL may be coded in about 300 bytes on most small processors such as the MSP430 – this includes the overhead of setting up the mcu, oscillator, UART and GPIO etc – so that it supports serial terminal interaction.
As more functions are added, the code size grows by approximately 16 bytes per function – so it’s possible to have a small but usable SIMPL running in about 1K bytes.
The users application code will further extend this by perhaps a similar amount – but it still has an extremely small footprint, making it an ideal serial command shell for resource limited processors.
SIMPL when coded in C has been ported to a number of microcontrollers including ATmega, ATtiny, ARM M4, and MSP430.  The kernel has also been rewritten in MSP430 assembly language for speed and compactness, and work is ongoing to port it to the J1 Forth mcu.
The ability to encode the basic inner interpreter virtual machine in the native assembly language of a range of processors makes this a powerful technique of getting the basis of a useful “monitor” programme onto a variety of processors.  It also means that once the kernel has been coded in the native assembly language, then the user is working with a common set of instructions running on the virtual machine, at a level somewhat higher than the native assembly.  This creates a common language instruction set – a kind of “lingua franca” that makes interaction with a wide range of processors much easier.  It also opens up the door for different processors to communicate with each other through this common character encoded language.
SIMPL has some of it’s roots in Forth, and indeed Charles Moore’s Forth language has been a source of inspiration for the development of SIMPL.  However SIMPL is not a fully fledged language, more like a multi-function tool (Leatherman) which just makes the interpreted interaction with a range of small processors a lot easier, and less painful than the usual edit-compile-run cycle of code development found within a modern C compiler IDE.
 
Assembler and Instruction Set Decoder.
At it’s heart, SIMPL, when the kernel is coded in C, uses techniques such as switch-case statements and look-up table, or hash table to create the code address associated with the input character.
This technique allows not just a jump address to be associated with the character, but also other data, such as instruction look-up – for extending an 8 bit character into a longer instruction length – with individually coded bit-fields. This is the method used in generating J1 assembly language instructions from the common SIMPL command set.  This is a form of microcoding and is a powerful technique used when writing assembly language for different target hosts.  The (expanded instructions decoded in SIMPL can be written to RAM, creating the object code for the new target.
Processor Simulation
SIMPL also has uses in the automation of cpu simpulation.  Any small cpu (with small instruction set) may be simulated in a few dozen lines of C code, and a processor model including memory, registers and ALU created.  SIMPL commands can be used to create assembly instructions to test this model,  single step through executable code runs and examine the results on a hex dump display that shows the memory contents and principal register contents.  A full screen of serial data can be updated in less than 0.1 seconds  – making single stepping more productive.
It is this processor simulation that I chose to explore next, and my target cpu will be one of the earliest processors – the 1949 EDSAC – built at Cambridge university.
Fortunately the EDSAC had very few instructions and is well documented.  On a less fortunate note, the machine uses an instruction set input via a 5 bit paper tape, which is somewhat different in order to the more familiar ascii character set, and uses “figure shift” and “letter shift” to access the various characters of the teleprinter keyboard. This means that serially typed ascii characters will have to go through a further layer of decode to turn them into the Murray coded characters as used on the Creed teleprinters of that time, and a further translation back again into ascii for output to a serial terminal.  However these  character translations can be done with simple look-up or switch-case statements.

 

 

 

Posted in Uncategorized | Leave a comment

The 1949 EDSAC Instruction Set and Architecture

The 1949 EDSAC
This information has been taken from the EDSAC explanatory poster produced by Cambridge university.  I have added some of my own notes for clarity.
The 1949 Instruction set EDSAC’s instructions in 1949 was very simple and were executed at a rate of about 600 per second. They were as follows:
AnS: A += m[n] AnL: AB += w[n]
SnS: A −= m[n] SnL: AB −= w[n]
HnS: R += m[n] HnL: RS += w[n]
VnS: AB += m[n] * R VnL: ABC += w[n] * RS
NnS: AB −= m[n] * R NnL: ABC −= w[n] * RS
TnS: m[n] = A; ABC = 0 TnL: w[n] = AB; ABC = 0
UnS: m[n] = A UnL: w[n] = AB CnS: AB += m[n] & R
CnL: ABC += w[n] & RS
RnS, RnL: Shift ABC right arithmetically by the number of places corresponding to the position of the least significant one in the shift instruction. For example, R0L, R1S, R16S and R0S shift by 1, 2, 6 and 15 places, respectively.
LnS, LnL: Shift ABC left arithmetically by the number of places corresponding to the position of the least significant one in the shift instruction. For example, L0L, L1S, L16S, L64S and L0S shift by 1, 2, 6, 8 and 13 places, respectively.
EnS: if A >= 0 goto n
GnS: if A < 0 goto n
InS: Place the next paper tape character in the least significant 5 bits of m[n].
OnS: Output the character in the most significant 5 bits of m[n].
FnS: Verify the last character output.
XnS: No operation.
YnS: Add a one to bit position 35 of ABC, counting the sign bit as bit zero. This effectively rounds ABC up to 34 fractional bits.
ZnS: Stop the machine and ring a bell.
The EDSAC Instruction Set (1949)
What caught my attention is that the instructions use a single character mnemonic – taken directly from the upper case characters available at that time from the 5-bit “Murray” teleprinter code – which was in common use in the Creed teleprinters used at that time.
The mnemonics were chosen to give a reasonable amount of readability such as A for ADD and S for SUBtract.
This makes common sense to make it as easy as possible and is a simple first stage mapping of human readable mnemonic characters to a 5 bit binary teleprinter code, which was used directly as the 5 bit instruction.
There were only two register spaces accessible from the instruction – visible to the user:
The 71-bit Accumulator
The 35-bit Multiplier Register
Further more there were 1024 18-bit words of delay line storage
The contents of the Order Register (Instruction Register) and Sequence Control Tank (Program Counter) as well as the multiplicand register were viewable on the operator’s console.
For my own clarification – so as to build up a simple model of the EDSAC, I have attempted to group the instructions into:
Accumulator and memory only
A n Add the number in storage location n into the accumulator
S n Subtract the number in storage location n from the accumulator
T n Transfer the contents of the accumulator to storage location n and clear the accumulator
U n Transfer the contents of the accumulator to storage location n and do not clear the accumulator
Accumulator, Multiplier and Memory
H n Copy the number in storage location n into the multiplier register
V n Multiply the number in storage location n by the number in the multiplier register and add the product into the accumulator
N n Multiply the number in storage location n by the number in the multiplier register and subtract the product from the accumulator
C n Collate [logical and] the number in storage location n with the number in the multiplier register and add the result into the accumulator
Accumulator and Shifter Operations
R 2n-2 Shift the number in the accumulator n places to the right
L 2n-2 Shift the number in the accumulator n places to the left
Conditional Jumps
E n If the sign of the accumulator is positive, jump to location n; otherwise proceed serially
G n If the sign of the accumulator is negative, jump to location n; otherwise proceed serially
Input-Output
I n Read the next character from paper tape, and store it as the least significant 5 bits of location n
O n Print the character represented by the most significant 5 bits of storage location n
F n Read the last character output for verification
Misc Operations
X No operation
Y Round the number in the accumulator to 34 bits
Z Stop the machine and ring the warning bell
The numerical values in the accumulator and multiplier registers are normally thought of as signed binary fractions, but integer operations could also be done easily. For example, the order V1S can be interpreted as adding the product of the 17-bit signed integer in m[1] and to the 17-bit integer in RS and adding the result into bits 0 to 32 of the ABC. With a suitable shift, the integer result can be placed in the senior 17 bits of A ready for storing in memory
Teleprinter and perforated tape for Input Output
EDSAC used 5-bit integers (0 to 31) to represent characters using two shifts: letters and figures.
In letter shift the codes 0 to 31 respectively represented: P, Q, W, E, R, T, Y, U, I, O, J, figs, S, Z, K, lets, null, F, cr, D, sp, H, N, M, lf, L, X, G, A, B, C and V.
In figure shift the encoding was as follows: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ?, figs, “, +, (, lets, null, $, cr, ;, sp, £, ,, ., lf, ), /, #, −, ?, : and =.
In these tables, figs, cr, sp and lf denote figure shift, carriage return, space and line feed, and on the paper tape perforator their keys were labelled π, θ, φ and ∆, respectively.
In this document, these codes correspond to the ASCII characters #, @, ! and &.
The paper tape reader complemented the high order bit of each 5-bit character, so the rows , and are read as codes 0(P), 7(U) and 27(G), respectively. The machine could read paper tape at a rate of 50 characters per second and output to a Creed teleprinter at nearly 7 characters per second.
The 5-bit Murray code used by the paper tape perforator and teleprinter was somewhat different to the familiar ascii character set.  I have tried to sumarise it here
Table 2 Edsac Character and Instruction Codes

Letter Shift  Figure Shift  Binary  Decimal   Instruction
P             0             00000      0      PLACE
Q             1             00001      1
W             2             00010      2
E             3             00011      3      JUMP IF EQUAL
R             4             00100      4      RIGHT SHIFT
T             5             00101      5      TRANSFER & CLEAR
Y             6             00110      6      ROUND
U             7             00111      7      UPDATE & !CLEAR
I             8             01000      8      INPUT
O             9             01001      9      OUTPUT
J                           01010      10
#             FS            01011      11
S             "             01100      12     SUBTRACT
Z             +             01101      13     STOP
K             (             01110      14
DEL           LS            01111      15
NULL          .             10000      16
F             $             10001      17     VERIFY
@             CR            10010      18
D             ;             10011      19
!             SP            10100      20
H             £             10101      21     COPY TO MULT
N             ,             10110      22     NEGATE THROUGH MULT
M             .             10111      23
&             LF            11000      24
L             )             11001      25
X             /             11010      26     NO OP
G             #             11011      27     JUMP IF GREATER
A             -             11100      28     ADD
B             ?             11101      29
C             :             11110      30     COLLATE (AND)
V             =             11111      31     ADD THROUGH MULT

Notes

1 Erase (DEL) is represented by an asterisk (“*”) in the simulator. When this character is output, it sets the
teleprinter into letter shift.
2 Blank tape is represented by a period (“.”). This character has no effect on output.
3 The personal computer text environment has only a “newline” character. On the Edsac simulator,
the line-feed character is interpreted as a newline character, and carriage returns are thrown away.
4 The symbols ?, f, ? or p are typed as @, !, & and #, respectively.

Some capital alphas unused B,D,J,K,M,Q,W
Posted in Uncategorized | Leave a comment

An EDSAC Simulator using SIMPL

In September 2017, in Hebden Bridge – as part of the Wuthering Bytes Digital Festival, will host the EDSAC Challenge, in which we will attempt to create the 1949 Cambridge EDSAC computer  as a softcore processor using an FPGA programmed using the IceStorm open source tool chain.
EDSAC was a simple architecture with just 18 instructions and used an 18-bit word length. It could address up to 1024 words of mercury delay line memory and had an instruction cycle time of about 1.5mS but represented a 1000 fold speed up in calculation times compared to contemporary hand cranked mechanical desk calculators.
EDSAC has not only historical significance, as it defined the start of the British Computer Industry, but it was also the first general purpose machine designed for real everyday scientific computation.
It’s simple architecture is relatively easy to study and there is a simulator program that runs on any current platform – thus it is a good introduction to Von Neuman “load-store” computer architectures.
My interest lies in the fact that it has so few instructions – and thus could be termed a minimal instruction set computer or MISC, which is a current fascination of mine.
My contribution to the ESAC Challenge, is that I wish to port my SIMPL language toolkit to the 1024 words of EDSAC memory, to see whether it can run on such a minimal architecture, and provide an interactive computing environment.
With just over 4 months available before the EDSAC Challenge – it’s time to get started.
The EDSAC Challenge is to be held at Hebden Bridge Town Hall on 7th and 8th September 2017 – immediately following the 2 day Chip Hack FPGA course.  These events are all parts of the Wuthering Bytes Digital Festival that runs from Friday 1st September to Sunday 10th September inclusive.
The next post looks at the EDSAC instruction set – and some of the challenges presented by working with 1940’s teleprinter code sets.
Posted in Uncategorized | Leave a comment

Exploring the J1 Instruction Set and Architecture using SIMPL

James Bowman developed the J1 Forth CPU in 2010 – and presented a paper describing its architecture at Euroforth  that same year.  It’s a stack based processor – capable of executing Forth primitives mostly in a single cycle. It’s also an ideal candidate to explore stack machine and MISC architectures – and has been ported to FPGAs using entirely open source tool chains.

This ongoing project was inspired by ideas first discovered in the book “The Elements of Computing Systems”  – and the associated course “From NAND to Tetris”.   Having ploughed easily through the first chapter on hardware, I realised that my software skills were dreadfully lacking – so I thought I’d create a simple language  (SIMPL) from scratch and get it to run on an open source FPGA platform  – myStorm – using Clifford Wolf’s Project IceStorm open source tool chain.

In this post I explore the J1 ISA (Instruction set & architecture) and hack together a simple cross-compiler to execute code on a simulated J1 processor.

 The J1 Architecture.

Having read James Bowman’s J1 Paper several times over, I managed to build up a simplified model of his instruction set and architecture.

J1 uses 16 bit long instruction words – where each word is divided into different length fields. The following 3 images – taken from James’s 2010 EuroForth paper – explains the architecture:

J1_ISAJ1_encodingJ1 ALU Codes

Bits 15, 14 and 13  define the Instruction Class – and there are 5 classes of instruction

1 x x   Literal

000   Jump

001    Conditional Jump

010    Call

011     ALU instruction

Bit 12 if  set in ALU mode provides the return from subroutine mechanism by loading the top of the return address stack R into the PC.

Bits 11,10,9 and 8  define a 4 bit ALU opcode – allowing up to 16 arithmetical, logical and memory transfer instructions.

Bits 7,6,5 and 4 are used to control the data multiplexers – so that data can be routed around the cpu according to which of these bits are set.  Here lies a little anomaly with the J1, in that Bit 4 is not used, and it would seem more logical to use it to provide the return function bit – currently done in bit 12.

Bits 3 and 2 define how the return stack pointer is manipulated in an instruction. It can be incremented or decremented by setting these bits.

Bits 1 and 0 define the manipulation of the data stack pointer  – it has a range of  +1,0,-1,-2 depending on the setting of these bits. To pop off the stack you subtract 1 from the stack pointer, to push on the stack, you add one to the stack pointer.

As the various control fields of the J1 instruction exercise different parts of hardware – they can operate in parallel – so for example a return or exit from subroutine can be had for free.

Modifying the J1 Instruction Set.

Whilst the J1 instruction set is neat and compact – the anomaly in Bit 4 is a bit of a sticking point with me.  If we put the “return bit” into the Bit 4 field, this would free up the Bit 12 field.  Bit 12 could then be used to define more classes of instruction, or better still added to the 4 bit ALU instruction field, to make the number of op-codes 32 rather than 16.

If there were two J1 cores acting together in a single FPGA – bit 12 could be used to channel instructions to either core. One J1 could be used for computation and the other could be used as a graphics processor or similar specialised co-processor.

James has already done some changes in this area – moving the return bit to Bit 7 – plus some other internal improvements – and has subsequently released a J1b – which can be instantiated as either a 16 bit or 32 bit Forth cpu, whilst the J1a is a minimum footprint cut down 16-bit J1 with *K memory to fit into small FPGAs – like the Lattice IceStick . See James’s Github for demo. The Verilog fr the new J1b  – is here

For more information about possible modifications to the J1 – read Victor Yurkovsky’s excellent post in “FPGA Related”.  Some neat ideas for those that want to tinker with the internals of the J1.  As the J1 original design verilog is so neat and clean, it’s easy to make small incremental changes and test them out for usefulness on the J1 simulator. Victor’s spring clean freed up two bits of the instruction for future expansion and reduced the number of CPU slices from 150 to 76 – allowing the scope for multiple J1 instantiations in a single small FPGA.

 

Creating a Simulator and SIMPL to J1 Cross Compiler

I came across a very compact simulator for James Bowman’s J1 cpu – written in about 100 lines of C. I decided to port this across to an ARM based Arduino Due, so I could use it to explore the architecture further, as a step on the road to implementing a J1 on a myStorrm FPGA board.

I have put the Arduino code for this SIMPL cross-compiler/ J1 Simulator on this Github Gist

As an input the simulator expects 16 bit J1 machine code instructions in an array of RAM, and the bulk of the code is an execution model of the cpu which steps through this section of memory executing each instruction in turn. After each program step I chose to print out the main stack elements, the program counter, and a few memory cells.

However, the process of hand assembling J1 machine language needs patience and knowledge of the chip’s architecture,  and to write half a dozen native instructions to execute a simple loop counter was about as much as I would recommend for anyone. So I decided to investigate easier ways to automate this process, and hit upon the idea of a cross-compiler, using my SIMPL language and text interpreter framework to input the source code.

SIMPL uses single ascii character tokens to represent primitive machine instructions – I just needed to create a look-up table mechanism where these characters are decode into whatever the machine language of the host processor happens to be – in this case J1.  However it could be any other cpu – just by changing the contents of the look-up table, so a series of cross compilers could be written so that SIMPL could be hosted virtually (no pun) on any processor.

SIMPL exists as a virtual, stack oriented processor, hosted on real hardware cpu. More often the host processor is register based – such as ARM or MSP430 – but in the case of the J1 it is also a stack machine (albeit simulated on a register based ARM) . This means that the choice of the SIMPL minimal primitive instructions – are a very good fit to actual J1 instructions – with often a 1 to 1 match.  This makes a real J1 an excellent host for the SIMPL language – and it is this relationship that I wish to explore further.

In previous posts I have described SIMPL as a lingua franca – which allows the communication of programs and data between widely varying classes of computing machines.  Each machine just needs to host a SIMPL interpreter, and at the start of a communication exchange, the communicating machine needs to send a descriptive list of what the various words mean – in terms of the primitives that go up to make their definitions. Once this has been done, SIMPL becomes a very compact medium for transfering code, data and objects – with a code compaction of between 3 and 4 over other source code or object code techniques.

In Practice

I took the SIMPL text interpreter framework and mapped into it the J1 instructions, focusing initially on the stack, ALU and memory operations – listed below

SIMPL               Operation                J1 hex code

”                           DUP                                6081

‘                           DROP                              6183

$                         SWAP                              6180

%                        OVER                              6181

+                         ADD                                 6203

&                         AND                                6303

|                           OR                                   6403

^                          XOR                               6503

~                          INV                                6600

@                         FETCH                         6C00

!                            STORE                         6123

As can be seen from the above – they all begin 6xxx which means ALU operation – and the 2nd significant digit  is the 4 bit ALU opcode. The 3rd digit is the transfer destination of the ALU result, and the least significant digit is the stack manipulation bits.

The cross compiler produces a line of output thus – in response to a single ascii character being typed in  – in this case ^ for XOR

XOR 6503 PC=9 TOP=FFFF DS0=0 DS1=0 DS2=0 DS3=0 RTN=0 MEM20=0 MEM21=6F00 MEM22=0

This also shows the Program Counter PC, the Top of Stack plus the first three elements of the stack below TOP,  the value on the top of the return stack and three consecutive locations in memory.  This debug output is essential to ensure that the J1 simulator is correctly executing each new instruction – and a great visual insight into the working of the CPU.

The cross compiler is now producing output in J1 machine language – and because it uses the SIMPL front end text interpreter, this automates the process of putting text strings together so it can be used to compose small snippets of  J1 code and run them in an interpreted manner – allowing new words to be created from SIMPL primitives.

Future work will enhance the cross compiler allowing the larger program flow constructs such as DO-LOOP, IF-THEN and BEGIN-END  to be implemented. Whilst output from the simulator is currently just single lines of Debug – the program will be extended so as to compile into files of J1 opcodes, and SIMPL source.

 

“It’s Forth-like, Jim – but not as we know it….”

 

 

 

 

 

Posted in Uncategorized | Leave a comment

Making it Minimal – MISC Machines

As I read more about Forth, and engage with some of the World’s Forth-Thinkers, I get the impression that at some point Forth becomes addictive and productive.  I have yet to reach that level of enlightenment, but my journey continues  slowly, in fits and starts, as there  is always some other more pressing engagement on my metal capacity.

I have been for some time a proponent of minimal instruction set (MISC) machines – and Chuck Moore has been prolific in this area for the last 35 years or so.  In a progressive career in creating silicon integrated circuits that execute Forth primitives directly, he has slowly whittled away the complexity of these machines, to the point where 144 individual cores can be put on a single die – as part of his Green Arrays work.

The problem then becomes not so much a hardware challenge – but then one of software just how do you efficiently co-ordinate the tasks of 144 cores. Perhaps we have something to learn from the behaviour of bees or other colonial insects including ants and termites?

Each worker has a specific task to perform be it foraging for food, nursery duty or general housekeeping.  Each agent is running a low level behavioural model, with a higher level specialisation – based on the specific task or in response to certain stmuli.

This might be one way to organise a multicore processor  – a set of low level functions which are augmented by task or sensor specific application functions.

For the moment however, I first have to define the Instruction Set and Architecture (ISA) of my proposed Stack CPU.

Charles Moore, Chen Hanson Ting and Bill Meunch provided an important framework for MISC machines, and proved them to be an important family of cpus.  The small instruction set means simplified hardware, fewer transistors or logic cells and the ability to implement  using low cost FPGA technology.

With a minimal instruction set,  and a Forthlike language, it’s always possible to create more complex instructions by concatenating the primitive instructions.  Provided that you have a small number of useful instructions – any program can be synthesised.

Moore and Ting postulated that the minimum useful instruction set would be about 25 to 30 instructions, and this fits in well with using 5 bit tokens to represent instructions. Moore packed four, 5 bit instructions into 21 bit wide memory – and this formed a  very simple pipeline architecture, which could also be used to hold a 16 bit literal and a 5 bit instruction. This scheme was used in Moore’s ShBoom processor and also the Mup21.

The Novix NC4016 however used a different approach and used the bit fields of the instruction to directly control different parts of the hardware.

This was the same approach taken by James Bowman – with his J1 Forth CPU.  This is fully described in just 200 lines of verilog and been implemented on Xilinx and Lattice FPGA devices.

James’s device was influenced my Charles Moore’s Novix NC4016 architecture, and has been successfully incorporated into a retro gaming sheild for Arduino – Gameduino.  James is also a professional Silicon Architect, and the J1 is also used as a core on some graphic processor chips.

Using a microcode ROM it would be possible to decode 5 bit instruction tokens into multiple field wider instructions – and this would allow the simple pipelining scheme to be utilised.

The Virtual Instruction Set

Here’s a non-exhaustive list of instructions that would be useful candidates for a Stack oriented MISC computer. Most of these could execute in a single cycle on a FPGA based processor as run-time code,  but some of the program flow constructs would need to be compiled.

Stack Manipulation

DUP
DROP
OVER
SWAP
PUSH
POP
NOP
Arithmetic
ADD
SUB
MUL
DIV
Logical
AND
OR
XOR
INV
Memory & Register Transfer
FETCH
STORE
>R
R>
I/O
IN
OUT
PRINTNUM
KEY
EMIT
Program Flow
GT
EQ
LT
IF
THEN
DO
LOOP
CALL
RETURN
JMP
JZ

In total about 35 instructions,  but some savings could be made bringing it down to 32, which allows 5 bit tokens and packing of instructions into wider memory words

 This maps well to the 33 printable ascii symbols  (not alpha or numbers).  These are chosen for their strong mnemonic bias  – to form the basis of the SIMPL language.  With only 32 symbols to learn its like a shorthand version of Forth yet retaining its human readability.  With a compiler, there is no reason why the longhand version of the instruction cannot be included in the listing. It also reduces the source code from an average of  4 characters (including space) per instruction to just 1 character per instruction. This compaction of source code means that snippets of executable code can be transferred between processors in small packets – using wireless connectivity or even as SMS messages over the GSM network.

In the next post I’ll cover some aspects of mapping this minimal instruction set onto the J1 architecture, plus a look at some of the cross-compiler and simulator tools needed to allow further exploration of the MISC class of computing machines.

 

 

 

 

 

 

 

 

 

Posted in Uncategorized | Leave a comment

Making progress with SIMPL

If you have read any of my last few posts you will know that I am working on a tiny language called SIMPL which was inspired by Forth.

SIMPL stands for Serial Interpreted Minimalist Programming Language – as that’s what it started out in life as being – a serial command shell for microcontrollers, when a few simple single alpha character commands could be used to run simple programs and loops on microcontrollers such as Arduino and Teensy.

SIMPL was adapted from Ward Cunningham’s Txtzyme – a character interpreter running inside a loop, and over a period of a few years I have augmented the language to provide  more functions such as maths operators, logic operators, loops and conditional branching, and perhaps most importantly, and borrowed from Forth –  the means to compile new words using the “colon definition”.

SIMPL is a lightweight language, it does not try to even attempt to do all the things that Forth can do, and with a word set that has been reduced to a maximum of 85 unique functions, it is much smaller than amost all implementations of Forth.

In just 300 bytes, the heart of the SIMPL kernel can take serial text – either typed in manually or sent as a file from a serial terminal program, and buffer it into RAM memory. It can also take colon definitions, and based on their identifying character – normally a capital letter, store them at precalculated addresses in RAM so that they can be accessed and run later with the minimum of decoding.

This unique feature of SIMPL effectively removes the overhead of the dictionary search in Forth.  A character read from the input buffer is decoded within a few cpu cycles, and program execution is directed to that block of code.

Primitives.

In addition to the engine that drives the inner interpreter, there are a series of low level primitives that define the SIMPL kernel.  These primitive words are effectively the minimum wordset from which the rest of the language can be written. Bill Meunch, C.H. Ting and Chuck Moore all arrived at roughly the same conclusion – back in the early 1990s  – that there was a small subset of words from which a full implementation of Forth could be synthesised. It is in this spirit that SIMPL has been written.

These primitives include stack operators, math and logic operators, memory access operators and those that allow conditional branching and control program flow.

There are just 32 primitives – which allows for a 5 bit instruction scheme.  These primitives mostly consist of symbols taken from  ascii characters  0x20 to 0x40.

The primitives have been chosen to offer register compatibility with Ting’s MSP430 eForth.  This offers a proven route to extension and makes building and testing the code a lot simpler.

At it’s lowest level, SIMPL takes the place of a serial bootloader allowing source code to be loaded into the program space of the target microcontroller, using nothing more sophisticated than a serial terminal program such as terraterm.

SIMPL is written in MSP430 assembly language for efficiency, speed and compactness, and it has strong influences from Dr CH Ting’s eForth for MSP430.

As I read through Ting’s book  Zen and the Forth Language – I realised that it should be possible to make the kernel of SIMPL a cut down version of eForth, that resides in just 1k bytes of program space, and presents a programmer’s “tookit” in addition to the means to load source code into the device and run that code on a virtual machine.

Progress to Date

Inspired by CH Ting’s book on the MSP430 eForth,  I decided to restructure my rudimentary code to make it compatibe with his register model.  This will allow for easier expansion later on.

Jump Table

 I have included a “jump table” to jump to the code addresses of all of the 96 possible ascii character instructions. This framework will allow me to quickly add the other functions and run them from the code addresses stored in the jump table.

Whist this jump table in itself is 192 bytes long, just over one quarter of the code so far, it reduces other overheads of instruction decoding. The jump table structure is not going to be as fast as direct threaded code, and this is the price we pay for using a tokenised instruction set – but it is easier to understand and extend with more functionality. In hardware, this jump table would be analagous to a microcode ROM for instruction decoding.

The jump table accepts an index value  in the form of a single ascii character. This is first copied to a temporary register as we need the original character later. Then we subtract 32 from this  – to remove the offset from zero – as the first printable character is “space” which is ascii 32. Finally we multiply whats left by 2 (by adding it to itself) and add this to the program counter.  This causes the pc to be incremented by the correct number of words into the table where we find the code address of the function we wish to run. A snippet of this below shows how this selects from the first 5 instructions

; Now we need to decode the instructions using a jump table
; Jump table uses 2 bytes per instruction - so 2 x 96 = 192 bytes

next:     mov.b @ip+,R12     ; Get the next character from the instruction memory
          mov.w R12,R13      ; Copy into R13 - as needed to decode Jump Address
          sub.w #0x0020,R13  ; subtract 32 to remove offset of space
          add.w R13,R13      ; double it for word addressing
          add.w R13,pc       ; jump to table entry

JumpTbl:  jmp space          ; SP
          jmp store          ; !
          jmp dup            ; "
          jmp lit            ; #
          jmp swap           ; $

 

In this way all of the 96 possible ascii tokens are handled, including numbers – which have their own decoding routine, also accessed via this jump table.

 

Primitive Progress

Most of Ting’s set of eForth primitives have been coded in now – and I can do simple arithmetic, logical operations and memory 16 bit fetch and store.

The code to enter and decode decimal numbers and print them out to the terminal is also in place and working. – this allows me to type in 123 456+. and the machine responds 00579, as I have yet to suppress leading zeroes on my printnum routine.

I have tested memory fetch and store, and also the conditional operators < = and >.

With the conditional operators returning either 0 for false or 1 for true, the next thing to implement will be the simple looping structures, and the mechanism that allows conditional execution of code.

These structures make use of parentheses ( ), and a single parameter on the stack which is passed into the loop counter

Any code to be conditionally executed is paced between the parentheses.  The loop counter defines how many times the code within the brackets should be executed –

0   not at all  – skip everything between the brackets until the end ) is detected

1   execute once – i.e use for conditional execution

n  execute n times – decrementing the loop counter until it is zero

This is a useful structure, as as well as conditional execution and DO-LOOPs  it can also be used for comments – by setting the counter to zero, anything between the brackets is not executed – so the text characters can be treated as a comment.

I am fairly confident that I can keep the kernel below 1024 bytes, with the higher level routines built on top of this 1K kernel. The codesize is now 738 bytes.

I have created a github gist to allow you to view the code so far – as this is clearly a work in progress.

 

 

 

 

 

 

Posted in Uncategorized | Leave a comment

More Thoughts SIMPL and eForth

 

Image result for CH ting zen

I recently downloaded Dr C.H. Ting’s book via Kindle “Zen and the Forth Language” – published by fellow Forther, Juergen Pintaske.

In the book, Ting describes in detail his eForth model ported to the MSP430, and how a complete direct threaded Forth can be built using a limited set of primitives.

Ting’s book is very comprehensive, giving details of how to use TI’s Code Composer to assemble a program and create an output file in a form that can be used by the 4E4TH IDE.  There is sufficient information in his book for any curious person to discover the inner workings of Forth and have a complete working system on the inexpensive MSP430 Launchpad.

Whist at Forth Day, I had the opportunity to talk with Ting and this was the motivation I needed to take the plunge with MSP430 assembly language. Ting’s book, and it’s detail of the eForth mode convinced me that I was on the right lines with my approach to SIMPL – indeed almost to the point where my implementation of SIMPL could be considered to be a sub-set of Ting’s eForth.

Having got the basics of the SIMPL interpreter running in assembly language, the plan is now to massage the allocation of the registers, so as to make it compatible with Ting’s model. Then SIMPL really will be a simplified version of eForth.  This means that if I ever choose to extend SIMPL at a later date – there is a clear and proven route via Ting’s eForth.

SIMPL removes the dictionary search overhead that otherwise exists in Forth. The “words” are just singe ASCII characters, and it is easy to jump to an address, calculated from that value. Whilst this might appear wasteful of memory, the ultimate  MSP430 target has 256Kbytes of FRAM to play with.  It massively reduces the overhead of decoding text strings.

Can we switch between SIMPL and Forth?

SIMPL source code  can be written in such a way that it can be expanded into standard Forth, using the SIMPL interpreter to automatically generated this more verbose form.  The SIMPL primitives can all be expanded out to their conventional Forth names, as can the internal words associated with letters a-z and user words A-Z.

Conversely, small eForth application programs could be analysed for their word content and mapped onto the primitives and user symbols used in SIMPL.  Provided that only 52 user defined words have been used in writing the application – then the switch between Forth and SIMPL should be relatively straightforward.

SIMPL is essentially a shorthand or shortform of Forth. The two should be designed in such a way that they are interchangeable. SIMPL is just a subset of Forth – simplified!

 

MUP21  & eForth Primitives 

From Ting, Dr. Chen-Hanson. Zen and the Forth Language: eFORTH for the MSP430 from Texas Instruments (Kindle Locations 406-413). Juergen Pintaske. Kindle Edition.

Here’s a list of the instruction set of Chuck Moore’s MUP21 Forth cpu. Ting compared his 31 eForth primitives with these and this confirmed that they were thinking along very similar lines.

Transfer Instructions JMP, JZ, JC, CALL, RET, LOOP

Math/ Logic Instructions AND, OR, XOR, NOT, SHR, ADDC

Memory Instructions LDR, STR, LIT

Stack Instructions PUSH, POP, DUP, DROP, SWAP, OVER, NOP

These are also very similar to the instruction set of James Bowman’s J1 Forth processor, which is the intended FPGA target for my SIMPL language. So my plan is to have a maximum of 32 primitives – which can be represented by a 5 bit instruction, and three of these instructions can be pipelined into a 16 bit wide memory – again an idea borrowed from Chuck’s MUP21.

However, the J1 uses a 16 bit wide instruction, where the bit fields directly control the hardware. In my proposal, there will need to be a further level of decode – effectively a 5 bit to 16 bit ROM, which generates the J1 instructions from the 5 bit instruction token. This is the price we pay for having a token set that has been chosen for human readability – it needs a further level of decoding to suit the ISA of the Forth processor.

We can of course model the instruction set and architecture  of the J1 in MSP430 assembly language, and this might be a useful step to do.  In effect we will have created a cross assembler, where SIMPL code is assembled into J1 instructions. Whilst slower than running SIMPL directly in MSP430 assembler, it would be useful to prove the development of the language on the J1 – all be it simulated in code. When run directly on the J1, there will be an order of magnitude speed up.

Uses of SIMPL.

Writing SIMPL has been an education for me, as I have learned about the inner workings of virtual stack based machines, and how they are created within a conventional register based processor.  A learning exercise for me, can equally be a educational process for others, so I see one of the uses of SIMPL is as a learning tool to teach students about the instruction set and the architecture of simple stack processors. In the same theme as “From NAND to Tetris”  “Building a SIMPL Stack Processor” could be offered in the form of an online study course, consisting of theory, implementation on the MSP430 Launchpad, or Juergen Pintaskes MicroBox. For some, learning electronic engineering and hardware description languages –  the implementation in a dedicated soft core processor on an FPGA woud aso be a useful earning exercise.

At it’s heart, SIMPL is a text interpreter, and can be used to process text files. Many of our modern manufacturing processes, such as pcb production, CNC machining, 3D printing and laser cutting transfer their information using fairly simple text file formats – such as Gerber and G code.  These files consist of numbers and alpha characters which are translated into sequential machine operations. SIMPL can be used as a text processor to convert these files between standards, or to generated new compatible formats – which may be of interest to the open source and maker communities.  Animating a walking robot, or fying a drone through a sequence of aerobatic manoeuvres are both things that could be translated into SIMPL text files.

In a later post I will describe how SIMPL source code can be used as a lingua franca for bi-directional  exchange of information and program code between radically different classes of computing hardware – from the humble Arduino or Launchpad to the fastest laptop or Octa-core Android platform. Laptops and other mobile computing devices are convenient viewing platforms because of their high performance graphics capabilities, whist small microcontrollers generally struggle to produce anything other than a serial text output.

However if this serial text was effectively a list of drawing commands for graphics primitives – the mobile platform, running a SIMPL virtual machine (written in iForth or Processing for convenience and portability) could interpret the “display list” and produce the fully rendered graphics display from it. SIMPL can send serial text to a Laptop or tablet at some 200,000 bytes per second – perfectly fast enough to render images for datalogging, oscilloscope or IDE type applications.

 

 

Posted in Uncategorized | Leave a comment