Looping in SIMPL – the mechanics of a tiny language

SIMPL
SIMPL is a character interpreter running within a loop – in pseudocode:
NEXT:             Fetch next Character (VM Instruction) from buffer
                         Decode to form unique Jump Address
                         Jump to associated code body, execute code – which ends with Jump back                            to NEXT
This forms the heart of a very simple virtual machine which is capable of executing unique code functions based on the character found in the memory buffer.  This elementary execution model is equivalent to most modern processor systems, so with a little bit of work, SIMPL can form the basis of a simulator for most basic processor architectures.
The character may be fetched from a serial terminal input buffer and executed immediately in an interactive manner, or the Instruction Pointer can point to a general area of program memory, and execute the character instructions found there.
It is this combination of using SIMPL either interactively at the terminal or as a means of entering and editing text into memory and subsequently running it from RAM, that makes SIMPL a powerful tool – to have in the programmer’s tool kit.
As a kernel, capable of executing the basic inner kernel loop,  SIMPL may be coded in about 300 instructions on most small processors such as the MSP430 – this includes the overhead of setting up the mcu oscillator, UART and GPIO – so that it supports serial terminal interaction.  As more functions are added, the code size grows by approximately 8 to 10 bytes per function – so it’s possible to have a small but usable SIMPL running in about 1K bytes.
The users application code will further extend this by perhaps a similar amount – but it still has an extremely small footprint, making it an ideal serial command shell for resource limited processors.
SIMPL when coded in C has been ported to a number of microcontrollers including ATmega, ATtiny, ARM M4, and MSP430.
The kernel has also been rewritten in MSP430 assembly language for speed and compactness, and work is ongoing to port it to the J1 Forth mcu.
The ability to encode the basic inner interpreter virtual machine in the native assembly language of a range of processors makes this a powerful technique of getting the basis of a useful “monitor” programme onto a variety of processors.
It also means that once the kernel has been coded in the native assembly language, then the user is working with a common set of instructions running on the virtual machine, at a level somewhat higher than the native assembly.
This creates a common language instruction set – a kind of “lingua franca” that makes interaction with a wide range of processors much easier.  It also opens up the door for different processors to communicate with each other through this common character encoded language.
SIMPL has some of it’s roots in Forth, and indeed Charles Moore’s Forth language has been a source of inspiration for the development of SIMPL.  However SIMPL is not a fully fledged language, more like a multi-function tool (Leatherman) which just makes the interpreted interaction with a range of small processors a lot easier, and less painful than the usual edit-compile-run cycle of code development found within a modern C compiler IDE.
Mechanics of SIMPL

Here’s some more info about the various routines that make up the kernel

textRead       33 Instructions 90 bytes
Receive characters from the serial uart and place them into a buffer in RAM.  Two modes of operation are possible, immediate mode, where the characters are executed as instructions directly after a carriage return line feed is received, and compile mode, where the character sequences are preceded by a colon and stored in pre-calculated command buffers in RAM.
number      16 instructions 42 bytes
Number interprets sequences of consecutive digits as a 16-bit integer number and places the value in the register that is the top entry of the stack.
next               5 instructions 12 bytes
Next is the routine that all commands return the program flow back to once they have executed. It fetches the next character instruction from the RAM buffer and through a jump table technique passes program control to the code body that performs the task associated with that instruction.  The  jump table  is used to direct the character to the areas of code that will treat them correctly – for example on encountering a numerical digit, program control is passed to the number routine, whilst for upper case alphabetical characters, program control is passed to a routine that handles these separately.
jump_table       96 instructions   192 bytes
Primitives         72 instructions 166 bytes
SIMPL uses a collection of 32 instruction primitives from which other instructions can be synthesised.   The code body of these primitives is extendable to some 110 instructions or so.
Upper (called alpha in V1)    10 instructions   26 bytes
Upper  handles the capital letters – as these are user commands, and the user is able to write and store in RAM certain functionality based on these characters.  When a capital letter is encountered in the instruction buffer, Upper directs control to the correct command buffer.
Lower   86 bytes
Lower is an area of program that interprets the lower case characters and provides a higher level of program complexity than is achievable form the primitives alone.
printnum     30 instructions   106 bytes
This takes a 16 bit integer number from the stack and prints a string of ascii digit characters to the terminal.
Uart Routines         13 instructions 41 bytes
Low level communication with the uart is by way of the get_c and put_c routines
Initialisation         19 instructions    90 bytes
Here the hardware such as the oscillator, GPIO and uart are initialised for correct operation.  This is code specific to whichever microcontroller has been chosen
Interpreter            4 instructions       16 bytes
This is the main routine that runs the SIMPL interpreter combining textRead, next, number and Upper.

 

Looping 

The loop structure is fundamental to the SIMPL language – but it is one of the trickier parts to code – but with much head-scratching, and a couple of false starts, I got it implemented today.

Loop Code.
Cracking the loop code – previously written in C, was key to getting SIMPL running in MSP430 assembly language. Very quickly after the  loop code start working –  comes LED flashing, tone generation, timing delays and printing character strings.
As it stands, with the loop-code in place,  SIMPL is 874 bytes long. I am confidant that we can get SIMPL into 1024 bytes – as per the Hackaday 1K challenge.
The SIMPL loop code is given a single value k – the loop counter which is placed on the stack immediately before the loop-start character which is a left bracket (.
On finding the left bracket, we need to make a note of the current instruction pointer when we enter the loop – so we have a location to jump back to each time around the loop.
We use the temporary registers R7 and R8 to hold the loop parameters  – i.e. the start of loop and the loop count
The control is then passed back to the interpreter by $NEXT  (jmp next) which steps through the instructions (in the loop) until the right bracket is found.
We then decrement and test the loop counter to see if it is zero – at which point the loop is terminated. If non-zero,  we return the instruction pointer to the start of the loop before jumping back to next.
Confused ?- you will be – but it really is that simple! Just 7 instructions and we are looping.
left_par:                ; code enters here on getting a left parenthesis

MOV.W tos,R8             ; save tos to R8 (R8 is the loop counter k)
MOV.W ip,R7              ; loop-start = ip the current instruction pointer at start of loop
JMP next                 ; get the next character and execute it

right_par:               ; code enters here on getting a right parenthesis

DEC.W R8                 ; decrement loop counter R8
JEQ #next                ; and check for zero - terminate loop
MOV.W R7,ip              ; set instruction pointer back to the start of the loop
JMP next                 ; keep looping

 

SIMPL now runs on a 16MHz MSP430 in just 872 bytes – with serial communications at 115200 baud. Most of the primitive instructions execute in about 1uS  – allowing a port pin to be toggled at 1MHz.

I have included the latest assembly source file at this Github Gist

 

Posted in Uncategorized | Leave a comment

More About SIMPL

SIMPL – a small Forth Like Language

Charles (Chuck) Moore, the inventor of the Forth computer language is one of my all time computing heroes.

In a life long career in computing, starting in the early 1950’s – he has been mathematician, programmer, hardware engineer and designer of several unique processors that execute the Forth language directly as their native instruction set.

Last year, I was fortunate enough to have the opportunity to meet him at Forth Day – at Stanford University, in November 2016 – and he has been an inspiration for some of my recent work.

I first came across FORTH as a teenager in the early 1980’s when the home computer was dominated by the BASIC programming language, and it fascinated me that Forth offered a radically different alternative.

It’s taken me the last 35 years to get around to understand what makes Forth tick – as part of my mid-life computing crisis, and as such I have taken time out to learn some of the fundamentals of this fascinating language.

Chuck Moore always looks for ways to simplify the task in hand – and it is his search for simplicity that has led me to some recent developments with a tiny programming language – that I call SIMPL.

SIMPL stands for Serial Interpreted Minimal Programming Language – and it was initialy inspired by Ward Cunningham’s Txtzyme – for Arduino or Teensy.  I took Txtzyme, added several new commands, and made it an extensible language, which could be ported to virtually any microcontroller.

SIMPL was originally written in C for Arduino, and then MSP430 and ARM. Now I am exploring writing the core routines in MSP430 assembly language for speed, efficiency and compactness.

The extended version of SIMPL – called SIMPLEX, includes a tool-kit to allow editing, loading, compiling and examination of memory – all  hosted by the target processor – in about 5K bytes.

MISC Processors and eForth

 

Inspired by Chuck Moore’s Minimal Instruction Set Computer (MISC) designs of processors from the mid 1980’s, I have chosen a similar approach. At the same time that Chuck Moore was working on designing MISC processors in gate arrays,  Bill Meunch  and Dr. Chen Hanson Ting were developing eForth – a complete forth that could be synthesised from only around 30 primitive instructions.

Meunch and Ting showed that once these 30 primitives had been coded in the native assembly language of the target processor, plus a porting of the inner and outer interpreters, then the whole eForth language could be created very quickly.

Ting once remarked that to put eForth on any new microprocessor, starting from scratch generally took him about 2 weeks.

A Comfortable Computing Environment

eForth creates a comfortable computing environment within any new, unfamiliar processor,  – kind of like establishing a base-camp in a hostile terrain. Additionally, once you have studied the processor architecture and instruction set sufficient to write the eForth primitives in it’s assembly language – you have probably become quite intimately conversant with it’s inner workings.

Forth is an extensible interactive language, which provides a complete computing environment hosted by the target machine. It dispenses with the need for sophisticated compiler tool chains or  IDEs, and allows all coding and interaction to be done from a simple serial terminal.  At the same time it provides an editor, loader, compiler, and the means to examine memory by way of a hex dump.

Registers and memory mapped I/O can be poked or examined interactively, and the low level drivers for I/O can be developed by experimentation from the ground up.  Initially the only I/O requirements are to allow a character to be sent or received via the serial UART – and virtually every modern microcontroller has at least one UART.

eForth however is some 6K to 8K bytes in a typical implementation, so I asked myself whether there was a way of creating an extensible language from an even smaller starting point – and so the idea of SIMPL came into being.

The Mechanics of Interpreted Languages

eForth, SIMPL and other tiny Forth like languages rely on efficiently coding a small set of primitive instructions onto a stack-based, 16 bit virtual machine, running in the native assembly language of the host processor.

The virtual machine consists of an inner interpreter that fetches the next instruction token from memory, performs a decode operation (either by jump table or switch-case structure) and executes the code  associated with that operation. The efficient coding and execution of the primitives and the inner interpreter is what gives these threaded, interpreted language their speed.

Ideally there will be fewer than 32 primitives, so in a hardware cpu implementation they may be represented by a 5 bit token, and mapped directly into the cpu machine instructions – this is the approach that Chuck Moore used with his Forth processors, and has been widely been adopted since – such as James Bowman’s J1 Forth CPU.

However in a software implementation we need to encode these in the native assembly language of the target processor, be it an 8-bit, 16-bit or 32 bit processor. Forth was developed at a time when 16-bit minicomputers were just appearing – and so traditionally it has used a 16 bit integer word size.  The late 1970’s and early 80’s saw Forth being adapted to 8-bit microprocessors – some of which were a good fit to the Forth virtual machine model – and some were not.

An example of a tiny Forth for 8-bit AVR

In one very compact implementation – T. Nakagawa’s “Tiny Forth” written for the 8-bit AVR, 25 primitives were encoded into just 286 instructions – an  average of about 11 machine language instructions per primitive.

Despite the limitations of an 8-bit architecture – Nakagawa’s AVR assembly language implementation is very efficient in about 840 instructions of AVR code and 137 bytes of tables – this assembles to around 1.8K bytes.

Typically the cpu does the following pseudo-code by fetching the next primitive instruction from memory,  testing it’s token value and branching to the next in the table if it’s not a match.

In hardware this is equivalent to the instruction fetch and decode.  Getting this process to run efficiently in assembly language is key to the speed of the resulting language.

Test the token character of the primitive
Is it the one we are looking for - no, branch to next in the table
Get the operands off the stack and into a pair of working registers
Perform the operation
Put the result back onto the stack
Correct the stack pointer
Return to the inner interpreter

Here is the AVR implementation of ADD

prim6: ; +              ; This is the ADD primitive with token=5
 cpi r16, 5             ; Is the token=5?
 brne prim7             ; No - branch to next primitive
 ld r19, X+             ; get operand 1 off the stack
 ld r18, X+
 ld r17, X+             ; get operand 2 off the stack
 ld r16, X+
 add r16, r18           ; add the low bytes
 adc r17, r19           ; add the high bytes plus any carry
 st -X, r16             ; Put the result back on the stack
 st -X, r17
 ret                    ; Return to the inner interpreter

In an 8 bit processor such as the AVR,  it takes 4 instructions to load up the working registers with two 16 bit operands (Forth uses a 16-bit integer size), and two instructions to perform arithmetical or logical operations on them. Passing the result back to the stack is another two instructions followed by a return.  With the overhead of the instruction decode and return, the basic arithmetical and logical operations are typically 11 instructions.   So in this light, an 8-bit Forth generally runs at about one tenth or less of the instruction rate of the processor.   The decode operation takes 2 instructions – even if there is no-match, so the frequently used primitives should be put closer to the start of the list.

Nakagawa’s Tiny Forth is concise and well documented – and I have reproduced it in this Github Gist

Whilst the coding of the primitives takes up around a third of the total program, the thirty short routines to implement the inner and outer interpreter, and the support routines such as serial input and output, multiplication and division take up the remaining two thirds.

With SIMPL, I strive to reduce all of these overheads, and this means an implementation using a 16-bit microcontroller – which is fundamentally a far better match to the 16-bit Forth Virtual Machine.

The MSP430 – A good match to SIMPL.

The MSP430 has been around for many years, originally developed by Texas Instruments in 1993 as a low power, mixed signal, RISC processor with a von Neuman architecture. It has an un-complicated 16 bit register set, and a very wide range of peripheral modules, including 24 bit ADCs, Low Energy Accelerators for FFT and DSP math operations and a large selection of standard peripherals.

In recent years it is one of few processors to have on-chip non-volatile FRAM – which has a much faster writing speed than Flash, and with 10E14 write cycles – virtually indestructible.

For a 16-bit processor, such as the MSP430, 16-bit integer maths and register operations are single machine instructions.

The ADD primitive becomes:

add: add @stack +, tos    ; Add 2nd stack member to the top of stack
     $NEXT                ; Call macro that executes the NEXT instruction

Clearly this increases the efficiency of executing the primitives and reduces the code space requirements, provided that the overheads of decoding an instruction token, and then fetching the next virtual machine instruction are kept to a minimum.

 

 

Simplifying Forth

Forth uses the concept of a “Dictionary” to organise the Words or subroutines.  The Words are written in natural language, and the dictionary search function normally uses the first three characters of the word and the length in order to find that word within the dictionary.  This is a powerful and flexible approach, but adds quite a significant coding overheads to the outer interpreter.

The approach I use in SIMPL was inspired by Ward Cunningham’s Txtzyme Interpreter – where just matching a single character using  switch-case statements causes the code associated with that character to be executed.  Written in assembly language this can be done with a hash-table,  or just by jumping to a program address that has been calculated in some way from the ascii character value.

Critics might say that this approach limits you to only a few dozen unique commands – and yes it does, but if your application needs more than this – perhaps it’s worth rethinking how you are going to solve your application.

As Einsten said “Make things as simple as possible, but not simpler”

There are 85 printable ascii codes between  32 and 127,  plus 10 numeric digits.  If we group these into

26 upper case characters

26 lower case characters

33 symbols and punctuation characters

SIMPL uses most of the symbols and punctuation characters as primitive instructions – such as &  +  !  % etc.   The lower case characters are used for invoking higher level commands – synthesised from the primitives. These were influenced by Txtzyme, tend to be more microcontroller specific – and may include commands to exercise the I/O and peripherals, read or write to a digital port, get a reading from and ADC or provide  hex dump of memory.  Finally are the uppercase characters which are reserved for the Users application commands – and provide the mechanism by which the language is extended.

 

The Primitive Instructions
Most of the primitive Forth words are located in the ASCII characters 32 to 63 that leaves 64 to 126 available for the users vocabulary and constructs made from primitives.

There are approximately 32 printable ascii punctuation and symbol characters – and with a bit of pre-processing they can be used to form the basic machine primitives.

Stack Commands (7) PUSH, POP, DUP, DROP, SWAP, OVER, NOP

, PUSH
. POP
" DUP
' DROP
$ SWAP
% OVER
SP NOP

Maths & Logical Operations (9) ADDC, SUB, MUL, DIV, AND, OR, XOR, NOT, SHR 

+ ADDC
- SUB
* MUL
/ DIV
& AND
^ XOR
` SHR
| OR
~ INV (NOT/COMPL)

MUL and DIV my be substituted for Shift Left and Shift Right 

Transfer Instructions (8)

: CALL (COMPILE)
; RET
( LOOP-START
) LOOP-END
< LESS
= EQU
> MORE
\ JUMP - condition follows

IN/OUT (2)

? KEY (INPUT)
. PRINT (OUTPUT)

Memory Instructions (3)

@ LOAD
! STORE
# LIT

Others (5)

_ _ String Print
[ ] String Store
{ } Array of elements - switch/case

 

The use of single ascii characters as tokens, makes the language a lot less verbose than Forth, and snippets of source code can be sent in very few characters – for example as an SMS message between 2 systems equipped with GSM modems.

It is a fairly simple task to have a word table – where the verbose form of the words are stored, and can be printed out, to expand the source into something more readable – for example

“‘% -> DUP DROP OVER

Three ascii characters expanded to 11 plus 3 spaces

SIMPL does not use the space as word separator, but in the special case where you have several numbers to put on the stack, the space is used to indicate “push onto stack”

The SIMPL Virtual Machine

SIMPL is based around a virtual stack machine that provides an interactive programming environment.  SIMPL primitive instructions are effectively the native assembly language of that virtual machine, and have been chosen to provide a compact but versatile instruction set.

Ultimately, I wish to create an open-core processor using FPGA devices that can execute SIMPL code directly – but in the meantime I need to rely on a virtual machine written in MSP430 assembler to do this for me.

SIMPL is Forth-like in that it uses words or tokens that cause blocks of code to be executed sequentially. It differs from Forth in that the words consist of single ascii characters that are decoded directly causing the cpu to call a routine at a given address, execute the subroutine found there and then return control to the inner interpreter.
The inner interpreter is very compact on the MSP430 with the main interpreter loop fitting into about 300 bytes. On top of this interpreter loop, you have the code for the low level primitive symbols – listed below, and then higher level words – such as “toggle a port pin” “n milliseconds delay” “read an adc channel” “output a string to serial port” “produce a hex dump of memory” which are represented by the lower case alphabet characters.

This layer of the language is what I refer to as the “Arduino Layer” – all those useful helper functions that are processor specific and deal with timing and I/O peripherals.

The next layer is the users application code where the users words use captital letters. For example – the classic washing machine program example:

FILL
WASH
EMPTY
FILL
RINSE
EMPTY
SPIN

This would be condensed in SIMPL as FWEFRES
– which is a code reduction factor of >5

But each of thes can have a parameter – for example if the WASH is an AGITATE cycle A

– In Forth

: WASH 50 DO AGITATE LOOP;

In SIMPL we also use a colon definition to define W. In fact the ascii code for W is used to store this code snippet at a given address – calcuated as (W-61)*32. So as soon as the interpreter encounters W, it jumps to that code address

:W50(A) the code in the parenthesis is repeated 50 times.

 

In Conclusion

SIMPL is a compact, extensible, interactive language that provides a minimum tool-kit to allow small applications to be developed on microcontrollers.

It provides a development environment, which may be ported easily to virtually any microcontroller with very low overheads.

The use of single ascii characters as primitive instructions and commands makes it compact but retains a certain level of human readability – that some machine languages do not.  As there is a one to one substitution between the ascii character and the Word – a translation table may easily be created to expand each symbol out to its full text representation.

SIMPL can act as a Lingua Franca between widely different classes of computing machines, or as messaging between several specialised processors on the same board . A lot of meaning can be conveyed in a few characters, and so a simple microcontroller could communicate tasks between dedicated co-processors in just a few characters.

At it’s heart, SIMPL has a text interpreter that reads character strings from memory and translates these into machine actions.  This is a very powerful technique applicable to all sorts of everyday tasks such as desktop manufacturing, CNC milling/routing, 3D printing, laser cutting, graphics drawing etc.

 

 

 

Posted in Uncategorized | Leave a comment

Revisiting NAND to Tetris

HACK_BitsliceThe simple load-store architecture of the “Hack” computer featured in the NAND to Tetris course is worth of revisiting.

Some years ago I created a bitslice design using 2 input NAND gates to handle just one bit of the design.  16 such slices would be needed in order to create the whole computer.

Bitslicing was widely used from the mid-1960’s to the early 1980’s and several IC manufacturers made slices in various bit-widths. The most common were 4-bit – from Texas Instruments and AMD.

 The bitslice takes advantage of the fact that the ALU operations generally operate only on bits taken from the same position in the word – and apart from a Carry-In and a Carry-Out the slices can be virtually independent.
As well as the ALU, it is also common to create the various storage registers a bit at a time – which was done in the PDP-8 in the mid-1960s.  Additionally the program counter can be created as a structure on the slice.
In the work I did when I first visited this I found that the breakdown between the various slice features was as follows:
ALU
Input Negator (x2)         8 gates
Input force to zero (x2)   6 gates
Full Adder (2 x XOR)       9 gates
AND function               1 gate
Output Negator XOR         4 gates
ADD/AND Function Mux       4 gates
Zero Detect                3 gates
Spare gate                 1 gate   

Program Counter            16 gates  
D register                 9 gates
A register                 9 gates
ALU Mux (2:1)              4 gates 
A reg Mux (2:1)            4 gates
Spare                      2 gates

Total                     80 gates  - or 20 quad 2-input NAND packages
It was interesting to note that the 4 gate XOR function appeared repeatedly – 6 times, and the 4 gate, 2 input multiplexer was also used 6 times – between these common building blocks accounting for 60% of the overall logic design.  In a future revision using 74LV86 quad 2 input XOR and 2:1 multiplexers would make for a more efficient design – possibly if the slice was increased to 4 bits wide.   The registers are also package costly – needing one package for the flip-flop, one for the load multiplexer and a further single inverter to produce complementary data  inputs for the flip flop, there could be some savings made here also.
However the thrust of this project was to investigate the feasibility of just using 2 input NANDs and to become familiar with the structures in the cpu core.

With a total of just 20 quad NAND packages on each slice, costing about $0.10 each, to realise the 16-bit cpu in hardware would require approximately 320 packages – about $32.00.  This was starting to look like a viable proposition, and not such a daft idea as my original aim of implementing it in DTL (diode transistor logic).

Using small outline surface mount 74LV00 packages – the 20 devices could easily fit on a 2 layer board about 2″ x 2″ – and plug into a backplane using low cost double row  0.1″ connectors.

Each board would use jumper links to ensure that it had the correct carry and zero-detect signals taken from the backplane, and also ensure that it received the correct input bit and output bit – selecting from the 16 available.

 For convenience of programming, an Arduino Due (or MEGA) would be used to simulate the ROM and RAM memory functions, and be able to provide a programming and user interface via a serial terminal.
The status of the registers would be displayed using various colours of LEDS – allowing the contents of the A (address) register, the D (Data) register and the Program Counter to be monitored visually.
It would also be intuitive to bring out the memory register and the Instruction register – to aid debugging. These two additional registers would cost an additional 18 gates per slice – roughly another 5 devices. There was some thought of adding toggle switches for manual input of data – and this could be revisited at a later date.
I decided that it would be neat to make the pcbs the same overall dimensions as the  Arduino – so that they could all fit into a 3D printed card frame with common backplane. The overall dimensions of this frame would be around 10″ wide, 5″ deep and 2.5″ high. This would make it a manageable desktop product – and not too tiny. Cards would be spaced approximately 0.5″ apart.
EagleCAD was used for the original CAD, and recently parts of the design have been imported into LogiSim  to allow the logic to be carefully checked. There are a maximum of 18 gate delays in the ALU – so conservatively taking the propagation delay per gate to be 10nS this might suggest 180nS total delay in the ALU or about 5MHz operation. However, as an Arduino will be used as the “memory ” emulator, it is unlikely that this speed will be achieved.

 

 

Posted in Uncategorized | Leave a comment

Emulating Simple CPUs Using Arduino

In my exploration of minimal instruction set (MISC) cpus, it is often a worthwhile exercise to define the instruction set and then simulate the behaviour of the cpu in software.

This can be done relatively straightforwardly using even a humble Arduino to mimic the operation of the cpu. A couple of evenings programming and you have a simple simulator – that can readily be changed to suit new architectures or new instruction sets. Whist the original Arduino is resource limited – the same code will run on the MEGA or DUE bards – offering a whole lot of extra RAM – and in the case of the DUE – speed.

The Arduino Uno or Duemillenove has 2Kbytes of RAM – and this is just about enough to write short programs – particularly if the instructions are multiple bytes long.

To create a CPU simulator, we first have to create an array in memory that will hold the program in the form of the machine instructions for the proposed processor. If the cpu needs a 16-bit instruction width,  then for practical purpose this array should only be about 512 words long, which will immediately consume 50%  of the available RAM.

We also have to define the principal registers – including Program Counter (PC), Instruction Register (I),  Accumulator,  and possibly the address of any memory location that is updated by the instruction.

Now we come to the instruction set and architecture (ISA) – and the instruction decoder.  First we need to choose how many instructions, and what operands they will act upon.

For a MISC machine we should really be looking at about 32 or fewer instructions – mostly based on what the chosen ALU can do.  We will also require program branching, loops and other conditional program flow control instructions.

In the case of th J1 Forth CPU – there are just 5 classes of instructions

Literal

Call

Conditional Branch

Unconditional Branch

ALU Operation

A true load-store architecture will be able to operate on any location in the available program RAM.  A register based architecture will have a group of registers that are operated on preferentially – either implemented in block RAM on an FPGA or possibly in off chip RAM – occupying the low addresses of the zero page – which minimises the addressing overhead.  Finally  – with a stack based architecture, there will be the top of stack, the next of stack, and some elements in the return stack that have preferential access.

In the case of EDSAC, which is a traditional load-store architecture – all of it’s 1024 words of RAM storage are treated identically by the ALU. However, there is no reason why for improving the programming model – that the firs few words in memory could not be given register names – and used preferentially for moving data about.

Instruction Decoder

The Instruction Decoder can just be based on a switch-case statement. For a machine with just 32 instructions – this can be as little as a few lines of C-code, that calculate the new state of the Accumulator, Program Counter and Memory – based on whatever instruction has been executed.

Posted in Uncategorized | 2 Comments

Using SIMPL as a tool to bootstrap unusual processors

I have written a lot about SIMPL over the years, but it is my conviction that it has uses as a tool to help bootstrap various novel processors – and to ease the early stages of processor code development.  It’s not a fully fledged interpreted language like Forth or BASIC but just enough to make writing code easier on unfamiliar processors.

SIMPL may be written in about 100 lines of C code, or ported directly to the native assembly language of the  target processor for even more compactness and speed, which is the approach I have used with the MSP430.

With this in mind, I will attempt to use SIMPL to provide an interactive coding environment for the 1949 EDSAC,  – or at least a simulated or an FPGA softcore version of it – as part of the Wuthering Bytes EDSAC Challenge  – to be held i Hebden Bridge in early September.

This is a work in progress – so don’t expect too much just yet.

 
SIMPL
SIMPL is a character interpreter running within a loop and as such, models a virtual machine:
NEXT:              Fetch next Character from buffer
                         Increment Program Counter
                         Decode to form unique Jump Address
                         Jump to code body, execute code – end with Jump back to NEXT
This forms the heart of a very simple virtual machine which is capable of executing unique code functions based on the character found in the memory buffer.  This elementary execution model is equivalent to most modern processor systems, so with a little bit of work, SIMPL can form the basis of a simulator for most basic processor architectures.
The character may be fetched from a serial terminal input buffer and executed immediately in an interactive manner, or the Instruction Pointer can point to a general area of program memory, and execute the character instructions found there.
It is this combination of using SIMPL either interactively at the terminal or as a means of entering and editing text into memory and subsequently running it from RAM, that makes SIMPL a powerful tool – to have in the programmer’s tool kit.
As a kernel, capable of executing the basic inner kernel loop,  SIMPL may be coded in about 300 bytes on most small processors such as the MSP430 – this includes the overhead of setting up the mcu, oscillator, UART and GPIO etc – so that it supports serial terminal interaction.
As more functions are added, the code size grows by approximately 16 bytes per function – so it’s possible to have a small but usable SIMPL running in about 1K bytes.
The users application code will further extend this by perhaps a similar amount – but it still has an extremely small footprint, making it an ideal serial command shell for resource limited processors.
SIMPL when coded in C has been ported to a number of microcontrollers including ATmega, ATtiny, ARM M4, and MSP430.  The kernel has also been rewritten in MSP430 assembly language for speed and compactness, and work is ongoing to port it to the J1 Forth mcu.
The ability to encode the basic inner interpreter virtual machine in the native assembly language of a range of processors makes this a powerful technique of getting the basis of a useful “monitor” programme onto a variety of processors.  It also means that once the kernel has been coded in the native assembly language, then the user is working with a common set of instructions running on the virtual machine, at a level somewhat higher than the native assembly.  This creates a common language instruction set – a kind of “lingua franca” that makes interaction with a wide range of processors much easier.  It also opens up the door for different processors to communicate with each other through this common character encoded language.
SIMPL has some of it’s roots in Forth, and indeed Charles Moore’s Forth language has been a source of inspiration for the development of SIMPL.  However SIMPL is not a fully fledged language, more like a multi-function tool (Leatherman) which just makes the interpreted interaction with a range of small processors a lot easier, and less painful than the usual edit-compile-run cycle of code development found within a modern C compiler IDE.
 
Assembler and Instruction Set Decoder.
At it’s heart, SIMPL, when the kernel is coded in C, uses techniques such as switch-case statements and look-up table, or hash table to create the code address associated with the input character.
This technique allows not just a jump address to be associated with the character, but also other data, such as instruction look-up – for extending an 8 bit character into a longer instruction length – with individually coded bit-fields. This is the method used in generating J1 assembly language instructions from the common SIMPL command set.  This is a form of microcoding and is a powerful technique used when writing assembly language for different target hosts.  The (expanded instructions decoded in SIMPL can be written to RAM, creating the object code for the new target.
Processor Simulation
SIMPL also has uses in the automation of cpu simpulation.  Any small cpu (with small instruction set) may be simulated in a few dozen lines of C code, and a processor model including memory, registers and ALU created.  SIMPL commands can be used to create assembly instructions to test this model,  single step through executable code runs and examine the results on a hex dump display that shows the memory contents and principal register contents.  A full screen of serial data can be updated in less than 0.1 seconds  – making single stepping more productive.
It is this processor simulation that I chose to explore next, and my target cpu will be one of the earliest processors – the 1949 EDSAC – built at Cambridge university.
Fortunately the EDSAC had very few instructions and is well documented.  On a less fortunate note, the machine uses an instruction set input via a 5 bit paper tape, which is somewhat different in order to the more familiar ascii character set, and uses “figure shift” and “letter shift” to access the various characters of the teleprinter keyboard. This means that serially typed ascii characters will have to go through a further layer of decode to turn them into the Murray coded characters as used on the Creed teleprinters of that time, and a further translation back again into ascii for output to a serial terminal.  However these  character translations can be done with simple look-up or switch-case statements.
Posted in Uncategorized | 1 Comment

The 1949 EDSAC Instruction Set and Architecture

The 1949 EDSAC
This information has been taken from the EDSAC explanatory poster produced by Cambridge university.  I have added some of my own notes for clarity.
The 1949 Instruction set EDSAC’s instructions in 1949 was very simple and were executed at a rate of about 600 per second. They were as follows:
AnS: A += m[n] AnL: AB += w[n]
SnS: A −= m[n] SnL: AB −= w[n]
HnS: R += m[n] HnL: RS += w[n]
VnS: AB += m[n] * R VnL: ABC += w[n] * RS
NnS: AB −= m[n] * R NnL: ABC −= w[n] * RS
TnS: m[n] = A; ABC = 0 TnL: w[n] = AB; ABC = 0
UnS: m[n] = A UnL: w[n] = AB CnS: AB += m[n] & R
CnL: ABC += w[n] & RS
RnS, RnL: Shift ABC right arithmetically by the number of places corresponding to the position of the least significant one in the shift instruction. For example, R0L, R1S, R16S and R0S shift by 1, 2, 6 and 15 places, respectively.
LnS, LnL: Shift ABC left arithmetically by the number of places corresponding to the position of the least significant one in the shift instruction. For example, L0L, L1S, L16S, L64S and L0S shift by 1, 2, 6, 8 and 13 places, respectively.
EnS: if A >= 0 goto n
GnS: if A < 0 goto n
InS: Place the next paper tape character in the least significant 5 bits of m[n].
OnS: Output the character in the most significant 5 bits of m[n].
FnS: Verify the last character output.
XnS: No operation.
YnS: Add a one to bit position 35 of ABC, counting the sign bit as bit zero. This effectively rounds ABC up to 34 fractional bits.
ZnS: Stop the machine and ring a bell.
The EDSAC Instruction Set (1949)
What caught my attention is that the instructions use a single character mnemonic – taken directly from the upper case characters available at that time from the 5-bit “Murray” teleprinter code – which was in common use in the Creed teleprinters used at that time.
The mnemonics were chosen to give a reasonable amount of readability such as A for ADD and S for SUBtract.
This makes common sense to make it as easy as possible and is a simple first stage mapping of human readable mnemonic characters to a 5 bit binary teleprinter code, which was used directly as the 5 bit instruction.
There were only two register spaces accessible from the instruction – visible to the user:
The 71-bit Accumulator
The 35-bit Multiplier Register
Further more there were 1024 18-bit words of delay line storage
The contents of the Order Register (Instruction Register) and Sequence Control Tank (Program Counter) as well as the multiplicand register were viewable on the operator’s console.
For my own clarification – so as to build up a simple model of the EDSAC, I have attempted to group the instructions into:
Accumulator and memory only
A n Add the number in storage location n into the accumulator
S n Subtract the number in storage location n from the accumulator
T n Transfer the contents of the accumulator to storage location n and clear the accumulator
U n Transfer the contents of the accumulator to storage location n and do not clear the accumulator
Accumulator, Multiplier and Memory
H n Copy the number in storage location n into the multiplier register
V n Multiply the number in storage location n by the number in the multiplier register and add the product into the accumulator
N n Multiply the number in storage location n by the number in the multiplier register and subtract the product from the accumulator
C n Collate [logical and] the number in storage location n with the number in the multiplier register and add the result into the accumulator
Accumulator and Shifter Operations
R 2n-2 Shift the number in the accumulator n places to the right
L 2n-2 Shift the number in the accumulator n places to the left
Conditional Jumps
E n If the sign of the accumulator is positive, jump to location n; otherwise proceed serially
G n If the sign of the accumulator is negative, jump to location n; otherwise proceed serially
Input-Output
I n Read the next character from paper tape, and store it as the least significant 5 bits of location n
O n Print the character represented by the most significant 5 bits of storage location n
F n Read the last character output for verification
Misc Operations
X No operation
Y Round the number in the accumulator to 34 bits
Z Stop the machine and ring the warning bell
The numerical values in the accumulator and multiplier registers are normally thought of as signed binary fractions, but integer operations could also be done easily. For example, the order V1S can be interpreted as adding the product of the 17-bit signed integer in m[1] and to the 17-bit integer in RS and adding the result into bits 0 to 32 of the ABC. With a suitable shift, the integer result can be placed in the senior 17 bits of A ready for storing in memory
Teleprinter and perforated tape for Input Output
EDSAC used 5-bit integers (0 to 31) to represent characters using two shifts: letters and figures.
In letter shift the codes 0 to 31 respectively represented: P, Q, W, E, R, T, Y, U, I, O, J, figs, S, Z, K, lets, null, F, cr, D, sp, H, N, M, lf, L, X, G, A, B, C and V.
In figure shift the encoding was as follows: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ?, figs, “, +, (, lets, null, $, cr, ;, sp, £, ,, ., lf, ), /, #, −, ?, : and =.
In these tables, figs, cr, sp and lf denote figure shift, carriage return, space and line feed, and on the paper tape perforator their keys were labelled π, θ, φ and ∆, respectively.
In this document, these codes correspond to the ASCII characters #, @, ! and &.
The paper tape reader complemented the high order bit of each 5-bit character, so the rows , and are read as codes 0(P), 7(U) and 27(G), respectively. The machine could read paper tape at a rate of 50 characters per second and output to a Creed teleprinter at nearly 7 characters per second.
The 5-bit Murray code used by the paper tape perforator and teleprinter was somewhat different to the familiar ascii character set.  I have tried to sumarise it here
Table 2 Edsac Character and Instruction Codes

Letter Shift  Figure Shift  Binary  Decimal   Instruction
P             0             00000      0      PLACE
Q             1             00001      1
W             2             00010      2
E             3             00011      3      JUMP IF EQUAL
R             4             00100      4      RIGHT SHIFT
T             5             00101      5      TRANSFER & CLEAR
Y             6             00110      6      ROUND
U             7             00111      7      UPDATE & !CLEAR
I             8             01000      8      INPUT
O             9             01001      9      OUTPUT
J                           01010      10
#             FS            01011      11
S             "             01100      12     SUBTRACT
Z             +             01101      13     STOP
K             (             01110      14
DEL           LS            01111      15
NULL          .             10000      16
F             $             10001      17     VERIFY
@             CR            10010      18
D             ;             10011      19
!             SP            10100      20
H             £             10101      21     COPY TO MULT
N             ,             10110      22     NEGATE THROUGH MULT
M             .             10111      23
&             LF            11000      24
L             )             11001      25
X             /             11010      26     NO OP
G             #             11011      27     JUMP IF GREATER
A             -             11100      28     ADD
B             ?             11101      29
C             :             11110      30     COLLATE (AND)
V             =             11111      31     ADD THROUGH MULT

Notes

1 Erase (DEL) is represented by an asterisk (“*”) in the simulator. When this character is output, it sets the
teleprinter into letter shift.
2 Blank tape is represented by a period (“.”). This character has no effect on output.
3 The personal computer text environment has only a “newline” character. On the Edsac simulator,
the line-feed character is interpreted as a newline character, and carriage returns are thrown away.
4 The symbols ?, f, ? or p are typed as @, !, & and #, respectively.

Some capital alphas unused B,D,J,K,M,Q,W
Posted in Uncategorized | Leave a comment

An EDSAC Simulator using SIMPL

In September 2017, in Hebden Bridge – as part of the Wuthering Bytes Digital Festival, will host the EDSAC Challenge, in which we will attempt to create the 1949 Cambridge EDSAC computer  as a softcore processor using an FPGA programmed using the IceStorm open source tool chain.
EDSAC was a simple architecture with just 18 instructions and used an 18-bit word length. It could address up to 1024 words of mercury delay line memory and had an instruction cycle time of about 1.5mS but represented a 1000 fold speed up in calculation times compared to contemporary hand cranked mechanical desk calculators.
EDSAC has not only historical significance, as it defined the start of the British Computer Industry, but it was also the first general purpose machine designed for real everyday scientific computation.
It’s simple architecture is relatively easy to study and there is a simulator program that runs on any current platform – thus it is a good introduction to Von Neuman “load-store” computer architectures.
My interest lies in the fact that it has so few instructions – and thus could be termed a minimal instruction set computer or MISC, which is a current fascination of mine.
My contribution to the ESAC Challenge, is that I wish to port my SIMPL language toolkit to the 1024 words of EDSAC memory, to see whether it can run on such a minimal architecture, and provide an interactive computing environment.
With just over 4 months available before the EDSAC Challenge – it’s time to get started.
The EDSAC Challenge is to be held at Hebden Bridge Town Hall on 7th and 8th September 2017 – immediately following the 2 day Chip Hack FPGA course.  These events are all parts of the Wuthering Bytes Digital Festival that runs from Friday 1st September to Sunday 10th September inclusive.
The next post looks at the EDSAC instruction set – and some of the challenges presented by working with 1940’s teleprinter code sets.
Posted in Uncategorized | Leave a comment