Exploring DIY CPUs

In the last couple of posts I described how I have used a TI MSP430 Launchpad to emulate the 1949 Cambridge University EDSAC machine.

This opens up the possibility of using a similar scheme to emulate virtually any DIY cpu, possibly with a novel or experimental instruction set.

The MSP430 Launchpad was used only because I had it at hand, but the cpu model and the SIMPL communications framework, being written in C, could be ported to almost any microcontroller – especially if they are supported by the ever-growing Arduino IDE, now with ARM Cortex M0, M1 and M4 extensions.

A Look at an instruction set for a MISC Processor.

As I am a proponent of MISC (minimum instruction set computers), the task of simulating the cpu is simplified to just 32 instructions – represented by a 5-bit token, and I have chosen standard ascii characters to represent my instruction primitive. EDSAC used 18 capital letters to represent its instructions, with a mnemonic flavour, but any printable character is appropriate to represent an instruction primitive.

These primitives represent various ALU instructions, data moves between the ALU and memory or registers, and various input-output instructions.

I have attempted to pick a mix of instructions that are easy to implement, useful and offer flexibility.  Whilst anyone can have aspirations about designing the ISA of a new processor, it is likely to have a minimum number of instructions to be useful.

This number was estimated to be around 25 in the 1990s – by Bill Meunch, Chen Hanson Ting Chuck Moore and others – working MISC and Forth cpu implementations. Previous cpus have successfully use 16 or even 8 instructions (see below).

From 25 primitive instructions,  more complex instructions can b e synthesised by assembling groups of these primitives into macros.

ALU Group                 NOP, ADD, SUB, AND, OR, XOR, INV,  SHL, SHR

Call/Jump Group       CALL, JMP, JEQ, JGT, JLT, RET

Memory Group          FETCH, STORE

Stack Operations      DUP, DROP, SWAP, OVER, PUSH, POP

I/O Group                    IN, OUT, EMIT, KEY     (emit is putchar and key is getchar)

Register Transfers   TO-R, FROM-R

Program Flow            FOR, NEXT

Thus we have 31 instructions that are tailored towards a stack based processor. For a more conventional register processor, or load-store architecture there would probably be more instructions for register transfers and richer addressing modes.

It could also be argued that all of the conditional jumps might not strictly be required, as a jump if zero can implement all program control structures.  Additionally if the processor has memory mapped peripherals then IN and OUT are practically redundant.

Deciding which instructions to keep and which to lose is all part of the mental anguish process of creating a the ISA of a DIY cpu.

Having decide on a set of instruction primitives, and the character tokens to represent them, it’s a case of looking at the overall processor architecture.

One cycle per instruction would be good – always a design aim for a simple processor, and also whether the use of dual port RAM can allow a partial pipeline – such that the ALU is calculating the ALU result(s) whilst the instruction is still being decoded.  This is the approach James Bowman has taken in his J1 Forth Processor.

Stack Processors generally use two stacks  – the data stack and the return stack – but there are other classes of stack machines – see Koopman’s Book on Stack Processors

These stacks can be implemented in main memory, or as a closely coupled register file. The stacks will have their own stack pointers and this is why the TO-R and FROM-R instructions are present to allow the TOP of stack to communicate with R, the Return Stack Pointer. There also needs to be a hardware mechanism to put the current Program Counter onto the return stack, before executing a CALL.

In SIMPL, there are really only two main registers x and y to hold the operands for the ALU.  There is also a loop counter k which is decremented each time around the loop until it zeros, and very occasionally there is need to access another variable.

It could be argued that 16 registers in a stack would be sufficient for most purposes, and at times four would be more than enough. As registers are cheap in FPGAs, then it may be worth considering a hybrid architecture, where the stack is a set of registers – any of which are addressable. This is a deviation from the pure TOP/NEXT architecture of a traditional stack processor – but it extends the processor functionality quite considerably, bringing it in line with many small commercial processors that have a rich set of registers.  And in the virtual machine to execute SIMPL, we need an Instruction Pointer, Program Counter, Return Stack Pointer, Data Stack Pointer, Loop Counter, Top and Next of stack – so 7 registers already.

Having more registers gives greater flexibility, but at the cost of needing more instructions to address them.  One compromise might be the idea of “zero page RAM” – where a limited area of memory has its address coded into the lower part of  instruction – say the bottom 7 bits – and this allows the alu to operate quickly on this subset  of memory without having to wait for a 2nd instruction that holds the address.

Dedicating the lower part of memory to an 8-bit payload, allows short literals to be encoded into the instruction – useful for supplying constants to the ALU, or for comparison operations – jump on match or similar.

 

Implementing SIMPL in hardware.

The end game of looking at cpu design is to attempt to find an architecture that is not only easy to implement, light in hardware and optimised to execute the SIMPL language efficiently.

As SIMPL is implemented in the form of a switch-case statement or hash-table, having an efficient “jump on match” instruction would be a useful asset – where an ascii character can be read from a buffer,  multiplied up by 32 or 64 and stored into the PC causing program execution to jump to the code body defined by that ascii character.

In hardware, decoding the 5-bit token into a longer instruction word, with various bit-fields to control the alu etc, would normally be done with a microcode ROM. In the simulator, this decode function is done with a series of switch-case statements.

For example typing “&” generates a 16 bit word by calculation or from a look-up table, which it loads into the next  RAM address and also prints the string AND to the terminal.

This process is repeated for all legitimate characters, and number literals, and so the RAM can be loaded with the machine code for the proposed processor.  As the 32 instructions will be a common “lingua franca” to all processors, it’s just a case of changing the “look up” to suit the machine language of the target processor.

Hardware versus Software

My interest in Minimum Instruction Set Computers (MISC), lies in particular trying to understand where the border line is drawn between a successful instruction set, and one that is too compromised to work efficiently.  Instructions can always be synthesised from primitives such as ADD and NAND, but this costs cpu cycles. As there is likely to be a hardware XOR function included as part of the full-adder in the cpu, is it not worth the additional instruction multiplex to make this available in the instruction set?

Historical Examples of Minimalisation

The PDP-5 and the more commercially successful PDP-8 were a miracle of minimal hardware design – as when introduced in 1965, a single transistor cost the equivalent of $20 in today’s money. Transistors were used very sparingly, only where necessary to provide signal inversion (logical NOT) and amplification (fan out). The Diode Transistor Logic (DTL) provided an easy mechanism to OR inputs and to AND outputs – so much was made of this feature to minimise the transistor count.

The PDP-8 ALU and Register slices (boards R210 and R211) contained 12 and 13 transistors respectively, and about 10 times that number of diodes.  These were built at a time before ICs were available cheaply, so every last ounce of value had to be extracted from a minimum transistor count. As a result the PDP-8 had only 8 instructions, but employed a trick by where the instruction OPR, which is 7xxx (octal) allowed individual manipulation of the ALU control signals to provide clear and complement and shifting operations. A second set of 7xxx instructions allowed various conditional jumps to be performed. As these operations could be performed in parallel by ORing the various control bits, this gave much greater flexibility to a machine  with apparently only 8 instructions – as follows

  • 000 – AND – and operand with AC.
  • 001 – TAD – add operand to <l,ac>(a 13 bit value).</l,ac>
  • 010 – ISZ – increment operand and skip if result is zero.
  • 011 – DCA – deposit AC in memory and clear AC.
  • 100 – JMS – jump to subroutine.
  • 101 – JMP – jump.
  • 110 – IOT – input/output transfer.
  • 111 – OPR – microcoded operations.

The PDP-8 was legendary for it’s minimalisation, and went on to sell more than 50,000 units over a period of 20 years, and in several hardware revisions – the last being a PDP-8 on a Harris 6120 VLSI chip based on gate arrays.  The PDP-8 has also been recreated in more modern hardware – both in discrete TTL and as a FPGA in VHDL.

A Flexible ALU

The ALU featured in the “NAND to Tetris” course uses 6 control lines to ingeniously control the inputs and outputs to an otherwise simple ADD/AND circuit. However, only 18 of the 64 combinations of instruction are actually used, and one bitslice the ALU contains  the equivalent of 35, 2-input NAND gates – which rapidly escalates to 560 gates for a 16-bit ALU.

Constraining the instruction set to just ADD or NAND halves the number of gates needed but equally shifts the problem over to firmware to synthesise the missing operations from the ADD and NAND primitives.

Before embarking on any hardware implementation it would be good to have the means to simulate  and explore possible instruction set architectures (ISAs), so the technique used to simulate EDSAC can be more widely applied.

Simulating a DIY CPU using SIMPL.

As most MISC machines are likely to have less than 32 instructions, the cpu model is likely to be relatively simple. As seen with EDSAC, the processor model can be constructed in fewer than 50 lines of code, as a case statement that decodes each instruction from memory in turn.

Populating an area of RAM with the correct instructions for the chosen processor can be done with the help of a text interpreter and a look-up table. I used  a modified version of SIMPL to act as an instruction loader/builder, where an instruction in the form nX was assembled into memory.  n is the address/data field of the instruction, and X is the op-code. SIMPL strips off X as a 4 or 5 bit code, and shifts it up to occupy the most significant bits of the instruction word, and places n as the lower 12-bits.

This operation was fast and reliable allowing pseudo-assembly language to be loaded directly into memory using Teraterm.

Once the RAM contains the correct machine instructions, we use the processor model to fetch the instructions one at a time from RAM,  decode them and execute the correct logical and arithmetical behaviour, updating registers, program counter and memory where necessary.

SIMPL was set up so that the “s” command could be used to single step through the instructions – giving a print out of the registers and key memory locations each step. This allowed for elementary debugging of the code.

Another SIMPL command “e” was used to preset the program counter to a given location – so that program execution could begin there.

Further commands and a more refined user interface are planned for a later date.

Great satisfaction was had having code actually run on the simulated EDSAC machine – using nothing more than a text editor, a terminal program and a MSP430 LaunchPad. Such low level computing really does not require sophisticated tools.

 

 

 

 

 

Posted in Uncategorized | 2 Comments

When you are strapped for resources……

In early August, I spent a few days on the Greek peninsula of Kassandra, to the south of Thessaloniki. In order to keep my mind stimulated and an excuse to keep out of the full-on daytime sun, I took along a minimum “Hacker’s Survival Kit”  –  plus my cheap laptop.

The survival kit consisted of a TI MSP430 Launchpad, a breadboard and a few components. I decided that with the “EDSAC Challenge” coming up in early September, that I really ought to try to work on an EDSAC based project – and with the minimum of resources, I reckoned that I stood a fair chance of being able to simulate the EDSAC using the LaunchPad, and possibly even write some simple programs in EDSAC machine language.

EDSAC was the first of the Cambridge University built computers that were designed for real computational work – in order to aid research. It was commissioned in spring of 1949, and ran it’s first demonstration program on May 6th 1949. This poster explains most of the early history of EDSAC.

EDSAC was built using about 3000 vacuum tubes (valves) and it had memory of 1024 words that was based on mercury delay lines. It performed about 650 instructions per second, using a serial ALU – but this represented a 150 times speed up from the use of desktop mechanical calculators that were commonplace at that time.

What was of interest to me was the limited instruction set of EDSAC – only 18 instructions, and how they could be put to use to provide real computation. I had read about Mininmal Instruction Set Computers, from the work by Charles “Chuck” Moore, and his work on Forth processors from the 1980s and 90s – and was intrigued to see what I could achieve on a primitive machine like EDSAC.

Simulating a cpu is fairly easy in C, you just need a series of case statements to handle all the possible instructions. Each case statement updates the accumulator and program counter of the cpu model – and for a machine like EDSAC with almost no registers, the cpu model is very easily described in about 50 lines of code:

//-------------------------------------------------------------------
 // mini-EDSAC CPU Model
 // ------------------------------------------------------------------
 
 static void execute(int instruction) // This is the EDSAC CPU model
 
 // The Alpha code is held in the top 5 bits of the instruction
 // The bottom 11 bits hold the address
 {
 insn = (instruction & 0xF800)/2048 ;
 _pc = pc + 1;
 
 n = (instruction & 0x7FF); // n is the memory address field 
 - lower 11 bits
 
 switch (insn) 
 {
 
 case 0: A += m[n] ; break;                    // ADD A
 case 1: A -= m[n] ; break;                    // SUB Subtract
 case 2: A += (m[n] & R) ; break;              // COL C (AND m[n] with R)
 case 3: m[n] = A ; break;                     // DEP Deposit D -no clear
 case 4: if(A>=0 && A<=32767) {_pc = n;} break;// JGT E
 case 5: break;                                // VER F
 case 6: if(A>=32768 && A<=65535) {_pc = n;} ; break; // JLT E
 case 7: R += m[n] ; break;                    // CPY H - Load R register
 case 8: break;                                // INP I 
 case 9: A = A>>n ; break;                     // Right Shift
 case 10: m[n] = A; A=0 ; break;               // Transfer and clear
 case 11: A = A<<n ; break;                    // LSH L
 case 12: A += (m[n] * R) ; break;             // Multiply and ADD
 case 13: A -= m[n] * R ; break;               // Multiply and Subtract- N
 case 14: uart_putc(m[n]); break;              // OUT O Output a character
 case 15: break;                               // PUT P
 case 16: break;
 case 17: A = m[n]>>1 ; break;                 // RHS R
 case 18: A -= m[n] ; break;                   // SUB S
 case 19: m[n] = A; A=0 ; break;               // TRC T 
 case 20: m[n] = A ; break;                    // UPD U
 case 21: A += (m[n] * R) ; break;             // ML+ V
 case 22: break;
 case 23: break;                               // NOP X
 case 24: break;                               // RND Y
 case 25: break;                               // END Z
 case 26: break; 
 case 27: break; 
 case 28: break; 
 case 29: break; 
 case 30: break; 
 case 31: break; 
 
 }


pc = _pc;
 instruction = m[pc];

next_t = _t;

}
 // End of mini-EDSAC CPU model
 // ---------------------------------------------------------

As can be seen, we have a fairly conventional Load-Store or Von Neumann architecture where most operations are performed on a memory location m[n] and the Accumulator A. There is a multiplier register R which is used for certain operations – including multiply and ADD and multiply and SUBtract.

In order to simplify the EDSAC model, it was decided to break with the conventional instruction mnemonics and make sure all of the 16 most useful instructions could fit into an instruction based on a 4-bit code.

Because of the way that integer numbers are handled by the MSP430G2553 on the LaunchPad, when defining the conditional jump instructions I had to put additional constraints to define what was positive and negative integers.

In order to program the mini-EDSAC model, it was necessary to load the instruction plus operand/address into an array of 16-bit words. I chose to do this using the SIMPL programming framework that was already running on the MSP430.

The only problem is that SIMPL is reverse Polish (post-fix) based – so needed the operand (number) first, followed by the operation.

With only a few minor changes to SIMPL I was able to load RAM with short programs and single step through the EDSAC code.

The first task was to get the EDSAC decimal integer print routine to work.  I had an example from the “Squares” program of how to print decimal integers – but despite this it still took me a day of debugging to get this to work.

Once I had the means to get numeric output from the mini-EDSAC simulator, I wrote a short program to add 123 to 456 and print out the answer as a decimal integer. This worked perfectly, and in just 272 machine cycles of the mini-EDSAC simulator, I had the correct answer, 579, printed to the terminal.

With the basics in place with the mini-EDSAC simulator, it’s time to port it to an MSP430 with more RAM – and to further explore the programming techniques of these very simple computational machines.

Of course the 16 instruction model could be extended – making use of more of the uppercase characters to act as instruction mnemonics.  Allowing a 5-bit instruction word, means 32 unique instructions – which gives a much greater flexibility of the machine.

Once I have mastered the EDSAC instruction set, it will be time to look at other architectures of Minimal Instruction Set Computers. (MISC).

 

 


Posted in Uncategorized | Leave a comment

Computer Science from the Ground Up.

 

Full_Adder_DTL_7

Imagine that you have just taken a 1 term course in Electronic Engineering and Computer Science and you are now facing the end of term exam:

Part 1 – Digital Logic.

1a. Using transistors, diodes and resistors, show how a 2-input NAND gate can be constructed. Show how the 2-inputs can be extended to allow for additional inputs.

1b. Using standard logic symbols show how the following gates may be synthesised from the NAND component.  AND, OR, XOR, NOR

2a. Using conventional logic symbols, show how a half adder my be created from gates devised in 1b. What is the minimum gate count?

2b. Show in a digram, how the half adder can be extended to a full adder, allowing for any additional external gatess that may be required.

3a.  Show the schematic for a 2:1 multiplexer using 2 input NAND gates

3b.  Show how the multiplexer may be extended to 3 input and 4 input. Include the gate count for each of your designs.

4a. A small cpu requires an arithmetical logic unit (ALU) capable of the following operations.

AND, OR, XOR, NOT, ADD, SUBTRACT

Produce a block diagram showing how this may be achieved from conventional logic gates. Pay specific attention to the detail of how each operation is selected.

4b. Explain the term “Bitslice” and using the design for the ALU in 4a. show how the logic may be partitioned into a bitslice design.

5a. A logic design requires a clocked D-type Flip Flop (DFF) – show how this may be created from NAND gates.  Pay particular attention to the method by which the DFF is loaded and clocked.

5b. A small cpu needs a program counter capable of addressing 64K words of memory. Using the DFF from 5a, and any other external logic – show how this may be achieved.

6a. Using the functional blocks described above, plus external memory such as RAM and/or ROM show how these elements may be interconnected connected to form a simple 16-bit cpu.

6b. Explain the operation of the cpu in terms of the sequence of events required to fetch, decode and execute an instruction from memory.

N2Tcover

Now, whilst I studied electronic engineering in the early 1980s, I too would find some of those questions quite a challenge, but as a result of the online course “From NAND to Tetris”, which introduces digital logic and computer systems in a series of heirarchical layers – demystifying each layer in turn. Accompanying the course are a series of course lecture notes, a text book “The Elements of Computer Systems (TECS – available online as pdf) plus hardware and software simulators that will run on any common laptop platform.

Starting from the basic NAND gate the course shows in 6 lectures how to create a complete 16-bit computer from logic “chips” designed using a hardware description language. The computer design, called “Hack” is backed up by a laptop based simulator, and capable of running real code.

With such a structured course, it makes it possible to learn the basics needed to answer the above questions in full in just six weeks of lectures and practical study workshops.

Whilst “NAND to Tetris” is a well structured course and has been designed to allow maximum access and exposure for self-learners to the course materials, the course is based upon an artificially created hardware description language, and a series of test scripts and simulators written in Java. This makes the course viable to the widest audience, who may be of limited resources, but it does not provide hands-on experience of real hardware.

Through my work with myStormmyStorm – the open source FPGA experiment board, I hope to show how “NAND to Tetris” can be used as the basis for a FPGA learning course – using low cost open source FPGA hardware and tools. With the “BlackIce” FPGA board – it will be possible to create a real “Hack” computer – and have the means to connect it to a variety of experiments.

How to introduce Electronics and Computer Science to undergraduate students, makers and hobbyists in a practical “learning by doing” manner, will be the subject of a my forthcoming presentation “Computer Science From the Ground Up”  for OSHCamp, at Wuthering Bytes, on Saturday 2nd September.

 

Posted in Uncategorized | 2 Comments

A simulator for “mini-EDSAC” coded in C

Back in May, I entered into correspondence with Bill Purvis, who is involved with the recreation of EDSAC at The National Museum of Computing, at Bletchley Park.

Bill has created a simplified model of EDSAC, which runs on a 1K Lattice iCE40 IceStick FPGA.

Here are the main differences between his mini-EDSAC and the original:

Comparison between EDSAC and Mini-EDSAC

Mini-EDSAC is a simplified version of the EDSAC instruction set, suitable for implementation on machine with limited capacity. The main differences are listed below:

  • EDSAC uses 35-bit words, Mini-EDSAC uses 16-bit words

  • EDSAC packs two 17-bit orders into a word, Mini-EDSAC has a single 16-bit order in each word.

  • EDSAC uses the 5-bit tape code devised at Cambridge, Mini-EDSAC uses 8-bit ASCII coding, but reduced to 5-bit for the order codes.

  • EDSAC has multiply-and-add and multiply-and-subtract orders which take the content of a multiplier register, multiply it by the operand, and add or subtract it into the accumulator. Mini-EDSAC simply multiplies the accumulator by the operand, placing the result in the accumulator.

  • The EDSAC accumulator is double length 70 bits, the Mini-EDSAC accumulator is 32 bits.

  • The Mini-EDSAC shift orders use the numeric value of the address part to specify how many bits are to be shifted, EDSAC has a more subtle interpretation.

  • EDSAC handles long (35-bit) and short (17-bit) operands, Mini-EDSAC only handles 16-bit operands, hence there is no need for the Long/Short bit in the order format, reducing it to 16-bits.

  • EDSAC input order takes 5-bits from the reader, Mini-EDSAC takes all 8 bits from the ASCII-coded input.

  • EDSAC outputs the top 5 bits of the operand to the printer, Mini-EDSAC outputs the bottom 8 bits of the accumulator.

In general, the appearance is much the same, operation codes match (apart from the encoding). The input programs look much like those of EDSAC programs, using the Initial Orders 1, except the last character of each order is not required.

One noticeable difference is the use of @ in pseudo-orders. EDSAC programs used P as that is encoded as zero. @ maps to zero when the top 2 high bits are removed.

The initial orders are slightly longer – I have not been able to condense them into 31 orders, programs load from address 37 onwards. I have not yet attempted to replicate David Wheeler’s Initial Orders 2!

Bill Purvis, 25th May, 2017

Bill’s mini-EDSAC was the starting point for my simulation – written in Arduino C.  It’s relatively straight forward to get a modern microcontroller to emulate these early machines – by recreating their instruction set in the form of a switch-case statement in C.

This code uses an array of 16-bit locations in RAM  m[n]

A is  the Accumulator

It masks off the instruction field “insn” –  as the top 5 bits

It masks off the address field “n” as the lower 5 bits

 

 //--------------------------------------------------------------------
 // mini-EDSAC CPU Model
 // -------------------------------------------------------------------
 
 static void execute(int instruction) // This is the EDSAC CPU model
 
 // The Alpha code is held in the top 5 bits of the instruction
 // The bottom 11 bits hold the address

 {
 insn = (instruction & 0xF800)/2048 ;
 _pc = pc + 1;
 
 n = (instruction & 0x7FF); // n is the address field - lower 11 bits
 
 switch (insn) 
 {
 
 case 0: A += m[n]           ; break; // ADD A
 case 1: A -= m[n]           ; break; // SUB B SuBtract
 case 2: A += (m[n] & R)     ; break; // COL C - logical AND
 case 3: m[n] = A            ; break; // DEP Deposit D -no clear
 case 4: if(A>=0) {_pc = n; }  break; // JGT E
 case 5:                       break; // VER F
 case 6: if(A<0) {_pc = n; } ; break; // JLT G
 case 7: R += m[n]           ; break; // CPY H - Load R register
 case 8: m[n]=uart_getc();     break; // INP I - get char from UART
 case 9: A = A>>n            ; break; // RHS J - Right Shift
 case 10: m[n] = A; A=0      ; break; // TRC K - Transfer and clear
 case 11: A = A<<n           ; break; // LSH L - Left Shift
 case 12: A += (m[n] * R)    ; break; // ML+ M - Multiply and ADD
 case 13: A -= m[n] * R      ; break; // ML- N - Multiply and SuBtract
 case 14: uart_putc(m[n])    ; break; // OUT O - Output character to UART
 case 15:                      break; // PUT P - PUT constant in memory

// These cases >= 16 are not used on 4-bit instruction model

 case 16:                      break; //     Q - Not Used
 case 17: A = m[n]>>1        ; break; // RHS R - Right Shift
 case 18: A -= m[n]          ; break; // SUB S - Subtract
 case 19: m[n] = A; A=0      ; break; // TRC T - Transfer and Clear Acc
 case 20: m[n] = A           ; break; // UPD U - Transfer - don't Clear
 case 21: A += (m[n] * R)    ; break; // ML+ V - Multiply and ADD
 case 22:                      break; //     W - Not Used
 case 23:                      break; // NOP X - No Operation
 case 24:                      break; // RND Y - Round Accumulator
 case 25:                      break; // END Z - Stop and ring Bell!
 case 26:                      break; 
 case 27:                      break; 
 case 28:                      break; 
 case 29:                      break; 
 case 30:                      break; 
 case 31:                      break; 
 
 }

// Note - the most useful instructions crammed into top 16 cases
// to allow for 4-bit instruction selection
// So those below "P" are moved up to fill vacant slots 
// At "B", "D", "K", "M"
// Model can be extended in future for 5-bit instruction codes

pc = _pc;
 instruction = m[pc];

next_t = _t;

}
 
 
 
 // End of CPU model
 // ------------------------------------------------------------------

Posted in Uncategorized | Leave a comment

SIMPL on EDSAC

SIMPL on EDSAC

When you start coding on a new processor architecture, you always have the initial problem of how to get some sort of a bootloader into memory, to allow you to then load your application program into memory.

Originally this was done manually using toggle switches, to set up the data word for a given address and then write that bit pattern directly into memory, incrementing the address each time. Naturally this bootloader had to be as short as possible – as toggling long sequences of raw data into memory is not only tedious but prone to errors.

The engineers who devised those early bootloaders were masters of minimalism – with EDSAC requiring only 31 instructions – a program called “Initial Orders” which was fed into memory using an arrangement of hard-wired uniselectors. A uniselector was a piece of telephone switch gear – a relay operated “stepping switch” which could be used to generate a fixed output pattern for a given input – effectively a ROM.

With “Initial Orders” loaded, EDSAC could then be loaded from a teletype paper tape reader – using 5-bit teletype code to represent instructions, numbers and characters.

Having pre-packaged routines to assist with the coding of a new processor  is always useful – and being able to take this “toolkit” of known, working routines from processor to processor – makes life so much simpler.  This was what Charles Moore (the inventor of Forth) effectively did – in the 1960s, when he was coding on many different mainframe architectures. It was through the need to have a portable language – regardless of the architecture of the machine, that Moore’s toolkit eventually evolved into Forth –  How this was achieved is described in his article – “Forth – The Early Years”

Inspired by Chuck Moore, I wanted to develop my own language framework, but I needed something to start me off in the right direction.  That incentive came at a talk I attended in May 2013 – where I learned of a very simple interpreter called TXTZYME – by Ward Cunningham.

Inspired by txtzype, I began to  develop a programming framework called SIMPL. (serial interpreted minimal programming language) – which I have ported to a variety of microcontrollers including MSP430, ARM and AVR.

SIMPL is best described as a character interpreter running in a while loop, with routines to handle numerical input and numerical output.

SIMPL interprets a string of characters, and performs actions based on the characters. It is one level above machine language, providing a framework and environment that is independent of the specific details of the cpu. In theory, once you have ported SIMPL – you have a common machine interface – and no need to deal with the quirky machine code anymore!

Porting SIMPL to EDSAC

Porting SIMPL to a new cpu architecture is an ideal way of gaining an understanding of that architecture and it’s machine code.  As EDSAC has very few useful instructions, I viewed the porting of SIMPL to EDSAC as something of a coding challenge.

Fortunately it has ADD and Subtract, and these can be used with the conditional jump instructions for branching and looping. There is also AND, which can be combined with a subtraction from FFFF to negate the bits to create NAND and from NAND we can generate OR, NOR and XOR.

The ability to string primitive instructions together as short sequences to make compound ones is something that can be readily automated by the SIMPL interpreter.

As I stated in the earlier post – it seems unusual to have a hardware multiplier on such an early machine – but Maurice Wilkes predicted that EDSAC would be used for real mathematical computation – so the multiplier was purposefully included right form the start.

The other problem is that the chosen target board – an MSP430 with just 512bytes of RAM has to host the version of SIMPL running on the virtual EDSAC machine – which means cramming a meaningful version of SIMPL into about 200 instructions – which are 16 bit words.

The point of this exercise is to make programming the EDSAC model a lot easier, than raw machine language, and also to get a feel for how a “language” can be created from an instruction set that lacks many of the basic addressing modes.

Bite the Bullet – Itch that Scratch – and so in two days of spare time whist on vacation, an EDSAC simultor was coded on Day 1 – and large chunks of SIMPL were coded on Day 2.

A Model of SIMPL.

SIMPL deals with character input and character output. This is handled with the get_char and put_char routines which communicate with a serial terminal program such as TeraTerm.

Every printable ASCII character is treated as a command. This needs a jump table of 96 entries to be able to jump to any code body address, having received a character from the serial input routine.

Numbers are treated slightly differently by a routine called “number” which reads each character in turn and assembles a 16-bit integer from the given input string.

Similarly there is an output routine “printnum” which takes a 16-bit number off the stack and generates a printable string from it.

In addition to these three routines, there are the action  routines associated with each command.  These consist of a basic set of primitive instructions – mostly represented by punctuation characters and symbols. Higher level functions are represented by lower case alpha characters, and upper case alpha characters are used for User Definable words. In this respect – SIMPL is a bit like Forth, but lacks the dictionary structure.

EDSAC has a pair of Jump instructions – but it lacks a subroutine Call and Return – as these devices were not invented until 1952 – 3 years after EDSAC first ran.  So EDSAC will have to rely on a Jump to the code body for a given character, and then a Jump back to the code that fetches the NEXT character from RAM.

SIMPL needs about 1Kbytes of program memory on the MSP430 – it’s about 250 instructions for the basic kernel.

The EDSAC instruction set is indeed very basic when compared to MSP430 – and so new methods will need to be devised to get the program structure to work. EDSAC code is entirely stored in RAM – so there is the means to self modify code (scary) – but think of this as a means to an end to create the necessary jump table structure needed by SIMPL.

EDSAC only has about 16 or 18 instructions – and they were represented by capital letters – making the assembly language a lot easier to read. I chose the top 16 instructions and wrote an EDSAC simulator in C to allow me to experience this lower level of computing.

SIMPL provided the communications framework, and allowed me to dedicate a few commands to controlling the  virtual machine, such as resetting the program counter and dumping the first 32 locations of RAM in an easily disassembled manner.

EDSAC predates ascii by about 14 years – so all of the communication and I/O was via a wartime surplus teletype that used 5-bit codes. For the modern simulation this may all be updated.  The instruction set – nominally some of the capital letters A through to Z may be simplified to 4 bit numerical opcodes 0-F.

So I derived an EDSAC virtual CPU model in a few lines of C code. This runs under Energia 17 – which is the MSP430 version of the  Arduino IDE

The EDSAC cpu model allows me to hack about with the instruction set,  and work out how exactly to port an interactive language to a resource limited cpu by creating a virtual machine.

Character and Number Input and Output

Character input and output is handled by the EDSAC’s I and O instructions which load a character from tape into a specific memory address and print a given character on the teletype.  For convenience these will interact with uart routines equivalent to get_char and put_char and allow communication from a terminal application.

print_num

The EDSAC literature also includes a listing of the code for printing out the squares of numbers from 0 to 100. This is useful because it features an integer print routine – which is also an essential routine when implementing SIMPL.

number

The “initial orders” program has a routine that tests each character from a sequence of consecutive characters to see if it is a number, and by a process of successive multiplication by 10, expands the input characters as  a decimal number in memory. This is equivalent to the “number” routine in SIMPL.

Jump Table

Fundamental to SIMPL is the jump table – identify a character in the input buffer and execute  a block of code associated with that character. Get this mechanism right – and SIMPL will be a breeze on EDSAC.

One trick with the EDSAC instructions is that both the instruction field and the address field can be manipulated via the accumulator and then deposited back in memory. This allows, for example, a jump table to be created  – based on the value of an input character.  This powerful technique is just what is needed to create an interactive language such as SIMPL – which really is one big jump table.

So we have ascii characters beginning at 32 – we need to create a jump table – so lets load the character into the bottom of the accumulator as an address of where to find the “jump”.  This has to be done in two steps – as we first have to create the pseudo instruction in memory.

T0 - clear the ACC
I2 - get  the character into m[2]
A2 - ADD m[2] into ACC
A3 - ADD m[3] which has been preloaded with the instruction for Jump
U4 - store the assembled instruction at m[4]
E4 - Jump to m[4] - which is now a jump to the jump table

 

Every character now has somewhere to jump to – including numbers, lower case, upper case and punctuation symbols. Numbers can be handled by a routine which looks at each character and creates an integer in the accumulator, which can then be stored on the top of the stack.

With the jump table – we have a mechanism for addressing the code that contains the primitives.

A register model

EDSAC does not use registers in the usual sense. Whilst there is a register to hold the multiplier  – for both the “multiply and add into accumulator” and the “multiply and subtract from accumulator”.  It is also used to hold one of the operands of the AND (collate) instruction.  The multiplier register can only be loaded from a memory and not directly through the accumulator.

However – selecting a small group of memory locations near the start of RAM and giving them nominal register names – will be a good way of creating some order – and a framework for the SIMPL implementation.  Instruction Pointer, stack pointer, return stack and program counter can be implemented thus.

 

Posted in Uncategorized | Leave a comment

Opening up EDSAC

EDSAC was the first of the Cambridge University computers, brought up in 1949 and intended for real scientific and mathematical work.

This year, there is considerable revival of interest in EDSAC – as the National Museum of Computing at Bletchley Park – have recreated a faithful reproduction of the original machine.  Also – the Open Source Hardware Users Group (OSHUG) have set up the EDSAC Challenge in September – which is to capture the flavour of EDSAC but using modern, open source FPGA hardware.

Why all this fuss about EDSAC – a 68 year old machine?

EDSAC was the first practical computer designed to do real work. It also exemplifies what a bit of ingenuity can do to an otherwise technically simple machine. It shows what can be done with the simplest of load-store architectures- and also teaches some of the tricks in making a hardware limited cpu do what you need it to do.

The EDSAC instruction set was based on the 5-bit teletype codes read in from a paper tape reader, that were common at that time as they were used for military and civilian communications.

Quite unusually EDSAC was given a multiplier instruction – somewhat simplified because of the serial nature of the memory and the arithmetic. EDSAC was based on the classic Von Neuman “Load-Store” architecture and as numerical precision was considered of high importance it used a 35 bit word length and the accumulator was 71 bits to accommodate the result of a 35×35 bit multiplication.

Instructions were designated capital letters – an early form of mnemonic, which made the program listings easier to read.  This feature is peculiar to EDSAC – and something that must have been lost when computers needed more than 26 instructions.

EDSAC used 18 instructions presumably tailored to the capabilities of the ALU and delay line memory.

What was it like to code on such a machine with such a limited instruction set?  I wanted to know – so I wrote an EDSAC virtual machine that will run on Arduino or MSP430 LaunchPad. From this I can then experiment with the architecture, of EDSAC and other unsophisticated processors.  This leads on nicely to my interest in Minimal Instruction Set Computers (MISC) – which by their nature are simple to model in code and use very little hardware.

So I created a simplified model – in C, which will run on an Arduino or MSP430 Launchpad. For practical coding reasons, 16 instructions will be used, which can be represented with a 4 bit instruction code.

Whilst EDSAC was created to help perform mathematical calculations – about 150 times faster than a 1940’s desktop mechanical calculator and so precision multiplication was deemed to be important,  the aim of the simplified model is to explore the limitations of a reduced instruction set, and especially when you are trying to create a virtual machine and an interactive programming environment with limited instructions.

From an instruction synthesis point of view, we have ADD, AND and SUBTRACT, so NEGATE, NAND, OR, and XOR should all be possible.

There are two jump instructions  Jump if A>=0 and Jump if A<0.  These  should be sufficient to meet most test situations.

A ADD
C COLLATE - AND
E JUMP IF A>=0
F VERIFY
G JUMP if A<0
H LOAD MULTIPLIER REGISTER
I LOAD INPUT CHARACTER
L LEFT SHIFT
N MULTIPLY AND SUBTRACT
O OUTPUT CHARACTER
P LOAD DATA
R RIGHT SHIFT
S SUBTRACT
T TRANSFER AND CLEAR
U TRANSFER NO CLEAR
V MULTIPLY AND ADD
X NO OP
Y ROUND ACCUMULATOR 
Z STOP AND RING BELL

The instruction set was simple and obviously dictated by the serial arithmetic unit. Multiplying iss a fairly straightforward operation with serial data – even on such a limited machine – the ability to multiply and add, or multiply and subtract was given high status.

EDSAC used about 3000 vacuum tubes for it’s logic – and memory consisted of mercury delay lines – where data existed as a series of standing waves that hd to be continuously refreshed.

This poster gives a really good overall view of the EDSAC machine – and has example code for “Initial Orders” which was a very early form of bootloader – in just 31 instructions. These instructions were “dialled” into memory using a system of hardwired uniselectors – as a lot of EDSAC was based on then current (General Post Office GPO) telecommunication hardware.

With the initial orders program running the main program could be loaded into memory from paper tape.

For the over curious – there is an EDSAC simulator here. Coding on such a primitive load-store architecture is certainly not a task for the fainthearted and takes some getting used to.

It’s easy to get hung up on historical details of this machine, so I created a cpu model, that behaves like EDSAC, but not necessarily constrained to the same teletype character codes or other hardware restrictions.

Perhaps it would be easier to impose a Register model on the cpu – so for example the first 16 locations in memory are given R0 to R15 names – that will certainly make it mentally easier to remember where variables have been stored.

On top of this register model – we can then lay out the virtual machine model, with data and return stacks, instruction pointer, top and 2nd, etc.

The code for the MSP430 is here on Github Gist . It consists of a simple instruction decoder and memory model. Instructions may be written in any text editor and then sent to the simulator using a terminal program – such as Teraterm.

 

 

 

//---------------------------------------------------------------------
// EDSAC CPU Model
// --------------------------------------------------------------------
// Instructions between Q and Z are replicated lower in the table
// This allows all instructions to be accessed with a 4 bit opcode
 
 static void execute(int instruction) // This is the EDSAC CPU model
 
 // The Alpha code is held in the top 5 bits of the instruction
 // The bottom 11 bits hold the address
 {
 insn = (instruction & 0xF800)/2048 ;
 _pc = pc + 1;
 
 n = (instruction & 0x7FF); // n is the address field - lower 11 bits
 
 switch (insn) 
 
 
 {
 
 case 0: A += m[n] ; break; // ADD A
 case 1: A -= m[n] ; break; // SUB Subtract
 case 2: A += (m[n] & R) ; break; // COL C
 case 3: m[n] = A ; break; // DEP Deposit D -no clear
 case 4: if(A>=0) {_pc = n; } break; // JGT E
 case 5: break; // VER F
 case 6: if(A<0) {_pc = n; } ; break; // JLT E
 case 7: R += m[n] ; break; // CPY H - Load R register
 case 8: break; // INP I 
 case 9: A = m[n]>>1 ; break; // Right Shift
 case 10: m[n] = A; A=0 ; break; // Transfer and clear
 case 11: A = m[n]<<1 ; break; // LSH L
 case 12: A += (m[n] * R) ; break; // Multiply and ADD
 case 13: A -= m[n] * R ; break; // Multiply and Subtract- N
 case 14: break; // OUT O
 case 15: break; // PUT P
 case 16: break;
 case 17: A = m[n]>>1 ; break; // RHS R
 case 18: A -= m[n] ; break; // SUB S
 case 19: m[n] = A; A=0 ; break; // TRC T 
 case 20: m[n] = A ; break; // UPD U
 case 21: A += (m[n] * R) ; break; // ML+ V
 case 22: break;
 case 23: break; // NOP X
 case 24: break; // RND Y
 case 25: break; // END Z
 case 26: break; 
 case 27: break; 
 case 28: break; 
 case 29: break; 
 case 30: break; 
 case 31: break; 
 
 }


pc = _pc;
 instruction = m[pc];

next_t = _t;

}
 
 
 
 // End of CPU model
 // ------------------------------------------------------------------

 

 

 

Posted in Uncategorized | Leave a comment

SIMPL Gets Simpler

This week I have been working on my tiny, Forth-like language SIMPL.  in a few hours of spare time I have managed to optimise some of the routines and get the code size down from around 900 bytes to just 780.

The main way that I achieved this was by reducing the size of the jump table from 196 bytes to 120 bytes, and using less code intensive ways to achieve the same functionality.

By paring away at the kernel in this way, makes for more room to add additional functionality  – and still stay within the target of just 1024 bytes.

SIMPL provides an extensible language framework allowing:
text entry – either from terminal or by file transfer
text output
32 basic instructions (primitives)
26 lower level functions – I/O related
arithmetical and logical operations
stack manipulation
I/O manipulation
Millisecond and microsecond timing operations
Looping
memory examination and modification
The kernel of SIMPL is extensible  but in it minimum operational state fits into roughly 800 bytes and has the following main routines
set_up         – sets up the clock, the UART and the GPIO
textRead    – Read a waiting character from the uart and store it into a buffer area in RAM. The location of where it is stored is determined whether we are in immediate mode or compile mode.   Immediate mode saves the character string into a text buffer at 0x0200.  Compile mode saves into a buffer starting at 0x0220 for “A” 0x0240 for “B” and so on. textRead finishes when it encounters a newline or carriage return character. It puts ascii 0x80 on the end of the buffer to signify the end of the input and sets the instruction pointer to point at the start of the text buffer (0x0200). It then jumps to “next”
next –  this is the key routine – central to the  whole operation of SIMPL.  It implements a very basic virtual machine that can work its way through an array  of characters in memory,  making decisions on their type and executing action routines based on the numeric ascii value of the character.   It gets the next character from the buffer and test to see whether it is a numeric character, an uppercase character or something else.
number – determine whether the character is part of a string of numerical characters and form into a 16-bit integer to be placed on the stack. Jumps back to “next” on completion
alpha – determine if a character is an upper case letter, calculate the code address associated with it and modify the instruction pointer to point to that address. Jumps back to “next”
jumper –  This looks up a jump address from a jump table based on the ascii value of the character. 60 table entries are stored – allowing for all the punctuation, arithmetic and logical symbols and the lower case letters.  Each of the 60 characters has an action routine associated with it which is executed when the character is detected in RAM.  All of the action routines jump back to “next” on completion
printnum   – take a 16-bit integer and convert it into an ascii text string for display
The above routines of the kernel can be coded into about 782 bytes on the MSP430 – allowing the additional routines to fit into about 240 bytes.   These will allow access to the peripherals including ADC, GPIO, Timers, SPI etc.

 

 

Posted in Uncategorized | Leave a comment

Minimal Computing – almost 70 Years On

Some of you will know that I am a proponent of minimal computing devices.

I have more than a passing interest in retro-computing, because after all, back in the day, hardware cost so much, – for example in 1963, in the PDP-5,  every transistor cost about $20 in today’s equivalent, and had to be justified.

Now that same $20 will buy 4  Pi-Zero dev-boards, with 1GHz ARM processors- needing about 20 million lines of code just to be useful.

So I have embarked on a journey of discovery and learning, to see if we can make computers a lot simpler.

Ancient History

The first real computers appeared in the late 1940’s, and filled a room with racks of power hungry vacuum tubes. My father attended Cambridge University as an undergraduate at this time – and probably unbeknown to him, computer history was being created daily by a group of dedicated mathematicians, engineers and technicians.

EDSAC was one of these early computers, and with 3000 tubes, and about 650 instructions per second was cutting edge back in May 1949.  With just 16 useful instructions, and memory based on ultrasonic delay lines EDSAC was created to do useful computational work – to assist Cambridge academics with their research.

Nearly 70 years later, the National Museum of Computing at Bletchley Park, is building a faithful reproduction of EDSAC – hopefully with it coming online later this year.

The Open Source Hardware Users Group – are having their own tribute to EDSAC, with the “EDSAC Challenge” – to be held as part of the Wuthering Bytes, Digital Festival, in Hebden Bridge, in early September.

EDSAC will be faithfully emulated using the latest open source FPGA technology – and teams of enthusiasts will work together to contribute meaningful demonstrations of what can be done with modern technology – as a tribute to the venerable EDSAC.

My own contribution will hopefully be a port of a tiny interactive language called SIMPL – which creates an interactive programming environment in just a few hundred instructions of assembly language.

Capable of arithmetic and logical operations,  looping, printing strings, and interpreting sequences  of  alphanumerical data – SIMPL will present a challenge for the computational resources of EDSAC.

Instruction Set.

EDSAC really only has about 16 useful instructions and 1K words of memory. This is going to cramp most programmer’s style.

However, if you examine the instruction set – the basics are there, but it will just take an extra bit of ingenuity to get the most out of them.

ADD

SUBTRACT

MULTIPLY

AND

JUMP IF >=0

JUMP IF <0

SHIFT LEFT

SHIFT RIGHT

TRANSFER ACC TO MEMORY AND CLEAR

TRANSFER ACC TO MEMORY DON'T CLEAR

TRANSFER MEMORY TO MULTIPLICAND REGISTER

MULTIPLY AND ADD

MULTIPLY AND SUBTRACT

INPUT A CHARACTER

OUTPUT A CHARACTER

NOP

ROUND ACCUMULATOR

STOP

 

Having just completed coding up my SIMPL interpreted language in just about 300 instructions in MSP430 assembly language – the challenge is now to code it up in EDSAC machine language – in under 1k words.

The next few posts will report on my progress – as I battle with the EDSAC simulator.

 

 

 

 

 

 

 

 

 

Posted in Uncategorized | Leave a comment

Looping in SIMPL – the mechanics of a tiny language

SIMPL
SIMPL is a character interpreter running within a loop – in pseudocode:
NEXT:             Fetch next Character (VM Instruction) from buffer
                         Decode to form unique Jump Address
                         Jump to associated code body, execute code – which ends with Jump back                            to NEXT
This forms the heart of a very simple virtual machine which is capable of executing unique code functions based on the character found in the memory buffer.  This elementary execution model is equivalent to most modern processor systems, so with a little bit of work, SIMPL can form the basis of a simulator for most basic processor architectures.
The character may be fetched from a serial terminal input buffer and executed immediately in an interactive manner, or the Instruction Pointer can point to a general area of program memory, and execute the character instructions found there.
It is this combination of using SIMPL either interactively at the terminal or as a means of entering and editing text into memory and subsequently running it from RAM, that makes SIMPL a powerful tool – to have in the programmer’s tool kit.
As a kernel, capable of executing the basic inner kernel loop,  SIMPL may be coded in about 300 instructions on most small processors such as the MSP430 – this includes the overhead of setting up the mcu oscillator, UART and GPIO – so that it supports serial terminal interaction.  As more functions are added, the code size grows by approximately 8 to 10 bytes per function – so it’s possible to have a small but usable SIMPL running in about 1K bytes.
The users application code will further extend this by perhaps a similar amount – but it still has an extremely small footprint, making it an ideal serial command shell for resource limited processors.
SIMPL when coded in C has been ported to a number of microcontrollers including ATmega, ATtiny, ARM M4, and MSP430.
The kernel has also been rewritten in MSP430 assembly language for speed and compactness, and work is ongoing to port it to the J1 Forth mcu.
The ability to encode the basic inner interpreter virtual machine in the native assembly language of a range of processors makes this a powerful technique of getting the basis of a useful “monitor” programme onto a variety of processors.
It also means that once the kernel has been coded in the native assembly language, then the user is working with a common set of instructions running on the virtual machine, at a level somewhat higher than the native assembly.
This creates a common language instruction set – a kind of “lingua franca” that makes interaction with a wide range of processors much easier.  It also opens up the door for different processors to communicate with each other through this common character encoded language.
SIMPL has some of it’s roots in Forth, and indeed Charles Moore’s Forth language has been a source of inspiration for the development of SIMPL.  However SIMPL is not a fully fledged language, more like a multi-function tool (Leatherman) which just makes the interpreted interaction with a range of small processors a lot easier, and less painful than the usual edit-compile-run cycle of code development found within a modern C compiler IDE.
Mechanics of SIMPL

Here’s some more info about the various routines that make up the kernel

textRead       33 Instructions 90 bytes
Receive characters from the serial uart and place them into a buffer in RAM.  Two modes of operation are possible, immediate mode, where the characters are executed as instructions directly after a carriage return line feed is received, and compile mode, where the character sequences are preceded by a colon and stored in pre-calculated command buffers in RAM.
number      16 instructions 42 bytes
Number interprets sequences of consecutive digits as a 16-bit integer number and places the value in the register that is the top entry of the stack.
next               5 instructions 12 bytes
Next is the routine that all commands return the program flow back to once they have executed. It fetches the next character instruction from the RAM buffer and through a jump table technique passes program control to the code body that performs the task associated with that instruction.  The  jump table  is used to direct the character to the areas of code that will treat them correctly – for example on encountering a numerical digit, program control is passed to the number routine, whilst for upper case alphabetical characters, program control is passed to a routine that handles these separately.
jump_table       96 instructions   192 bytes
Primitives         72 instructions 166 bytes
SIMPL uses a collection of 32 instruction primitives from which other instructions can be synthesised.   The code body of these primitives is extendable to some 110 instructions or so.
Upper (called alpha in V1)    10 instructions   26 bytes
Upper  handles the capital letters – as these are user commands, and the user is able to write and store in RAM certain functionality based on these characters.  When a capital letter is encountered in the instruction buffer, Upper directs control to the correct command buffer.
Lower   86 bytes
Lower is an area of program that interprets the lower case characters and provides a higher level of program complexity than is achievable form the primitives alone.
printnum     30 instructions   106 bytes
This takes a 16 bit integer number from the stack and prints a string of ascii digit characters to the terminal.
Uart Routines         13 instructions 41 bytes
Low level communication with the uart is by way of the get_c and put_c routines
Initialisation         19 instructions    90 bytes
Here the hardware such as the oscillator, GPIO and uart are initialised for correct operation.  This is code specific to whichever microcontroller has been chosen
Interpreter            4 instructions       16 bytes
This is the main routine that runs the SIMPL interpreter combining textRead, next, number and Upper.

 

Looping 

The loop structure is fundamental to the SIMPL language – but it is one of the trickier parts to code – but with much head-scratching, and a couple of false starts, I got it implemented today.

Loop Code.
Cracking the loop code – previously written in C, was key to getting SIMPL running in MSP430 assembly language. Very quickly after the  loop code start working –  comes LED flashing, tone generation, timing delays and printing character strings.
As it stands, with the loop-code in place,  SIMPL is 874 bytes long. I am confidant that we can get SIMPL into 1024 bytes – as per the Hackaday 1K challenge.
The SIMPL loop code is given a single value k – the loop counter which is placed on the stack immediately before the loop-start character which is a left bracket (.
On finding the left bracket, we need to make a note of the current instruction pointer when we enter the loop – so we have a location to jump back to each time around the loop.
We use the temporary registers R7 and R8 to hold the loop parameters  – i.e. the start of loop and the loop count
The control is then passed back to the interpreter by $NEXT  (jmp next) which steps through the instructions (in the loop) until the right bracket is found.
We then decrement and test the loop counter to see if it is zero – at which point the loop is terminated. If non-zero,  we return the instruction pointer to the start of the loop before jumping back to next.
Confused ?- you will be – but it really is that simple! Just 7 instructions and we are looping.
left_par:                ; code enters here on getting a left parenthesis

MOV.W tos,R8             ; save tos to R8 (R8 is the loop counter k)
MOV.W ip,R7              ; loop-start = ip the current instruction pointer at start of loop
JMP next                 ; get the next character and execute it

right_par:               ; code enters here on getting a right parenthesis

DEC.W R8                 ; decrement loop counter R8
JEQ #next                ; and check for zero - terminate loop
MOV.W R7,ip              ; set instruction pointer back to the start of the loop
JMP next                 ; keep looping

 

SIMPL now runs on a 16MHz MSP430 in just 872 bytes – with serial communications at 115200 baud. Most of the primitive instructions execute in about 1uS  – allowing a port pin to be toggled at 1MHz.

I have included the latest assembly source file at this Github Gist

 

Posted in Uncategorized | Leave a comment

More About SIMPL

SIMPL – a small Forth Like Language

Charles (Chuck) Moore, the inventor of the Forth computer language is one of my all time computing heroes.

In a life long career in computing, starting in the early 1950’s – he has been mathematician, programmer, hardware engineer and designer of several unique processors that execute the Forth language directly as their native instruction set.

Last year, I was fortunate enough to have the opportunity to meet him at Forth Day – at Stanford University, in November 2016 – and he has been an inspiration for some of my recent work.

I first came across FORTH as a teenager in the early 1980’s when the home computer was dominated by the BASIC programming language, and it fascinated me that Forth offered a radically different alternative.

It’s taken me the last 35 years to get around to understand what makes Forth tick – as part of my mid-life computing crisis, and as such I have taken time out to learn some of the fundamentals of this fascinating language.

Chuck Moore always looks for ways to simplify the task in hand – and it is his search for simplicity that has led me to some recent developments with a tiny programming language – that I call SIMPL.

SIMPL stands for Serial Interpreted Minimal Programming Language – and it was initialy inspired by Ward Cunningham’s Txtzyme – for Arduino or Teensy.  I took Txtzyme, added several new commands, and made it an extensible language, which could be ported to virtually any microcontroller.

SIMPL was originally written in C for Arduino, and then MSP430 and ARM. Now I am exploring writing the core routines in MSP430 assembly language for speed, efficiency and compactness.

The extended version of SIMPL – called SIMPLEX, includes a tool-kit to allow editing, loading, compiling and examination of memory – all  hosted by the target processor – in about 5K bytes.

MISC Processors and eForth

 

Inspired by Chuck Moore’s Minimal Instruction Set Computer (MISC) designs of processors from the mid 1980’s, I have chosen a similar approach. At the same time that Chuck Moore was working on designing MISC processors in gate arrays,  Bill Meunch  and Dr. Chen Hanson Ting were developing eForth – a complete forth that could be synthesised from only around 30 primitive instructions.

Meunch and Ting showed that once these 30 primitives had been coded in the native assembly language of the target processor, plus a porting of the inner and outer interpreters, then the whole eForth language could be created very quickly.

Ting once remarked that to put eForth on any new microprocessor, starting from scratch generally took him about 2 weeks.

A Comfortable Computing Environment

eForth creates a comfortable computing environment within any new, unfamiliar processor,  – kind of like establishing a base-camp in a hostile terrain. Additionally, once you have studied the processor architecture and instruction set sufficient to write the eForth primitives in it’s assembly language – you have probably become quite intimately conversant with it’s inner workings.

Forth is an extensible interactive language, which provides a complete computing environment hosted by the target machine. It dispenses with the need for sophisticated compiler tool chains or  IDEs, and allows all coding and interaction to be done from a simple serial terminal.  At the same time it provides an editor, loader, compiler, and the means to examine memory by way of a hex dump.

Registers and memory mapped I/O can be poked or examined interactively, and the low level drivers for I/O can be developed by experimentation from the ground up.  Initially the only I/O requirements are to allow a character to be sent or received via the serial UART – and virtually every modern microcontroller has at least one UART.

eForth however is some 6K to 8K bytes in a typical implementation, so I asked myself whether there was a way of creating an extensible language from an even smaller starting point – and so the idea of SIMPL came into being.

The Mechanics of Interpreted Languages

eForth, SIMPL and other tiny Forth like languages rely on efficiently coding a small set of primitive instructions onto a stack-based, 16 bit virtual machine, running in the native assembly language of the host processor.

The virtual machine consists of an inner interpreter that fetches the next instruction token from memory, performs a decode operation (either by jump table or switch-case structure) and executes the code  associated with that operation. The efficient coding and execution of the primitives and the inner interpreter is what gives these threaded, interpreted language their speed.

Ideally there will be fewer than 32 primitives, so in a hardware cpu implementation they may be represented by a 5 bit token, and mapped directly into the cpu machine instructions – this is the approach that Chuck Moore used with his Forth processors, and has been widely been adopted since – such as James Bowman’s J1 Forth CPU.

However in a software implementation we need to encode these in the native assembly language of the target processor, be it an 8-bit, 16-bit or 32 bit processor. Forth was developed at a time when 16-bit minicomputers were just appearing – and so traditionally it has used a 16 bit integer word size.  The late 1970’s and early 80’s saw Forth being adapted to 8-bit microprocessors – some of which were a good fit to the Forth virtual machine model – and some were not.

An example of a tiny Forth for 8-bit AVR

In one very compact implementation – T. Nakagawa’s “Tiny Forth” written for the 8-bit AVR, 25 primitives were encoded into just 286 instructions – an  average of about 11 machine language instructions per primitive.

Despite the limitations of an 8-bit architecture – Nakagawa’s AVR assembly language implementation is very efficient in about 840 instructions of AVR code and 137 bytes of tables – this assembles to around 1.8K bytes.

Typically the cpu does the following pseudo-code by fetching the next primitive instruction from memory,  testing it’s token value and branching to the next in the table if it’s not a match.

In hardware this is equivalent to the instruction fetch and decode.  Getting this process to run efficiently in assembly language is key to the speed of the resulting language.

Test the token character of the primitive
Is it the one we are looking for - no, branch to next in the table
Get the operands off the stack and into a pair of working registers
Perform the operation
Put the result back onto the stack
Correct the stack pointer
Return to the inner interpreter

Here is the AVR implementation of ADD

prim6: ; +              ; This is the ADD primitive with token=5
 cpi r16, 5             ; Is the token=5?
 brne prim7             ; No - branch to next primitive
 ld r19, X+             ; get operand 1 off the stack
 ld r18, X+
 ld r17, X+             ; get operand 2 off the stack
 ld r16, X+
 add r16, r18           ; add the low bytes
 adc r17, r19           ; add the high bytes plus any carry
 st -X, r16             ; Put the result back on the stack
 st -X, r17
 ret                    ; Return to the inner interpreter

In an 8 bit processor such as the AVR,  it takes 4 instructions to load up the working registers with two 16 bit operands (Forth uses a 16-bit integer size), and two instructions to perform arithmetical or logical operations on them. Passing the result back to the stack is another two instructions followed by a return.  With the overhead of the instruction decode and return, the basic arithmetical and logical operations are typically 11 instructions.   So in this light, an 8-bit Forth generally runs at about one tenth or less of the instruction rate of the processor.   The decode operation takes 2 instructions – even if there is no-match, so the frequently used primitives should be put closer to the start of the list.

Nakagawa’s Tiny Forth is concise and well documented – and I have reproduced it in this Github Gist

Whilst the coding of the primitives takes up around a third of the total program, the thirty short routines to implement the inner and outer interpreter, and the support routines such as serial input and output, multiplication and division take up the remaining two thirds.

With SIMPL, I strive to reduce all of these overheads, and this means an implementation using a 16-bit microcontroller – which is fundamentally a far better match to the 16-bit Forth Virtual Machine.

The MSP430 – A good match to SIMPL.

The MSP430 has been around for many years, originally developed by Texas Instruments in 1993 as a low power, mixed signal, RISC processor with a von Neuman architecture. It has an un-complicated 16 bit register set, and a very wide range of peripheral modules, including 24 bit ADCs, Low Energy Accelerators for FFT and DSP math operations and a large selection of standard peripherals.

In recent years it is one of few processors to have on-chip non-volatile FRAM – which has a much faster writing speed than Flash, and with 10E14 write cycles – virtually indestructible.

For a 16-bit processor, such as the MSP430, 16-bit integer maths and register operations are single machine instructions.

The ADD primitive becomes:

add: add @stack +, tos    ; Add 2nd stack member to the top of stack
     $NEXT                ; Call macro that executes the NEXT instruction

Clearly this increases the efficiency of executing the primitives and reduces the code space requirements, provided that the overheads of decoding an instruction token, and then fetching the next virtual machine instruction are kept to a minimum.

 

 

Simplifying Forth

Forth uses the concept of a “Dictionary” to organise the Words or subroutines.  The Words are written in natural language, and the dictionary search function normally uses the first three characters of the word and the length in order to find that word within the dictionary.  This is a powerful and flexible approach, but adds quite a significant coding overheads to the outer interpreter.

The approach I use in SIMPL was inspired by Ward Cunningham’s Txtzyme Interpreter – where just matching a single character using  switch-case statements causes the code associated with that character to be executed.  Written in assembly language this can be done with a hash-table,  or just by jumping to a program address that has been calculated in some way from the ascii character value.

Critics might say that this approach limits you to only a few dozen unique commands – and yes it does, but if your application needs more than this – perhaps it’s worth rethinking how you are going to solve your application.

As Einsten said “Make things as simple as possible, but not simpler”

There are 85 printable ascii codes between  32 and 127,  plus 10 numeric digits.  If we group these into

26 upper case characters

26 lower case characters

33 symbols and punctuation characters

SIMPL uses most of the symbols and punctuation characters as primitive instructions – such as &  +  !  % etc.   The lower case characters are used for invoking higher level commands – synthesised from the primitives. These were influenced by Txtzyme, tend to be more microcontroller specific – and may include commands to exercise the I/O and peripherals, read or write to a digital port, get a reading from and ADC or provide  hex dump of memory.  Finally are the uppercase characters which are reserved for the Users application commands – and provide the mechanism by which the language is extended.

 

The Primitive Instructions
Most of the primitive Forth words are located in the ASCII characters 32 to 63 that leaves 64 to 126 available for the users vocabulary and constructs made from primitives.

There are approximately 32 printable ascii punctuation and symbol characters – and with a bit of pre-processing they can be used to form the basic machine primitives.

Stack Commands (7) PUSH, POP, DUP, DROP, SWAP, OVER, NOP

, PUSH
. POP
" DUP
' DROP
$ SWAP
% OVER
SP NOP

Maths & Logical Operations (9) ADDC, SUB, MUL, DIV, AND, OR, XOR, NOT, SHR 

+ ADDC
- SUB
* MUL
/ DIV
& AND
^ XOR
` SHR
| OR
~ INV (NOT/COMPL)

MUL and DIV my be substituted for Shift Left and Shift Right 

Transfer Instructions (8)

: CALL (COMPILE)
; RET
( LOOP-START
) LOOP-END
< LESS
= EQU
> MORE
\ JUMP - condition follows

IN/OUT (2)

? KEY (INPUT)
. PRINT (OUTPUT)

Memory Instructions (3)

@ LOAD
! STORE
# LIT

Others (5)

_ _ String Print
[ ] String Store
{ } Array of elements - switch/case

 

The use of single ascii characters as tokens, makes the language a lot less verbose than Forth, and snippets of source code can be sent in very few characters – for example as an SMS message between 2 systems equipped with GSM modems.

It is a fairly simple task to have a word table – where the verbose form of the words are stored, and can be printed out, to expand the source into something more readable – for example

“‘% -> DUP DROP OVER

Three ascii characters expanded to 11 plus 3 spaces

SIMPL does not use the space as word separator, but in the special case where you have several numbers to put on the stack, the space is used to indicate “push onto stack”

The SIMPL Virtual Machine

SIMPL is based around a virtual stack machine that provides an interactive programming environment.  SIMPL primitive instructions are effectively the native assembly language of that virtual machine, and have been chosen to provide a compact but versatile instruction set.

Ultimately, I wish to create an open-core processor using FPGA devices that can execute SIMPL code directly – but in the meantime I need to rely on a virtual machine written in MSP430 assembler to do this for me.

SIMPL is Forth-like in that it uses words or tokens that cause blocks of code to be executed sequentially. It differs from Forth in that the words consist of single ascii characters that are decoded directly causing the cpu to call a routine at a given address, execute the subroutine found there and then return control to the inner interpreter.
The inner interpreter is very compact on the MSP430 with the main interpreter loop fitting into about 300 bytes. On top of this interpreter loop, you have the code for the low level primitive symbols – listed below, and then higher level words – such as “toggle a port pin” “n milliseconds delay” “read an adc channel” “output a string to serial port” “produce a hex dump of memory” which are represented by the lower case alphabet characters.

This layer of the language is what I refer to as the “Arduino Layer” – all those useful helper functions that are processor specific and deal with timing and I/O peripherals.

The next layer is the users application code where the users words use captital letters. For example – the classic washing machine program example:

FILL
WASH
EMPTY
FILL
RINSE
EMPTY
SPIN

This would be condensed in SIMPL as FWEFRES
– which is a code reduction factor of >5

But each of thes can have a parameter – for example if the WASH is an AGITATE cycle A

– In Forth

: WASH 50 DO AGITATE LOOP;

In SIMPL we also use a colon definition to define W. In fact the ascii code for W is used to store this code snippet at a given address – calcuated as (W-61)*32. So as soon as the interpreter encounters W, it jumps to that code address

:W50(A) the code in the parenthesis is repeated 50 times.

 

In Conclusion

SIMPL is a compact, extensible, interactive language that provides a minimum tool-kit to allow small applications to be developed on microcontrollers.

It provides a development environment, which may be ported easily to virtually any microcontroller with very low overheads.

The use of single ascii characters as primitive instructions and commands makes it compact but retains a certain level of human readability – that some machine languages do not.  As there is a one to one substitution between the ascii character and the Word – a translation table may easily be created to expand each symbol out to its full text representation.

SIMPL can act as a Lingua Franca between widely different classes of computing machines, or as messaging between several specialised processors on the same board . A lot of meaning can be conveyed in a few characters, and so a simple microcontroller could communicate tasks between dedicated co-processors in just a few characters.

At it’s heart, SIMPL has a text interpreter that reads character strings from memory and translates these into machine actions.  This is a very powerful technique applicable to all sorts of everyday tasks such as desktop manufacturing, CNC milling/routing, 3D printing, laser cutting, graphics drawing etc.

 

 

 

Posted in Uncategorized | Leave a comment