Exploring DIY CPUs

In the last couple of posts I described how I have used a TI MSP430 Launchpad to emulate the 1949 Cambridge University EDSAC machine.

This opens up the possibility of using a similar scheme to emulate virtually any DIY cpu, possibly with a novel or experimental instruction set.

The MSP430 Launchpad was used only because I had it at hand, but the cpu model and the SIMPL communications framework, being written in C, could be ported to almost any microcontroller – especially if they are supported by the ever-growing Arduino IDE, now with ARM Cortex M0, M1 and M4 extensions.

A Look at an instruction set for a MISC Processor.

As I am a proponent of MISC (minimum instruction set computers), the task of simulating the cpu is simplified to just 32 instructions – represented by a 5-bit token, and I have chosen standard ascii characters to represent my instruction primitive. EDSAC used 18 capital letters to represent its instructions, with a mnemonic flavour, but any printable character is appropriate to represent an instruction primitive.

These primitives represent various ALU instructions, data moves between the ALU and memory or registers, and various input-output instructions.

I have attempted to pick a mix of instructions that are easy to implement, useful and offer flexibility.  Whilst anyone can have aspirations about designing the ISA of a new processor, it is likely to have a minimum number of instructions to be useful.

This number was estimated to be around 25 in the 1990s – by Bill Meunch, Chen Hanson Ting Chuck Moore and others – working MISC and Forth cpu implementations. Previous cpus have successfully use 16 or even 8 instructions (see below).

From 25 primitive instructions,  more complex instructions can b e synthesised by assembling groups of these primitives into macros.

ALU Group                 NOP, ADD, SUB, AND, OR, XOR, INV,  SHL, SHR

Call/Jump Group       CALL, JMP, JEQ, JGT, JLT, RET

Memory Group          FETCH, STORE

Stack Operations      DUP, DROP, SWAP, OVER, PUSH, POP

I/O Group                    IN, OUT, EMIT, KEY     (emit is putchar and key is getchar)

Register Transfers   TO-R, FROM-R

Program Flow            FOR, NEXT

Thus we have 31 instructions that are tailored towards a stack based processor. For a more conventional register processor, or load-store architecture there would probably be more instructions for register transfers and richer addressing modes.

It could also be argued that all of the conditional jumps might not strictly be required, as a jump if zero can implement all program control structures.  Additionally if the processor has memory mapped peripherals then IN and OUT are practically redundant.

Deciding which instructions to keep and which to lose is all part of the mental anguish process of creating a the ISA of a DIY cpu.

Having decide on a set of instruction primitives, and the character tokens to represent them, it’s a case of looking at the overall processor architecture.

One cycle per instruction would be good – always a design aim for a simple processor, and also whether the use of dual port RAM can allow a partial pipeline – such that the ALU is calculating the ALU result(s) whilst the instruction is still being decoded.  This is the approach James Bowman has taken in his J1 Forth Processor.

Stack Processors generally use two stacks  – the data stack and the return stack – but there are other classes of stack machines – see Koopman’s Book on Stack Processors

These stacks can be implemented in main memory, or as a closely coupled register file. The stacks will have their own stack pointers and this is why the TO-R and FROM-R instructions are present to allow the TOP of stack to communicate with R, the Return Stack Pointer. There also needs to be a hardware mechanism to put the current Program Counter onto the return stack, before executing a CALL.

In SIMPL, there are really only two main registers x and y to hold the operands for the ALU.  There is also a loop counter k which is decremented each time around the loop until it zeros, and very occasionally there is need to access another variable.

It could be argued that 16 registers in a stack would be sufficient for most purposes, and at times four would be more than enough. As registers are cheap in FPGAs, then it may be worth considering a hybrid architecture, where the stack is a set of registers – any of which are addressable. This is a deviation from the pure TOP/NEXT architecture of a traditional stack processor – but it extends the processor functionality quite considerably, bringing it in line with many small commercial processors that have a rich set of registers.  And in the virtual machine to execute SIMPL, we need an Instruction Pointer, Program Counter, Return Stack Pointer, Data Stack Pointer, Loop Counter, Top and Next of stack – so 7 registers already.

Having more registers gives greater flexibility, but at the cost of needing more instructions to address them.  One compromise might be the idea of “zero page RAM” – where a limited area of memory has its address coded into the lower part of  instruction – say the bottom 7 bits – and this allows the alu to operate quickly on this subset  of memory without having to wait for a 2nd instruction that holds the address.

Dedicating the lower part of memory to an 8-bit payload, allows short literals to be encoded into the instruction – useful for supplying constants to the ALU, or for comparison operations – jump on match or similar.

 

Implementing SIMPL in hardware.

The end game of looking at cpu design is to attempt to find an architecture that is not only easy to implement, light in hardware and optimised to execute the SIMPL language efficiently.

As SIMPL is implemented in the form of a switch-case statement or hash-table, having an efficient “jump on match” instruction would be a useful asset – where an ascii character can be read from a buffer,  multiplied up by 32 or 64 and stored into the PC causing program execution to jump to the code body defined by that ascii character.

In hardware, decoding the 5-bit token into a longer instruction word, with various bit-fields to control the alu etc, would normally be done with a microcode ROM. In the simulator, this decode function is done with a series of switch-case statements.

For example typing “&” generates a 16 bit word by calculation or from a look-up table, which it loads into the next  RAM address and also prints the string AND to the terminal.

This process is repeated for all legitimate characters, and number literals, and so the RAM can be loaded with the machine code for the proposed processor.  As the 32 instructions will be a common “lingua franca” to all processors, it’s just a case of changing the “look up” to suit the machine language of the target processor.

Hardware versus Software

My interest in Minimum Instruction Set Computers (MISC), lies in particular trying to understand where the border line is drawn between a successful instruction set, and one that is too compromised to work efficiently.  Instructions can always be synthesised from primitives such as ADD and NAND, but this costs cpu cycles. As there is likely to be a hardware XOR function included as part of the full-adder in the cpu, is it not worth the additional instruction multiplex to make this available in the instruction set?

Historical Examples of Minimalisation

The PDP-5 and the more commercially successful PDP-8 were a miracle of minimal hardware design – as when introduced in 1965, a single transistor cost the equivalent of $20 in today’s money. Transistors were used very sparingly, only where necessary to provide signal inversion (logical NOT) and amplification (fan out). The Diode Transistor Logic (DTL) provided an easy mechanism to OR inputs and to AND outputs – so much was made of this feature to minimise the transistor count.

The PDP-8 ALU and Register slices (boards R210 and R211) contained 12 and 13 transistors respectively, and about 10 times that number of diodes.  These were built at a time before ICs were available cheaply, so every last ounce of value had to be extracted from a minimum transistor count. As a result the PDP-8 had only 8 instructions, but employed a trick by where the instruction OPR, which is 7xxx (octal) allowed individual manipulation of the ALU control signals to provide clear and complement and shifting operations. A second set of 7xxx instructions allowed various conditional jumps to be performed. As these operations could be performed in parallel by ORing the various control bits, this gave much greater flexibility to a machine  with apparently only 8 instructions – as follows

  • 000 – AND – and operand with AC.
  • 001 – TAD – add operand to <l,ac>(a 13 bit value).</l,ac>
  • 010 – ISZ – increment operand and skip if result is zero.
  • 011 – DCA – deposit AC in memory and clear AC.
  • 100 – JMS – jump to subroutine.
  • 101 – JMP – jump.
  • 110 – IOT – input/output transfer.
  • 111 – OPR – microcoded operations.

The PDP-8 was legendary for it’s minimalisation, and went on to sell more than 50,000 units over a period of 20 years, and in several hardware revisions – the last being a PDP-8 on a Harris 6120 VLSI chip based on gate arrays.  The PDP-8 has also been recreated in more modern hardware – both in discrete TTL and as a FPGA in VHDL.

A Flexible ALU

The ALU featured in the “NAND to Tetris” course uses 6 control lines to ingeniously control the inputs and outputs to an otherwise simple ADD/AND circuit. However, only 18 of the 64 combinations of instruction are actually used, and one bitslice the ALU contains  the equivalent of 35, 2-input NAND gates – which rapidly escalates to 560 gates for a 16-bit ALU.

Constraining the instruction set to just ADD or NAND halves the number of gates needed but equally shifts the problem over to firmware to synthesise the missing operations from the ADD and NAND primitives.

Before embarking on any hardware implementation it would be good to have the means to simulate  and explore possible instruction set architectures (ISAs), so the technique used to simulate EDSAC can be more widely applied.

Simulating a DIY CPU using SIMPL.

As most MISC machines are likely to have less than 32 instructions, the cpu model is likely to be relatively simple. As seen with EDSAC, the processor model can be constructed in fewer than 50 lines of code, as a case statement that decodes each instruction from memory in turn.

Populating an area of RAM with the correct instructions for the chosen processor can be done with the help of a text interpreter and a look-up table. I used  a modified version of SIMPL to act as an instruction loader/builder, where an instruction in the form nX was assembled into memory.  n is the address/data field of the instruction, and X is the op-code. SIMPL strips off X as a 4 or 5 bit code, and shifts it up to occupy the most significant bits of the instruction word, and places n as the lower 12-bits.

This operation was fast and reliable allowing pseudo-assembly language to be loaded directly into memory using Teraterm.

Once the RAM contains the correct machine instructions, we use the processor model to fetch the instructions one at a time from RAM,  decode them and execute the correct logical and arithmetical behaviour, updating registers, program counter and memory where necessary.

SIMPL was set up so that the “s” command could be used to single step through the instructions – giving a print out of the registers and key memory locations each step. This allowed for elementary debugging of the code.

Another SIMPL command “e” was used to preset the program counter to a given location – so that program execution could begin there.

Further commands and a more refined user interface are planned for a later date.

Great satisfaction was had having code actually run on the simulated EDSAC machine – using nothing more than a text editor, a terminal program and a MSP430 LaunchPad. Such low level computing really does not require sophisticated tools.

 

 

 

 

 

Advertisements
Posted in Uncategorized | 2 Comments

When you are strapped for resources……

In early August, I spent a few days on the Greek peninsula of Kassandra, to the south of Thessaloniki. In order to keep my mind stimulated and an excuse to keep out of the full-on daytime sun, I took along a minimum “Hacker’s Survival Kit”  –  plus my cheap laptop.

The survival kit consisted of a TI MSP430 Launchpad, a breadboard and a few components. I decided that with the “EDSAC Challenge” coming up in early September, that I really ought to try to work on an EDSAC based project – and with the minimum of resources, I reckoned that I stood a fair chance of being able to simulate the EDSAC using the LaunchPad, and possibly even write some simple programs in EDSAC machine language.

EDSAC was the first of the Cambridge University built computers that were designed for real computational work – in order to aid research. It was commissioned in spring of 1949, and ran it’s first demonstration program on May 6th 1949. This poster explains most of the early history of EDSAC.

EDSAC was built using about 3000 vacuum tubes (valves) and it had memory of 1024 words that was based on mercury delay lines. It performed about 650 instructions per second, using a serial ALU – but this represented a 150 times speed up from the use of desktop mechanical calculators that were commonplace at that time.

What was of interest to me was the limited instruction set of EDSAC – only 18 instructions, and how they could be put to use to provide real computation. I had read about Mininmal Instruction Set Computers, from the work by Charles “Chuck” Moore, and his work on Forth processors from the 1980s and 90s – and was intrigued to see what I could achieve on a primitive machine like EDSAC.

Simulating a cpu is fairly easy in C, you just need a series of case statements to handle all the possible instructions. Each case statement updates the accumulator and program counter of the cpu model – and for a machine like EDSAC with almost no registers, the cpu model is very easily described in about 50 lines of code:

//-------------------------------------------------------------------
 // mini-EDSAC CPU Model
 // ------------------------------------------------------------------
 
 static void execute(int instruction) // This is the EDSAC CPU model
 
 // The Alpha code is held in the top 5 bits of the instruction
 // The bottom 11 bits hold the address
 {
 insn = (instruction & 0xF800)/2048 ;
 _pc = pc + 1;
 
 n = (instruction & 0x7FF); // n is the memory address field 
 - lower 11 bits
 
 switch (insn) 
 {
 
 case 0: A += m[n] ; break;                    // ADD A
 case 1: A -= m[n] ; break;                    // SUB Subtract
 case 2: A += (m[n] & R) ; break;              // COL C (AND m[n] with R)
 case 3: m[n] = A ; break;                     // DEP Deposit D -no clear
 case 4: if(A>=0 && A<=32767) {_pc = n;} break;// JGT E
 case 5: break;                                // VER F
 case 6: if(A>=32768 && A<=65535) {_pc = n;} ; break; // JLT E
 case 7: R += m[n] ; break;                    // CPY H - Load R register
 case 8: break;                                // INP I 
 case 9: A = A>>n ; break;                     // Right Shift
 case 10: m[n] = A; A=0 ; break;               // Transfer and clear
 case 11: A = A<<n ; break;                    // LSH L
 case 12: A += (m[n] * R) ; break;             // Multiply and ADD
 case 13: A -= m[n] * R ; break;               // Multiply and Subtract- N
 case 14: uart_putc(m[n]); break;              // OUT O Output a character
 case 15: break;                               // PUT P
 case 16: break;
 case 17: A = m[n]>>1 ; break;                 // RHS R
 case 18: A -= m[n] ; break;                   // SUB S
 case 19: m[n] = A; A=0 ; break;               // TRC T 
 case 20: m[n] = A ; break;                    // UPD U
 case 21: A += (m[n] * R) ; break;             // ML+ V
 case 22: break;
 case 23: break;                               // NOP X
 case 24: break;                               // RND Y
 case 25: break;                               // END Z
 case 26: break; 
 case 27: break; 
 case 28: break; 
 case 29: break; 
 case 30: break; 
 case 31: break; 
 
 }


pc = _pc;
 instruction = m[pc];

next_t = _t;

}
 // End of mini-EDSAC CPU model
 // ---------------------------------------------------------

As can be seen, we have a fairly conventional Load-Store or Von Neumann architecture where most operations are performed on a memory location m[n] and the Accumulator A. There is a multiplier register R which is used for certain operations – including multiply and ADD and multiply and SUBtract.

In order to simplify the EDSAC model, it was decided to break with the conventional instruction mnemonics and make sure all of the 16 most useful instructions could fit into an instruction based on a 4-bit code.

Because of the way that integer numbers are handled by the MSP430G2553 on the LaunchPad, when defining the conditional jump instructions I had to put additional constraints to define what was positive and negative integers.

In order to program the mini-EDSAC model, it was necessary to load the instruction plus operand/address into an array of 16-bit words. I chose to do this using the SIMPL programming framework that was already running on the MSP430.

The only problem is that SIMPL is reverse Polish (post-fix) based – so needed the operand (number) first, followed by the operation.

With only a few minor changes to SIMPL I was able to load RAM with short programs and single step through the EDSAC code.

The first task was to get the EDSAC decimal integer print routine to work.  I had an example from the “Squares” program of how to print decimal integers – but despite this it still took me a day of debugging to get this to work.

Once I had the means to get numeric output from the mini-EDSAC simulator, I wrote a short program to add 123 to 456 and print out the answer as a decimal integer. This worked perfectly, and in just 272 machine cycles of the mini-EDSAC simulator, I had the correct answer, 579, printed to the terminal.

With the basics in place with the mini-EDSAC simulator, it’s time to port it to an MSP430 with more RAM – and to further explore the programming techniques of these very simple computational machines.

Of course the 16 instruction model could be extended – making use of more of the uppercase characters to act as instruction mnemonics.  Allowing a 5-bit instruction word, means 32 unique instructions – which gives a much greater flexibility of the machine.

Once I have mastered the EDSAC instruction set, it will be time to look at other architectures of Minimal Instruction Set Computers. (MISC).

 

 


							
Posted in Uncategorized | Leave a comment

Computer Science from the Ground Up.

 

Full_Adder_DTL_7

Imagine that you have just taken a 1 term course in Electronic Engineering and Computer Science and you are now facing the end of term exam:

Part 1 – Digital Logic.

1a. Using transistors, diodes and resistors, show how a 2-input NAND gate can be constructed. Show how the 2-inputs can be extended to allow for additional inputs.

1b. Using standard logic symbols show how the following gates may be synthesised from the NAND component.  AND, OR, XOR, NOR

2a. Using conventional logic symbols, show how a half adder my be created from gates devised in 1b. What is the minimum gate count?

2b. Show in a digram, how the half adder can be extended to a full adder, allowing for any additional external gatess that may be required.

3a.  Show the schematic for a 2:1 multiplexer using 2 input NAND gates

3b.  Show how the multiplexer may be extended to 3 input and 4 input. Include the gate count for each of your designs.

4a. A small cpu requires an arithmetical logic unit (ALU) capable of the following operations.

AND, OR, XOR, NOT, ADD, SUBTRACT

Produce a block diagram showing how this may be achieved from conventional logic gates. Pay specific attention to the detail of how each operation is selected.

4b. Explain the term “Bitslice” and using the design for the ALU in 4a. show how the logic may be partitioned into a bitslice design.

5a. A logic design requires a clocked D-type Flip Flop (DFF) – show how this may be created from NAND gates.  Pay particular attention to the method by which the DFF is loaded and clocked.

5b. A small cpu needs a program counter capable of addressing 64K words of memory. Using the DFF from 5a, and any other external logic – show how this may be achieved.

6a. Using the functional blocks described above, plus external memory such as RAM and/or ROM show how these elements may be interconnected connected to form a simple 16-bit cpu.

6b. Explain the operation of the cpu in terms of the sequence of events required to fetch, decode and execute an instruction from memory.

N2Tcover

Now, whilst I studied electronic engineering in the early 1980s, I too would find some of those questions quite a challenge, but as a result of the online course “From NAND to Tetris”, which introduces digital logic and computer systems in a series of heirarchical layers – demystifying each layer in turn. Accompanying the course are a series of course lecture notes, a text book “The Elements of Computer Systems (TECS – available online as pdf) plus hardware and software simulators that will run on any common laptop platform.

Starting from the basic NAND gate the course shows in 6 lectures how to create a complete 16-bit computer from logic “chips” designed using a hardware description language. The computer design, called “Hack” is backed up by a laptop based simulator, and capable of running real code.

With such a structured course, it makes it possible to learn the basics needed to answer the above questions in full in just six weeks of lectures and practical study workshops.

Whilst “NAND to Tetris” is a well structured course and has been designed to allow maximum access and exposure for self-learners to the course materials, the course is based upon an artificially created hardware description language, and a series of test scripts and simulators written in Java. This makes the course viable to the widest audience, who may be of limited resources, but it does not provide hands-on experience of real hardware.

Through my work with myStormmyStorm – the open source FPGA experiment board, I hope to show how “NAND to Tetris” can be used as the basis for a FPGA learning course – using low cost open source FPGA hardware and tools. With the “BlackIce” FPGA board – it will be possible to create a real “Hack” computer – and have the means to connect it to a variety of experiments.

How to introduce Electronics and Computer Science to undergraduate students, makers and hobbyists in a practical “learning by doing” manner, will be the subject of a my forthcoming presentation “Computer Science From the Ground Up”  for OSHCamp, at Wuthering Bytes, on Saturday 2nd September.

 

Posted in Uncategorized | 2 Comments

A simulator for “mini-EDSAC” coded in C

Back in May, I entered into correspondence with Bill Purvis, who is involved with the recreation of EDSAC at The National Museum of Computing, at Bletchley Park.

Bill has created a simplified model of EDSAC, which runs on a 1K Lattice iCE40 IceStick FPGA.

Here are the main differences between his mini-EDSAC and the original:

Comparison between EDSAC and Mini-EDSAC

Mini-EDSAC is a simplified version of the EDSAC instruction set, suitable for implementation on machine with limited capacity. The main differences are listed below:

  • EDSAC uses 35-bit words, Mini-EDSAC uses 16-bit words

  • EDSAC packs two 17-bit orders into a word, Mini-EDSAC has a single 16-bit order in each word.

  • EDSAC uses the 5-bit tape code devised at Cambridge, Mini-EDSAC uses 8-bit ASCII coding, but reduced to 5-bit for the order codes.

  • EDSAC has multiply-and-add and multiply-and-subtract orders which take the content of a multiplier register, multiply it by the operand, and add or subtract it into the accumulator. Mini-EDSAC simply multiplies the accumulator by the operand, placing the result in the accumulator.

  • The EDSAC accumulator is double length 70 bits, the Mini-EDSAC accumulator is 32 bits.

  • The Mini-EDSAC shift orders use the numeric value of the address part to specify how many bits are to be shifted, EDSAC has a more subtle interpretation.

  • EDSAC handles long (35-bit) and short (17-bit) operands, Mini-EDSAC only handles 16-bit operands, hence there is no need for the Long/Short bit in the order format, reducing it to 16-bits.

  • EDSAC input order takes 5-bits from the reader, Mini-EDSAC takes all 8 bits from the ASCII-coded input.

  • EDSAC outputs the top 5 bits of the operand to the printer, Mini-EDSAC outputs the bottom 8 bits of the accumulator.

In general, the appearance is much the same, operation codes match (apart from the encoding). The input programs look much like those of EDSAC programs, using the Initial Orders 1, except the last character of each order is not required.

One noticeable difference is the use of @ in pseudo-orders. EDSAC programs used P as that is encoded as zero. @ maps to zero when the top 2 high bits are removed.

The initial orders are slightly longer – I have not been able to condense them into 31 orders, programs load from address 37 onwards. I have not yet attempted to replicate David Wheeler’s Initial Orders 2!

Bill Purvis, 25th May, 2017

Bill’s mini-EDSAC was the starting point for my simulation – written in Arduino C.  It’s relatively straight forward to get a modern microcontroller to emulate these early machines – by recreating their instruction set in the form of a switch-case statement in C.

This code uses an array of 16-bit locations in RAM  m[n]

A is  the Accumulator

It masks off the instruction field “insn” –  as the top 5 bits

It masks off the address field “n” as the lower 5 bits

 

 //--------------------------------------------------------------------
 // mini-EDSAC CPU Model
 // -------------------------------------------------------------------
 
 static void execute(int instruction) // This is the EDSAC CPU model
 
 // The Alpha code is held in the top 5 bits of the instruction
 // The bottom 11 bits hold the address

 {
 insn = (instruction & 0xF800)/2048 ;
 _pc = pc + 1;
 
 n = (instruction & 0x7FF); // n is the address field - lower 11 bits
 
 switch (insn) 
 {
 
 case 0: A += m[n]           ; break; // ADD A
 case 1: A -= m[n]           ; break; // SUB B SuBtract
 case 2: A += (m[n] & R)     ; break; // COL C - logical AND
 case 3: m[n] = A            ; break; // DEP Deposit D -no clear
 case 4: if(A>=0) {_pc = n; }  break; // JGT E
 case 5:                       break; // VER F
 case 6: if(A<0) {_pc = n; } ; break; // JLT G
 case 7: R += m[n]           ; break; // CPY H - Load R register
 case 8: m[n]=uart_getc();     break; // INP I - get char from UART
 case 9: A = A>>n            ; break; // RHS J - Right Shift
 case 10: m[n] = A; A=0      ; break; // TRC K - Transfer and clear
 case 11: A = A<<n           ; break; // LSH L - Left Shift
 case 12: A += (m[n] * R)    ; break; // ML+ M - Multiply and ADD
 case 13: A -= m[n] * R      ; break; // ML- N - Multiply and SuBtract
 case 14: uart_putc(m[n])    ; break; // OUT O - Output character to UART
 case 15:                      break; // PUT P - PUT constant in memory

// These cases >= 16 are not used on 4-bit instruction model

 case 16:                      break; //     Q - Not Used
 case 17: A = m[n]>>1        ; break; // RHS R - Right Shift
 case 18: A -= m[n]          ; break; // SUB S - Subtract
 case 19: m[n] = A; A=0      ; break; // TRC T - Transfer and Clear Acc
 case 20: m[n] = A           ; break; // UPD U - Transfer - don't Clear
 case 21: A += (m[n] * R)    ; break; // ML+ V - Multiply and ADD
 case 22:                      break; //     W - Not Used
 case 23:                      break; // NOP X - No Operation
 case 24:                      break; // RND Y - Round Accumulator
 case 25:                      break; // END Z - Stop and ring Bell!
 case 26:                      break; 
 case 27:                      break; 
 case 28:                      break; 
 case 29:                      break; 
 case 30:                      break; 
 case 31:                      break; 
 
 }

// Note - the most useful instructions crammed into top 16 cases
// to allow for 4-bit instruction selection
// So those below "P" are moved up to fill vacant slots 
// At "B", "D", "K", "M"
// Model can be extended in future for 5-bit instruction codes

pc = _pc;
 instruction = m[pc];

next_t = _t;

}
 
 
 
 // End of CPU model
 // ------------------------------------------------------------------

Posted in Uncategorized | Leave a comment

SIMPL on EDSAC

SIMPL on EDSAC

When you start coding on a new processor architecture, you always have the initial problem of how to get some sort of a bootloader into memory, to allow you to then load your application program into memory.

Originally this was done manually using toggle switches, to set up the data word for a given address and then write that bit pattern directly into memory, incrementing the address each time. Naturally this bootloader had to be as short as possible – as toggling long sequences of raw data into memory is not only tedious but prone to errors.

The engineers who devised those early bootloaders were masters of minimalism – with EDSAC requiring only 31 instructions – a program called “Initial Orders” which was fed into memory using an arrangement of hard-wired uniselectors. A uniselector was a piece of telephone switch gear – a relay operated “stepping switch” which could be used to generate a fixed output pattern for a given input – effectively a ROM.

With “Initial Orders” loaded, EDSAC could then be loaded from a teletype paper tape reader – using 5-bit teletype code to represent instructions, numbers and characters.

Having pre-packaged routines to assist with the coding of a new processor  is always useful – and being able to take this “toolkit” of known, working routines from processor to processor – makes life so much simpler.  This was what Charles Moore (the inventor of Forth) effectively did – in the 1960s, when he was coding on many different mainframe architectures. It was through the need to have a portable language – regardless of the architecture of the machine, that Moore’s toolkit eventually evolved into Forth –  How this was achieved is described in his article – “Forth – The Early Years”

Inspired by Chuck Moore, I wanted to develop my own language framework, but I needed something to start me off in the right direction.  That incentive came at a talk I attended in May 2013 – where I learned of a very simple interpreter called TXTZYME – by Ward Cunningham.

Inspired by txtzype, I began to  develop a programming framework called SIMPL. (serial interpreted minimal programming language) – which I have ported to a variety of microcontrollers including MSP430, ARM and AVR.

SIMPL is best described as a character interpreter running in a while loop, with routines to handle numerical input and numerical output.

SIMPL interprets a string of characters, and performs actions based on the characters. It is one level above machine language, providing a framework and environment that is independent of the specific details of the cpu. In theory, once you have ported SIMPL – you have a common machine interface – and no need to deal with the quirky machine code anymore!

Porting SIMPL to EDSAC

Porting SIMPL to a new cpu architecture is an ideal way of gaining an understanding of that architecture and it’s machine code.  As EDSAC has very few useful instructions, I viewed the porting of SIMPL to EDSAC as something of a coding challenge.

Fortunately it has ADD and Subtract, and these can be used with the conditional jump instructions for branching and looping. There is also AND, which can be combined with a subtraction from FFFF to negate the bits to create NAND and from NAND we can generate OR, NOR and XOR.

The ability to string primitive instructions together as short sequences to make compound ones is something that can be readily automated by the SIMPL interpreter.

As I stated in the earlier post – it seems unusual to have a hardware multiplier on such an early machine – but Maurice Wilkes predicted that EDSAC would be used for real mathematical computation – so the multiplier was purposefully included right form the start.

The other problem is that the chosen target board – an MSP430 with just 512bytes of RAM has to host the version of SIMPL running on the virtual EDSAC machine – which means cramming a meaningful version of SIMPL into about 200 instructions – which are 16 bit words.

The point of this exercise is to make programming the EDSAC model a lot easier, than raw machine language, and also to get a feel for how a “language” can be created from an instruction set that lacks many of the basic addressing modes.

Bite the Bullet – Itch that Scratch – and so in two days of spare time whist on vacation, an EDSAC simultor was coded on Day 1 – and large chunks of SIMPL were coded on Day 2.

A Model of SIMPL.

SIMPL deals with character input and character output. This is handled with the get_char and put_char routines which communicate with a serial terminal program such as TeraTerm.

Every printable ASCII character is treated as a command. This needs a jump table of 96 entries to be able to jump to any code body address, having received a character from the serial input routine.

Numbers are treated slightly differently by a routine called “number” which reads each character in turn and assembles a 16-bit integer from the given input string.

Similarly there is an output routine “printnum” which takes a 16-bit number off the stack and generates a printable string from it.

In addition to these three routines, there are the action  routines associated with each command.  These consist of a basic set of primitive instructions – mostly represented by punctuation characters and symbols. Higher level functions are represented by lower case alpha characters, and upper case alpha characters are used for User Definable words. In this respect – SIMPL is a bit like Forth, but lacks the dictionary structure.

EDSAC has a pair of Jump instructions – but it lacks a subroutine Call and Return – as these devices were not invented until 1952 – 3 years after EDSAC first ran.  So EDSAC will have to rely on a Jump to the code body for a given character, and then a Jump back to the code that fetches the NEXT character from RAM.

SIMPL needs about 1Kbytes of program memory on the MSP430 – it’s about 250 instructions for the basic kernel.

The EDSAC instruction set is indeed very basic when compared to MSP430 – and so new methods will need to be devised to get the program structure to work. EDSAC code is entirely stored in RAM – so there is the means to self modify code (scary) – but think of this as a means to an end to create the necessary jump table structure needed by SIMPL.

EDSAC only has about 16 or 18 instructions – and they were represented by capital letters – making the assembly language a lot easier to read. I chose the top 16 instructions and wrote an EDSAC simulator in C to allow me to experience this lower level of computing.

SIMPL provided the communications framework, and allowed me to dedicate a few commands to controlling the  virtual machine, such as resetting the program counter and dumping the first 32 locations of RAM in an easily disassembled manner.

EDSAC predates ascii by about 14 years – so all of the communication and I/O was via a wartime surplus teletype that used 5-bit codes. For the modern simulation this may all be updated.  The instruction set – nominally some of the capital letters A through to Z may be simplified to 4 bit numerical opcodes 0-F.

So I derived an EDSAC virtual CPU model in a few lines of C code. This runs under Energia 17 – which is the MSP430 version of the  Arduino IDE

The EDSAC cpu model allows me to hack about with the instruction set,  and work out how exactly to port an interactive language to a resource limited cpu by creating a virtual machine.

Character and Number Input and Output

Character input and output is handled by the EDSAC’s I and O instructions which load a character from tape into a specific memory address and print a given character on the teletype.  For convenience these will interact with uart routines equivalent to get_char and put_char and allow communication from a terminal application.

print_num

The EDSAC literature also includes a listing of the code for printing out the squares of numbers from 0 to 100. This is useful because it features an integer print routine – which is also an essential routine when implementing SIMPL.

number

The “initial orders” program has a routine that tests each character from a sequence of consecutive characters to see if it is a number, and by a process of successive multiplication by 10, expands the input characters as  a decimal number in memory. This is equivalent to the “number” routine in SIMPL.

Jump Table

Fundamental to SIMPL is the jump table – identify a character in the input buffer and execute  a block of code associated with that character. Get this mechanism right – and SIMPL will be a breeze on EDSAC.

One trick with the EDSAC instructions is that both the instruction field and the address field can be manipulated via the accumulator and then deposited back in memory. This allows, for example, a jump table to be created  – based on the value of an input character.  This powerful technique is just what is needed to create an interactive language such as SIMPL – which really is one big jump table.

So we have ascii characters beginning at 32 – we need to create a jump table – so lets load the character into the bottom of the accumulator as an address of where to find the “jump”.  This has to be done in two steps – as we first have to create the pseudo instruction in memory.

T0 - clear the ACC
I2 - get  the character into m[2]
A2 - ADD m[2] into ACC
A3 - ADD m[3] which has been preloaded with the instruction for Jump
U4 - store the assembled instruction at m[4]
E4 - Jump to m[4] - which is now a jump to the jump table

 

Every character now has somewhere to jump to – including numbers, lower case, upper case and punctuation symbols. Numbers can be handled by a routine which looks at each character and creates an integer in the accumulator, which can then be stored on the top of the stack.

With the jump table – we have a mechanism for addressing the code that contains the primitives.

A register model

EDSAC does not use registers in the usual sense. Whilst there is a register to hold the multiplier  – for both the “multiply and add into accumulator” and the “multiply and subtract from accumulator”.  It is also used to hold one of the operands of the AND (collate) instruction.  The multiplier register can only be loaded from a memory and not directly through the accumulator.

However – selecting a small group of memory locations near the start of RAM and giving them nominal register names – will be a good way of creating some order – and a framework for the SIMPL implementation.  Instruction Pointer, stack pointer, return stack and program counter can be implemented thus.

 

Posted in Uncategorized | Leave a comment

Opening up EDSAC

EDSAC was the first of the Cambridge University computers, brought up in 1949 and intended for real scientific and mathematical work.

This year, there is considerable revival of interest in EDSAC – as the National Museum of Computing at Bletchley Park – have recreated a faithful reproduction of the original machine.  Also – the Open Source Hardware Users Group (OSHUG) have set up the EDSAC Challenge in September – which is to capture the flavour of EDSAC but using modern, open source FPGA hardware.

Why all this fuss about EDSAC – a 68 year old machine?

EDSAC was the first practical computer designed to do real work. It also exemplifies what a bit of ingenuity can do to an otherwise technically simple machine. It shows what can be done with the simplest of load-store architectures- and also teaches some of the tricks in making a hardware limited cpu do what you need it to do.

The EDSAC instruction set was based on the 5-bit teletype codes read in from a paper tape reader, that were common at that time as they were used for military and civilian communications.

Quite unusually EDSAC was given a multiplier instruction – somewhat simplified because of the serial nature of the memory and the arithmetic. EDSAC was based on the classic Von Neuman “Load-Store” architecture and as numerical precision was considered of high importance it used a 35 bit word length and the accumulator was 71 bits to accommodate the result of a 35×35 bit multiplication.

Instructions were designated capital letters – an early form of mnemonic, which made the program listings easier to read.  This feature is peculiar to EDSAC – and something that must have been lost when computers needed more than 26 instructions.

EDSAC used 18 instructions presumably tailored to the capabilities of the ALU and delay line memory.

What was it like to code on such a machine with such a limited instruction set?  I wanted to know – so I wrote an EDSAC virtual machine that will run on Arduino or MSP430 LaunchPad. From this I can then experiment with the architecture, of EDSAC and other unsophisticated processors.  This leads on nicely to my interest in Minimal Instruction Set Computers (MISC) – which by their nature are simple to model in code and use very little hardware.

So I created a simplified model – in C, which will run on an Arduino or MSP430 Launchpad. For practical coding reasons, 16 instructions will be used, which can be represented with a 4 bit instruction code.

Whilst EDSAC was created to help perform mathematical calculations – about 150 times faster than a 1940’s desktop mechanical calculator and so precision multiplication was deemed to be important,  the aim of the simplified model is to explore the limitations of a reduced instruction set, and especially when you are trying to create a virtual machine and an interactive programming environment with limited instructions.

From an instruction synthesis point of view, we have ADD, AND and SUBTRACT, so NEGATE, NAND, OR, and XOR should all be possible.

There are two jump instructions  Jump if A>=0 and Jump if A<0.  These  should be sufficient to meet most test situations.

A ADD
C COLLATE - AND
E JUMP IF A>=0
F VERIFY
G JUMP if A<0
H LOAD MULTIPLIER REGISTER
I LOAD INPUT CHARACTER
L LEFT SHIFT
N MULTIPLY AND SUBTRACT
O OUTPUT CHARACTER
P LOAD DATA
R RIGHT SHIFT
S SUBTRACT
T TRANSFER AND CLEAR
U TRANSFER NO CLEAR
V MULTIPLY AND ADD
X NO OP
Y ROUND ACCUMULATOR 
Z STOP AND RING BELL

The instruction set was simple and obviously dictated by the serial arithmetic unit. Multiplying iss a fairly straightforward operation with serial data – even on such a limited machine – the ability to multiply and add, or multiply and subtract was given high status.

EDSAC used about 3000 vacuum tubes for it’s logic – and memory consisted of mercury delay lines – where data existed as a series of standing waves that hd to be continuously refreshed.

This poster gives a really good overall view of the EDSAC machine – and has example code for “Initial Orders” which was a very early form of bootloader – in just 31 instructions. These instructions were “dialled” into memory using a system of hardwired uniselectors – as a lot of EDSAC was based on then current (General Post Office GPO) telecommunication hardware.

With the initial orders program running the main program could be loaded into memory from paper tape.

For the over curious – there is an EDSAC simulator here. Coding on such a primitive load-store architecture is certainly not a task for the fainthearted and takes some getting used to.

It’s easy to get hung up on historical details of this machine, so I created a cpu model, that behaves like EDSAC, but not necessarily constrained to the same teletype character codes or other hardware restrictions.

Perhaps it would be easier to impose a Register model on the cpu – so for example the first 16 locations in memory are given R0 to R15 names – that will certainly make it mentally easier to remember where variables have been stored.

On top of this register model – we can then lay out the virtual machine model, with data and return stacks, instruction pointer, top and 2nd, etc.

The code for the MSP430 is here on Github Gist . It consists of a simple instruction decoder and memory model. Instructions may be written in any text editor and then sent to the simulator using a terminal program – such as Teraterm.

 

 

 

//---------------------------------------------------------------------
// EDSAC CPU Model
// --------------------------------------------------------------------
// Instructions between Q and Z are replicated lower in the table
// This allows all instructions to be accessed with a 4 bit opcode
 
 static void execute(int instruction) // This is the EDSAC CPU model
 
 // The Alpha code is held in the top 5 bits of the instruction
 // The bottom 11 bits hold the address
 {
 insn = (instruction & 0xF800)/2048 ;
 _pc = pc + 1;
 
 n = (instruction & 0x7FF); // n is the address field - lower 11 bits
 
 switch (insn) 
 
 
 {
 
 case 0: A += m[n] ; break; // ADD A
 case 1: A -= m[n] ; break; // SUB Subtract
 case 2: A += (m[n] & R) ; break; // COL C
 case 3: m[n] = A ; break; // DEP Deposit D -no clear
 case 4: if(A>=0) {_pc = n; } break; // JGT E
 case 5: break; // VER F
 case 6: if(A<0) {_pc = n; } ; break; // JLT E
 case 7: R += m[n] ; break; // CPY H - Load R register
 case 8: break; // INP I 
 case 9: A = m[n]>>1 ; break; // Right Shift
 case 10: m[n] = A; A=0 ; break; // Transfer and clear
 case 11: A = m[n]<<1 ; break; // LSH L
 case 12: A += (m[n] * R) ; break; // Multiply and ADD
 case 13: A -= m[n] * R ; break; // Multiply and Subtract- N
 case 14: break; // OUT O
 case 15: break; // PUT P
 case 16: break;
 case 17: A = m[n]>>1 ; break; // RHS R
 case 18: A -= m[n] ; break; // SUB S
 case 19: m[n] = A; A=0 ; break; // TRC T 
 case 20: m[n] = A ; break; // UPD U
 case 21: A += (m[n] * R) ; break; // ML+ V
 case 22: break;
 case 23: break; // NOP X
 case 24: break; // RND Y
 case 25: break; // END Z
 case 26: break; 
 case 27: break; 
 case 28: break; 
 case 29: break; 
 case 30: break; 
 case 31: break; 
 
 }


pc = _pc;
 instruction = m[pc];

next_t = _t;

}
 
 
 
 // End of CPU model
 // ------------------------------------------------------------------

 

 

 

Posted in Uncategorized | Leave a comment

SIMPL Gets Simpler

This week I have been working on my tiny, Forth-like language SIMPL.  in a few hours of spare time I have managed to optimise some of the routines and get the code size down from around 900 bytes to just 780.

The main way that I achieved this was by reducing the size of the jump table from 196 bytes to 120 bytes, and using less code intensive ways to achieve the same functionality.

By paring away at the kernel in this way, makes for more room to add additional functionality  – and still stay within the target of just 1024 bytes.

SIMPL provides an extensible language framework allowing:
text entry – either from terminal or by file transfer
text output
32 basic instructions (primitives)
26 lower level functions – I/O related
arithmetical and logical operations
stack manipulation
I/O manipulation
Millisecond and microsecond timing operations
Looping
memory examination and modification
The kernel of SIMPL is extensible  but in it minimum operational state fits into roughly 800 bytes and has the following main routines
set_up         – sets up the clock, the UART and the GPIO
textRead    – Read a waiting character from the uart and store it into a buffer area in RAM. The location of where it is stored is determined whether we are in immediate mode or compile mode.   Immediate mode saves the character string into a text buffer at 0x0200.  Compile mode saves into a buffer starting at 0x0220 for “A” 0x0240 for “B” and so on. textRead finishes when it encounters a newline or carriage return character. It puts ascii 0x80 on the end of the buffer to signify the end of the input and sets the instruction pointer to point at the start of the text buffer (0x0200). It then jumps to “next”
next –  this is the key routine – central to the  whole operation of SIMPL.  It implements a very basic virtual machine that can work its way through an array  of characters in memory,  making decisions on their type and executing action routines based on the numeric ascii value of the character.   It gets the next character from the buffer and test to see whether it is a numeric character, an uppercase character or something else.
number – determine whether the character is part of a string of numerical characters and form into a 16-bit integer to be placed on the stack. Jumps back to “next” on completion
alpha – determine if a character is an upper case letter, calculate the code address associated with it and modify the instruction pointer to point to that address. Jumps back to “next”
jumper –  This looks up a jump address from a jump table based on the ascii value of the character. 60 table entries are stored – allowing for all the punctuation, arithmetic and logical symbols and the lower case letters.  Each of the 60 characters has an action routine associated with it which is executed when the character is detected in RAM.  All of the action routines jump back to “next” on completion
printnum   – take a 16-bit integer and convert it into an ascii text string for display
The above routines of the kernel can be coded into about 782 bytes on the MSP430 – allowing the additional routines to fit into about 240 bytes.   These will allow access to the peripherals including ADC, GPIO, Timers, SPI etc.

 

 

Posted in Uncategorized | Leave a comment