As I read more about Forth, and engage with some of the world’s Forth thinkers, I get the impression that at some point Forth becomes both addictive and productive. I have yet to reach that level of enlightenment, but my journey continues slowly, in fits and starts, as there is always some other, more pressing demand on my mental capacity.
I have for some time been a proponent of minimal instruction set computer (MISC) machines, and Chuck Moore has been prolific in this area for the last 35 years or so. Over a career spent creating silicon integrated circuits that execute Forth primitives directly, he has steadily whittled away the complexity of these machines, to the point where 144 individual cores can be put on a single die as part of his GreenArrays work.
The problem then becomes not so much a hardware challenge as a software one: just how do you efficiently co-ordinate the tasks of 144 cores? Perhaps we have something to learn from the behaviour of bees or other colonial insects, including ants and termites.
Each worker has a specific task to perform, be it foraging for food, nursery duty or general housekeeping. Each agent runs a low level behavioural model, with a higher level specialisation based on the specific task or triggered in response to certain stimuli.
This might be one way to organise a multicore processor – a set of low level functions which are augmented by task or sensor specific application functions.
For the moment however, I first have to define the Instruction Set and Architecture (ISA) of my proposed Stack CPU.
Charles Moore, Chen Hanson Ting and Bill Muench provided an important framework for MISC machines, and proved them to be an important family of CPUs. The small instruction set means simplified hardware, fewer transistors or logic cells, and the ability to implement the design using low cost FPGA technology.
With a minimal instruction set and a Forth-like language, it’s always possible to create more complex instructions by concatenating the primitive instructions. Provided that you have a small number of useful instructions, any program can be synthesised.
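To make the idea of building complex instructions from primitives concrete, here is a minimal sketch of a stack interpreter in Python. The primitive names (DUP, DROP, SWAP, ADD, MUL) and the compound words SQUARE and CUBE are illustrative assumptions of mine, not the instruction set proposed in this post.

```python
# Minimal stack-machine sketch: compound words are sequences of primitives.
# All names here are hypothetical and chosen only for illustration.

def run(program, stack, words):
    """Execute a list of tokens against a data stack and return the stack."""
    for token in program:
        if isinstance(token, int):
            stack.append(token)                # literal: push onto the stack
        elif token in words:
            run(words[token], stack, words)    # compound word: run its definition
        elif token == "DUP":
            stack.append(stack[-1])
        elif token == "DROP":
            stack.pop()
        elif token == "SWAP":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif token == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown token: {token!r}")
    return stack

# Compound words defined purely by concatenating existing words.
words = {
    "SQUARE": ["DUP", "MUL"],
    "CUBE": ["DUP", "SQUARE", "MUL"],
}

result = run([3, "CUBE"], [], words)
print(result)
```

CUBE is defined in terms of SQUARE, which is in turn defined in terms of two primitives, so the hardware only ever needs to understand the primitives.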
Moore and Ting postulated that the minimum useful instruction set would be about 25 to 30 instructions, which fits in well with using 5 bit tokens to represent instructions. Moore packed four 5 bit instructions into a 21 bit wide memory word, and this formed a very simple pipeline architecture. The same word could alternatively hold a 16 bit literal and a single 5 bit instruction. This scheme was used in Moore’s ShBoom processor and also the MuP21.
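The packing arithmetic can be sketched in a few lines of Python. The slot order (first instruction in the most significant slot) and the position of the literal (upper 16 bits, opcode in the lower 5) are my assumptions for illustration; the actual layouts used in Moore's chips may differ.

```python
# Sketch of packing 5-bit opcodes into a memory word.
# Slot order and literal placement are assumptions, not a documented layout.

OPCODE_BITS = 5
OPCODE_MASK = (1 << OPCODE_BITS) - 1    # 0b11111
SLOTS = 4

def pack(opcodes):
    """Pack four 5-bit opcodes into one word, first opcode in the top slot."""
    if len(opcodes) != SLOTS:
        raise ValueError("expected exactly four opcodes")
    word = 0
    for op in opcodes:
        if not 0 <= op <= OPCODE_MASK:
            raise ValueError(f"opcode out of range: {op}")
        word = (word << OPCODE_BITS) | op
    return word

def unpack(word):
    """Recover the four opcodes from a packed word, top slot first."""
    return [(word >> (OPCODE_BITS * (SLOTS - 1 - i))) & OPCODE_MASK
            for i in range(SLOTS)]

def pack_literal(literal, opcode):
    """Place a 16-bit literal above a single 5-bit opcode (21 bits in total)."""
    if not 0 <= literal <= 0xFFFF:
        raise ValueError("literal must fit in 16 bits")
    if not 0 <= opcode <= OPCODE_MASK:
        raise ValueError("opcode must fit in 5 bits")
    return (literal << OPCODE_BITS) | opcode

def unpack_literal(word):
    """Split a literal word back into (literal, opcode)."""
    return word >> OPCODE_BITS, word & OPCODE_MASK

packed = pack([1, 2, 3, 4])
print(f"{packed:020b}", unpack(packed))
```

Four 5 bit slots occupy 20 bits, while a 16 bit literal plus one opcode occupies 21 bits, which is why the literal format needs the full width of the word.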
The Novix NC4016 however used a different approach and used the bit fields of the instruction to directly control different parts of the hardware.
This was the same approach taken by James Bowman with his J1 Forth CPU. This is fully described in just 200 lines of Verilog and has been implemented on Xilinx and Lattice FPGA devices.
James’s device was influenced by Charles Moore’s Novix NC4016 architecture, and has been successfully incorporated into Gameduino, a retro gaming shield for Arduino. James is a professional silicon architect in his day job, and the J1 is also used as a core on some graphics processor chips.
Using a microcode ROM it would be possible to decode 5 bit instruction tokens into wider, multi-field instructions of the kind used by the NC4016 and J1, and this would allow the simple packed-token pipelining scheme to be utilised.
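As a rough model of that decode step, the microcode ROM is just a 32 entry lookup table indexed by the 5 bit token, with each entry holding the wider control fields. The field names, the opcode numbers and the three example instructions below are all hypothetical; they are not taken from the J1 or the NC4016.

```python
# Sketch of a microcode ROM: a 5-bit token selects a wider control word.
# Field names, opcode numbers and encodings are hypothetical.

from typing import NamedTuple

class Control(NamedTuple):
    alu_op: int       # which ALU function to select (made-up encoding)
    dstack: int       # data stack pointer change: -1, 0 or +1
    rstack: int       # return stack pointer change: -1, 0 or +1
    mem_write: bool   # whether this instruction writes to memory

NOP = Control(alu_op=0, dstack=0, rstack=0, mem_write=False)

# 32 entries, one per possible 5-bit token; unused tokens decode as NOP.
MICROCODE_ROM = [NOP] * 32
MICROCODE_ROM[1] = Control(alu_op=1, dstack=+1, rstack=0, mem_write=False)   # DUP
MICROCODE_ROM[2] = Control(alu_op=2, dstack=-1, rstack=0, mem_write=False)   # ADD
MICROCODE_ROM[3] = Control(alu_op=1, dstack=-1, rstack=+1, mem_write=False)  # >R

def decode(token):
    """Look up the control word for a single 5-bit token."""
    return MICROCODE_ROM[token & 0x1F]

def decode_word(word):
    """Decode a word of four packed 5-bit tokens, most significant slot first."""
    return [decode(word >> shift) for shift in (15, 10, 5, 0)]

# DUP, ADD, >R, NOP packed into one word.
word = (1 << 15) | (2 << 10) | (3 << 5) | 0
for control in decode_word(word):
    print(control)
```

In hardware the table would be a small ROM or block RAM, and the four slots would be presented to it on successive cycles.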
The Virtual Instruction Set
Here’s a non-exhaustive list of instructions that would be useful candidates for a stack oriented MISC computer. Most of these could execute in a single cycle on an FPGA based processor as run-time code, but some of the program flow constructs would need to be compiled.
In total there are about 35 instructions, but some savings could be made to bring this down to 32, which allows 5 bit tokens and the packing of instructions into wider memory words.
This maps well to the 33 printable ASCII symbols (not alpha or numbers). These are chosen for their strong mnemonic bias, to form the basis of the SIMPL language. With only 32 symbols to learn, it’s like a shorthand version of Forth that still retains its human readability. With a compiler, there is no reason why the longhand version of the instruction cannot be included in the listing. It also reduces the source code from an average of 4 characters (including space) per instruction to just 1 character per instruction. This compaction of source code means that snippets of executable code can be transferred between processors in small packets, using wireless connectivity or even as SMS messages over the GSM network.
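The compaction is easy to demonstrate. In the sketch below, Python's string.punctuation supplies the 32 printable ASCII characters that are neither letters, digits nor the space. The particular symbol chosen for each Forth word is a placeholder assumption of mine for illustration, not a definitive SIMPL assignment.

```python
# Sketch of single-character source compaction.
# The word-to-symbol assignments are hypothetical placeholders.

import string

SYMBOLS = string.punctuation   # 32 printable ASCII characters, space excluded

SYMBOL_FOR = {
    "DUP": '"',
    "DROP": "'",
    "SWAP": "$",
    "OVER": "%",
    "+": "+",
    "-": "-",
    "*": "*",
    "@": "@",
    "!": "!",
}
WORD_FOR = {symbol: word for word, symbol in SYMBOL_FOR.items()}

def compact(forth_source):
    """Turn space-separated Forth words into one symbol per instruction."""
    return "".join(SYMBOL_FOR[word] for word in forth_source.split())

def expand(simpl_source):
    """Recover the longhand listing from the single-character form."""
    return " ".join(WORD_FOR[symbol] for symbol in simpl_source)

longhand = "DUP * SWAP DROP"
shorthand = compact(longhand)
print(len(longhand), "characters ->", len(shorthand), "characters:", shorthand)
```

Because the mapping is one to one, the longhand listing can always be regenerated from the compact form, which is what makes it practical to include both in a compiler listing.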
In the next post I’ll cover some aspects of mapping this minimal instruction set onto the J1 architecture, plus a look at some of the cross-compiler and simulator tools needed to allow further exploration of the MISC class of computing machines.