The simple load-store architecture of the “Hack” computer featured in the NAND to Tetris course is worth revisiting.
Some years ago I created a bitslice design, built entirely from 2-input NAND gates, in which each slice handles just one bit of the word. Sixteen such slices would be needed to create the whole computer.
Bitslicing was widely used from the mid-1960s to the early 1980s, and several IC manufacturers made slices in various bit-widths. The most common were 4 bits wide, from Texas Instruments and AMD.
The bitslice takes advantage of the fact that ALU operations generally act only on bits taken from the same position in the word – apart from a Carry-In and a Carry-Out, the slices can be virtually independent.
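As a rough illustration of this independence, here is a minimal Python sketch of a one-bit adder slice built only from 2-input NANDs, rippled across 16 slices. The function names (`nand`, `full_adder_slice`, `ripple_add`) are my own for illustration, and the nine-gate arrangement is the textbook one rather than necessarily the exact netlist used on the board.

```python
def nand(a: int, b: int) -> int:
    """2-input NAND on single bits (0 or 1)."""
    return 1 - (a & b)


def full_adder_slice(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder from nine 2-input NANDs; returns (sum, carry_out)."""
    # First XOR (4 gates): p = a XOR b
    n1 = nand(a, b)
    n2 = nand(a, n1)
    n3 = nand(b, n1)
    p = nand(n2, n3)
    # Second XOR (4 gates): s = p XOR cin
    n4 = nand(p, cin)
    n5 = nand(p, n4)
    n6 = nand(cin, n4)
    s = nand(n5, n6)
    # Carry (1 gate): (a AND b) OR (p AND cin), reusing n1 and n4
    cout = nand(n1, n4)
    return s, cout


def ripple_add(x: int, y: int, width: int = 16) -> tuple[int, int]:
    """Chain `width` slices; only the carry passes from slice to slice."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder_slice((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


if __name__ == "__main__":
    total, carry_out = ripple_add(0x1234, 0x0FFF)
    print(hex(total), carry_out)
```

Each call to `full_adder_slice` sees only its own bit of each operand plus the incoming carry, which is the property that lets one small board be replicated 16 times.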
As well as the ALU, it is also common to create the various storage registers a bit at a time – which was done in the PDP-8 in the mid-1960s. Additionally the program counter can be created as a structure on the slice.
When I first looked at this, I found that the gate count broke down between the various slice features as follows:
Input Negator (x2): 8 gates
Input force to zero (x2): 6 gates
Full Adder (2 x XOR): 9 gates
AND function: 1 gate
Output Negator XOR: 4 gates
ADD/AND Function Mux: 4 gates
Zero Detect: 3 gates
Spare gate: 1 gate
Program Counter: 16 gates
D register: 9 gates
A register: 9 gates
ALU Mux (2:1): 4 gates
A reg Mux (2:1): 4 gates
Spare: 2 gates
Total: 80 gates, or 20 quad 2-input NAND packages
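The budget above is easy to re-check. The short Python sketch below simply re-adds the listed figures and converts gates to quad packages; the variable names are illustrative only.

```python
# Gate budget for one slice, copied from the breakdown above.
GATE_BUDGET = [
    ("Input Negator (x2)", 8),
    ("Input force to zero (x2)", 6),
    ("Full Adder (2 x XOR)", 9),
    ("AND function", 1),
    ("Output Negator XOR", 4),
    ("ADD/AND Function Mux", 4),
    ("Zero Detect", 3),
    ("Spare gate", 1),
    ("Program Counter", 16),
    ("D register", 9),
    ("A register", 9),
    ("ALU Mux (2:1)", 4),
    ("A reg Mux (2:1)", 4),
    ("Spare", 2),
]

GATES_PER_PACKAGE = 4  # a quad 2-input NAND package holds four gates

total_gates = sum(count for _, count in GATE_BUDGET)
# Ceiling division: a partly used package still has to be fitted.
packages = -(-total_gates // GATES_PER_PACKAGE)

if __name__ == "__main__":
    print(f"{total_gates} gates -> {packages} quad NAND packages per slice")
```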
It was interesting to note that the 4-gate XOR function appeared repeatedly, 6 times, and the 4-gate 2-input multiplexer was also used 6 times. Between them, these two common building blocks account for 60% of the overall logic design. In a future revision, using 74LV86 quad 2-input XOR packages and 2:1 multiplexer packages would make for a more efficient design, particularly if the slice were widened to 4 bits. The registers are also costly in packages: each needs one package for the flip-flop, one for the load multiplexer and a further single inverter to produce complementary data inputs for the flip-flop, so there could be some savings to be made here too.
However, the thrust of this project was to investigate the feasibility of using just 2-input NANDs and to become familiar with the structures in the CPU core.
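For reference, the two recurring building blocks can be modelled in a few lines of Python. This is a sketch using the standard 4-NAND XOR and 4-NAND 2:1 multiplexer circuits, which I am assuming match the ones in the design; the names `xor4` and `mux4` are hypothetical.

```python
def nand(a: int, b: int) -> int:
    """2-input NAND on single bits (0 or 1)."""
    return 1 - (a & b)


def xor4(a: int, b: int) -> int:
    """XOR from four 2-input NANDs."""
    n1 = nand(a, b)
    return nand(nand(a, n1), nand(b, n1))


def mux4(sel: int, d0: int, d1: int) -> int:
    """2:1 multiplexer from four 2-input NANDs: d0 when sel=0, d1 when sel=1."""
    nsel = nand(sel, sel)  # a NAND with its inputs tied acts as an inverter
    return nand(nand(d0, nsel), nand(d1, sel))


# Share of the 80-gate slice taken by six XORs and six multiplexers.
XOR_COUNT, MUX_COUNT, GATES_EACH, SLICE_GATES = 6, 6, 4, 80
share = (XOR_COUNT + MUX_COUNT) * GATES_EACH / SLICE_GATES

if __name__ == "__main__":
    print(f"XOR + mux share of the slice: {share:.0%}")
```

Twelve blocks of four gates is 48 of the 80 gates, which is where the 60% figure comes from.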
With just 20 quad NAND packages on each slice, at about $0.10 per package, realising the 16-bit CPU in hardware would require some 320 packages, or about $32.00. This was starting to look like a viable proposition, and not such a daft idea as my original aim of implementing it in DTL (diode transistor logic).
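The arithmetic behind that estimate, as a small Python sketch. The function name and its parameters are hypothetical, and the price is held in cents purely to keep the sum exact.

```python
import math


def build_cost(gates_per_slice: int = 80, slices: int = 16,
               gates_per_package: int = 4,
               cents_per_package: int = 10) -> tuple[int, float]:
    """Return (total packages, total cost in dollars) for the whole CPU."""
    packages_per_slice = math.ceil(gates_per_slice / gates_per_package)
    total_packages = packages_per_slice * slices
    return total_packages, total_packages * cents_per_package / 100


if __name__ == "__main__":
    packages, dollars = build_cost()
    print(f"{packages} packages, about ${dollars:.2f}")
```

Changing `gates_per_slice` gives a quick feel for what any extra per-slice logic would add to the bill.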
Using small outline surface mount 74LV00 packages – the 20 devices could easily fit on a 2 layer board about 2″ x 2″ – and plug into a backplane using low cost double row 0.1″ connectors.
Each board would use jumper links to ensure that it had the correct carry and zero-detect signals taken from the backplane, and also ensure that it received the correct input bit and output bit – selecting from the 16 available.
For convenience of programming, an Arduino Due (or MEGA) would be used to simulate the ROM and RAM memory functions, and be able to provide a programming and user interface via a serial terminal.
The status of the registers would be displayed using various colours of LEDs – allowing the contents of the A (address) register, the D (Data) register and the Program Counter to be monitored visually.
It would also be intuitive to bring out the memory register and the Instruction register – to aid debugging. These two additional registers would cost an additional 18 gates per slice – roughly another 5 devices. There was some thought of adding toggle switches for manual input of data – and this could be revisited at a later date.
I decided that it would be neat to make the PCBs the same overall dimensions as the Arduino, so that they could all fit into a 3D printed card frame with a common backplane. The overall dimensions of this frame would be around 10″ wide, 5″ deep and 2.5″ high. This would make it a manageable desktop product – and not too tiny. Cards would be spaced approximately 0.5″ apart.
EagleCAD was used for the original CAD, and recently parts of the design have been imported into Logisim to allow the logic to be carefully checked. There is a maximum of 18 gate delays in the ALU, so conservatively taking the propagation delay per gate to be 10 ns, this suggests a total ALU delay of 180 ns, or operation at about 5 MHz. However, as an Arduino will be used as the “memory” emulator, it is unlikely that this speed will be achieved.
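The clock estimate can be reproduced with the following Python sketch. It assumes, as the text does, that the ALU path is the only limit; register setup times, wiring delays and the memory emulator are ignored, and the function name is illustrative.

```python
def max_clock_mhz(gate_delays: int = 18, t_pd_ns: float = 10.0) -> tuple[float, float]:
    """Return (critical path in ns, upper bound on clock in MHz)."""
    path_ns = gate_delays * t_pd_ns
    # A period of N nanoseconds corresponds to 1000 / N megahertz.
    return path_ns, 1000.0 / path_ns


if __name__ == "__main__":
    path_ns, f_mhz = max_clock_mhz()
    print(f"critical path {path_ns:.0f} ns -> at most {f_mhz:.1f} MHz")
```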