A Minimum Interactive Language Toolkit

In previous posts I began to describe a minimum interactive language toolkit which could work standalone in virtually any small microcontroller. These are the prerequisites I stated in the previous post:

  1.  Interactive – commands may be typed at the keyboard for immediate execution
  2.  Commands can be combined into programs then compiled for later execution
  3.  Extensible – from a small kernel of commands, a whole application can be built
  4.  No external tools required other than a serial terminal – all tools run on chip.

This may appear to be a huge break from conventional wisdom, as virtually all embedded microcontroller development is done in C – compiled using a package of tools hosted on a more powerful machine, such as a laptop.

Microcontrollers have evolved to have a fairly large proportion of their memory as flash ROM, and modest quantities of RAM – so it’s not unusual to find a micro with 32K bytes of flash but only 2K bytes of RAM. Whilst this partition of memory resources is not ideal for interactive languages (more RAM would be nice) – it’s enough to get started.

In the early days of microprocessors – it was commonplace to have a “Monitor” program – which consisted of a few simple commands to allow hexadecimal opcodes to be entered directly into memory and then executed from a given address. Some of these monitor programs also allowed a hex-dump to be sent to a terminal, to examine the contents of memory, plus primitive editing commands.

Typically the monitor would take a few hundred bytes of ROM, but it provided the absolute basics of being able to write machine code. Assemblers seldom existed, given the meager resources of the early home computers, and so a lot of coding was done by hand assembly – using pencil and paper – or by copying hexadecimal listings out of magazines.

What I am proposing here is a program that offers the same low level support as a monitor, but rather than programming in raw hex or machine language, the user has access to a small instruction set of highly mnemonic commands. These are executed out of memory by a generic virtual machine, hosted on the microcontroller in about 600 bytes.

The mechanics of this virtual machine are as follows:

  1. Take in a string of serial characters and place them in a buffer until the newline character is detected.
  2. Parse through the characters one at a time, creating a jump address into a look-up table based on the ASCII value of the character.
  3. Numerical character strings are handled by the number routine, forming them into a 16-bit integer which is placed on the stack.
  4. All other characters cause program flow to jump to a table-selected address from where code is executed. Numerical parameters may be used from the stack.
  5. Where appropriate, putchar is used to provide serial output of numbers and strings.
  6. Jump back to the parser in (1).
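As a rough illustration only, the loop above can be modelled on a host machine in a few lines of Python. This is a sketch of the mechanism, not the MSP430 code from the repository: the `+` and `p` primitives, the function name `interpret` and the choice to ignore unknown characters are all assumptions made for the example.

```python
MASK16 = 0xFFFF  # numbers are held as 16-bit integers, as described above


def interpret(line, stack=None):
    """Interpret one buffered line; return (stack, text sent to the terminal)."""
    stack = [] if stack is None else stack
    out = []

    def op_add():    # hypothetical primitive bound to '+'
        b, a = stack.pop(), stack.pop()
        stack.append((a + b) & MASK16)

    def op_print():  # hypothetical primitive bound to 'p' (stands in for putchar output)
        out.append(str(stack.pop()))

    # Jump table indexed by the ASCII value of the character.
    table = {ord("+"): op_add, ord("p"): op_print}

    i = 0
    while i < len(line) and line[i] != "\n":   # the newline ends the buffer
        ch = line[i]
        if "0" <= ch <= "9":
            # Number routine: accumulate decimal digits into a 16-bit value.
            n = 0
            while i < len(line) and "0" <= line[i] <= "9":
                n = (n * 10 + int(line[i])) & MASK16
                i += 1
            stack.append(n)
            continue
        handler = table.get(ord(ch))
        if handler is not None:                # unknown characters are skipped here
            handler()
        i += 1
    return stack, "".join(out)


stack, text = interpret("14 29+p\n")
print(text)
```

A dictionary stands in for the look-up table here; on the microcontroller the same job would be done by a table of code addresses.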

The prototype has been coded up in MSP430 assembly language – and is available at the following GitHub repository.

Posted in Uncategorized | Leave a comment

Minimal Computing – some more thoughts

This post is concerned with minimal computing – a subject that is close to my heart. I write this on the day that my employer has suffered a major ransomware attack on its servers and IT systems, bringing the company’s activities to a virtual standstill. To me this proves the fragility of the technology that we rely on in our day to day lives. There has to be a better, much simpler way.

In the 1960s, the world of computing was evolving quickly as many new machines came on the market. For code developers at that time, this presented a problem in that every new machine had a different assembly language, and this, along with other quirks of the machine, had to be mastered before efficient coding could be done.

High level languages such as Fortran, Algol and Cobol first appeared in the late 1950s, and whilst these eliminated the need for the developer to handle the machine language directly, the languages brought their own problems and abstracted control of the machine from the programmer.

The young developers of that age wanted something better, and both Ken Thompson of Bell Labs and UNIX fame, and Charles H. Moore – inventor of Forth – came up with their own solutions, to create a comfortable programming environment where they could code efficiently.

Thompson was part of the team, including Brian Kernighan and Dennis Ritchie, working at Bell Labs who brought us the C programming language and the UNIX operating system – which inspired Linux and underlies a great deal of open source software.

Moore decided that he preferred a language that was closer to the “metal”, and interactive in nature – so as to avoid the edit, compile, debug cycle typical of a compiled language such as C. Moore also believed that the target processor should be able to host its own toolchain – and not rely on external resources from a more powerful machine. Key to this was the dual capabilities of Forth, summarised (from Wikipedia) as: “Forth features both interactive execution of commands and the ability to compile sequences of commands for later execution.” It is also extensible, in that it has the means for the programmer to create new commands.

Inspired by Chuck Moore’s Forth, and the minimum instruction set computing (MISC) ICs that he subsequently developed, I decided to get to the heart of minimum interactive computing and devise a computing environment that could be applied to the smallest, resource-limited microcontrollers and offer the four main features of Forth that I felt were most important:

  1.  Interactive – commands may be typed at the keyboard for immediate execution
  2.  Commands can be combined into programs then compiled for later execution
  3.  Extensible – from a small kernel of commands a whole application can be built
  4.  No external tools required other than a serial terminal

These I believe are the minimum requirements for computing – an interactive computing environment that can be accessed with nothing more than a serial terminal.

Now some versions of Forth can be very comprehensive and run to about 16K bytes, though most are between 4K and 6K. My quest was for a much-reduced “Forth-like” environment that was so small that it could almost become part of the bootloader, and be always present at start-up. My initial experimentation was with Arduino – as it was so widely available – but I then progressed to ARM and MSP430 because of their larger word sizes. I settled on the MSP430 as my “model processor” because of its 16-bit word size (ideal for Forth) and the fact that it had a very “clean, orthogonal” instruction set, with a rich set of registers. I found coding in MSP430 assembly language relatively straightforward and thus I use it to test out new ideas.

As a bare minimum, the interactive environment needs to be able to do the following:

  1. Read consecutive serial characters from a UART and place them into an input buffer.
  2. Identify numerical strings from this buffer and convert them into integers to store in RAM.
  3. Use non-numerical characters as the index for a jump table, allowing the processor to branch to blocks of code on the basis of the ASCII value of the character.
  4. Execute code at the jump address, then return to fetch the next character from the input buffer.

These actions are best performed by an inner interpreter running within a loop, which co-ordinates the actions and ensures that the characters from the buffer are interpreted in sequence until the buffer end is reached with a newline character. This is all that is required to form the basis of an interactive character interpreter framework – and in MSP430 assembly language it may be achieved in about 300 bytes, including the initialisation of peripherals (UART, GPIO etc) and the get_char and put_char UART routines.

The other main aspect of a Forth-like language is the ability to store sequences of commands from the input buffer to memory, and “compile” them, so that they are available to be run later.

Forth gives each of these sequences a name or “word”, and uses a process called the “colon definition” to commit the sequences to memory in an orderly arrangement – such that they may be retrieved and executed later. Forth uses a dictionary structure to do this, which it scans to find the word just typed. Whilst convenient, this offered more sophistication than I needed, so I adopted a much more basic approach.

In order to keep this process extremely simple, I decided that naming a sequence would just mean allocating a known character to it, such that it can be executed upon receipt of that character. The fixed address at which the sequence is stored can then also be derived from the value of that character, so that it may be located by a jump table.
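To make the idea concrete, here is a small Python sketch of fixed-slot storage keyed by a single capital letter. The base address, the slot size and the function names are hypothetical values chosen for illustration; the post does not state the figures used in the real implementation.

```python
SLOT_BASE = 0x0200   # hypothetical start of the definition area in RAM
SLOT_SIZE = 32       # hypothetical number of bytes reserved per user word

# Simulated memory, large enough for 26 slots.
memory = bytearray(SLOT_BASE + 26 * SLOT_SIZE)


def slot_address(name):
    """Derive the storage address directly from the character value."""
    if len(name) != 1 or not ("A" <= name <= "Z"):
        raise ValueError("user words are single capital letters")
    return SLOT_BASE + (ord(name) - ord("A")) * SLOT_SIZE


def define(name, body):
    """Store a command sequence, newline terminated, in the slot for `name`."""
    data = body.encode("ascii") + b"\n"
    if len(data) > SLOT_SIZE:
        raise ValueError("sequence too long for its slot")
    addr = slot_address(name)
    memory[addr:addr + len(data)] = data


def lookup(name):
    """Fetch the stored sequence so that it can be handed to the interpreter."""
    addr = slot_address(name)
    end = memory.index(b"\n", addr, addr + SLOT_SIZE)
    return memory[addr:end].decode("ascii")


define("L", "14 29+")
print(hex(slot_address("L")), lookup("L"))
```

No dictionary search is needed: the character itself selects the slot, at the cost of a fixed maximum length for each definition.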

So with this simple approach we can, for example, type the sequence 100L. The number decode routine will put 100 (decimal) onto the stack and then jump to the code that is addressed by decoding ASCII L. This might be, for example, a routine that sends a number to a parallel 8-bit port which has LEDs attached. The routine picks up the value of 100 from the stack and lights the LEDs to give a binary representation of 100.

This was how the programming toolkit started – just the means to decode a single integer number and use it within a given routine to perform some I/O action.

It was decided that capital letters would be used for the user routines, allowing a full 26 different actions, initially passing a single numerical parameter to the routine via the stack. This was deemed a little limited, so a mechanism was devised to put more parameters onto the stack – so that arithmetical operations could be done.

Forth uses whitespace to separate words, but I thought that the space character could be used to separate sequences of numbers and place them on the stack one after another:

14 29    put 14 on the stack then put 29 on the stack

We can then introduce code that provides the basic maths operators + - * /

14 29+     Adds 14 and 29 leaving 43 on the top of the stack

14 29-     Subtracts 14 from 29 leaving 15 on the top of the stack

Following on from the above example, we can then add in the “L” (LEDs) command

14 29+L      Illuminates the LEDs with binary pattern for 43

14 29-L       Illuminates the LEDs with binary pattern for 15
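For reference, the bit patterns involved can be checked with a couple of lines of Python. The helper name `led_pattern` is made up for this example, and it assumes an 8-bit port with the most significant bit shown first.

```python
def led_pattern(value, width=8):
    """Return the bits driven onto `width` LEDs, most significant bit first."""
    masked = value & ((1 << width) - 1)   # only the low `width` bits reach the port
    return format(masked, "0{}b".format(width))


for n in (43, 15, 100):
    print(n, led_pattern(n))
```

This gives 00101011 for 43, 00001111 for 15 and 01100100 for 100.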

So the fledgling language began – a means to perform a series of operations expressed as a string of serial characters. As the language evolved, the following conventions emerged:

Language primitives:   All ASCII symbols and punctuation marks ! " % ^ & * ( ) _ + etc

Built in functions:          Lower case characters a to z

User “Words”:                  Upper Case characters A to Z

Assembly Language Implementation

Tiny Forths had been explored before – notably on the AVR, a register-rich 8-bit processor – but these resulted in an implementation of about 2K bytes. My aim was to reduce the core of the language to less than 1024 bytes.

In the winter and spring of 2017, I decided to code up the language in MSP430 assembly language. This served two purposes: I got to learn a little MSP430 assembly language, and it illustrated what sort of resources were required of a processor in order to implement a compact version of this language.

The MSP430 was chosen because it was a 16 bit processor – and it had a very orthogonal instruction set and a rich set of 16 registers. To me, it represented a blank canvas on which to paint the essence of the language.

The MSP430 code was quite compact with the kernel of the language fitting into some 300 bytes and a full implementation in under 900 bytes.

In the next part I will take the ideas around a self contained 1K byte interactive language toolkit further.


Minimal CPUs – “One Page Computing”

As I continue to explore the boundaries between computing hardware, firmware and software as part of an effort to reduce the perceived complexity to something that I can understand, the idea arose for a computing platform with an artificially imposed simplicity.

Hackaday has run competitions for the best code application written in fewer than 1024 bytes of code, so this led me to think about what sort of computing machine could be constructed in fewer than 1024 2-input NAND gates. Memory would not be counted.

It was whilst pondering this that I received a comment on a past blog post from “BigEd” – who is a participant and moderator in the anyCPU forum.

As a result of his comment,  I visited the anyCPU forum, and this showed that hardware constrained cpus were nothing new.

anyCPU is a spin-off from 6502.org to address the interests of those actively developing new cpu designs using FPGAs.  Several of these cpus were designed  with influences from the 6502, using the 6502 as a benchmark for the performance of the new designs. This has led to some interesting challenges within the forum – such as to create an 8-bit design with similar or better performance to the 6502, and the “One Page Computing” challenge  – which deserves special mention.

“One Page Computing”

The “OPC” project is to create a fully functioning, useful cpu, whose design can be captured in just 1 page of Verilog code – where a page is defined as 66 lines of 132 characters, that is one sheet of the old green and white, fan-fold, tractor feed line-printer paper.

This artificial design constraint puts some interesting challenges on the design – and in a remarkably short time has led to an interesting evolution of cpu designs that meet this criterion. With each generation of the design, the instruction set has improved, as has the performance.

The OPC challenge was conceived in mid-April and in barely 4 months the OPC design has gone through 6 major iterations, as outlined in the project’s GitHub repository. This shows that FPGAs can be used to quickly prototype new cpu designs with a minimum of expenditure.
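The page limit is easy to state as a check. The sketch below is only an illustration of the 66 x 132 rule as described here; how the OPC project itself treats tabs, comments or blank lines is not covered, and the function name is invented for the example.

```python
PAGE_LINES = 66    # lines on one sheet of line-printer paper
PAGE_COLS = 132    # characters per line


def fits_one_page(source):
    """True if the text has at most 66 lines, none longer than 132 characters."""
    lines = source.splitlines()
    return len(lines) <= PAGE_LINES and all(len(l) <= PAGE_COLS for l in lines)


# The whole design therefore has to fit in at most 66 * 132 characters.
print(PAGE_LINES * PAGE_COLS)
```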

The clever folks on the OPC Project now have written an assembler, and there is a C compiler currently being adapted to cater for the OPC6 instruction set.  Another enthusiast is porting PLASMA, a high level bytecoded language to the OPC6.

OPC6 was design constrained by the need for its Verilog description to fit onto 1 page of printer paper – but this has shown that quite sophisticated cpus can be designed from very little resources.

“BigEd” has done a quick port of the OPC6 onto a Lattice iCE40 – merely to look at the resources used:

After packing: 
IOs 57 / 206 
GBs 0 / 8 
GB_IOs 0 / 8 
LCs 598 / 7680 
DFF 106 
CARRY 64 
CARRY, DFF 0 
DFF PASS 57 
CARRY PASS 3 
BRAMs 2 / 32 
WARMBOOTs 0 / 1 
PLLs 0 / 2 

After placement: 
PIOs 33 / 206 
PLBs 108 / 960 
BRAMs 2 / 32

The after-placement figures show that the logic has used about one ninth of the programmable logic blocks (108 of 960 PLBs, roughly 11%) and one sixteenth of the available block RAM on the 8K Lattice iCE40.

For comparison – a soft core 6502 uses the following:

After placement: 
PIOs 27 / 206 
PLBs 144 / 960 
BRAMs 0 / 32
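The two placement reports are easier to compare as percentages. The short Python calculation below uses only the figures quoted above; the function name is arbitrary.

```python
def utilisation(used, available):
    """Resource usage as a percentage of what the device offers."""
    return 100.0 * used / available


opc6_plb = utilisation(108, 960)     # OPC6 programmable logic blocks
opc6_bram = utilisation(2, 32)       # OPC6 block RAMs
m6502_plb = utilisation(144, 960)    # soft core 6502 programmable logic blocks

print("OPC6: {:.2f}% PLBs, {:.2f}% BRAMs".format(opc6_plb, opc6_bram))
print("6502: {:.2f}% PLBs".format(m6502_plb))
```

On these figures the OPC6 occupies 11.25% of the logic blocks against 15% for the 6502 core, although the OPC6 build also uses two block RAMs where the 6502 build uses none.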

With low cost FPGA boards now available for as little as £40, supported by an open-source tool chain, there has never been a better time to get started with “One Page Computing”.

The OPC6 is the latest cpu to evolve from the OPC experiment.

It has 16 registers and provides a fairly comprehensive instruction set allowing register to register operations, not dissimilar to the MSP430, but lacking in several of the addressing modes. The conditional predicate bits allow the instruction to be conditionally executed according to the status of the ALU flag bits.

The OPC6 Instruction set is as follows   (See https://revaldinho.github.io/opc/ for full size):

[Image: OPC6 instruction set]

Getting started with myStorm BlackIce

Introduction

myStorm BlackIce is a unique combination of low power FPGA and ARM microcontroller, designed and manufactured specifically to bring affordable open source FPGA hardware and software to hobbyists, makers and students.

Working in conjunction with Clifford Wolf’s innovative open source FPGA development toolchain, known as “Project IceStorm”, it allows new digital designs to be written in Verilog, synthesised and programmed into the FPGA.

The features found on BlackIce offer the user access to a wide variety of devices and expansion interfaces including PMODs, microSD, SRAM, LEDs, switches and buttons.

External circuits connect via the PMOD connectors. PMOD expansion devices are available from a variety of sources, or the experienced hobbyist may create some of them at home using readily available hardware.

Few FPGA development boards feature a powerful 32-bit ARM microcontroller, such as the STM32L433, on board – but it was found at the development stage of the myStorm project that including a microcontroller greatly widened the scope for experimentation, and provided a powerful, versatile support chip to the FPGA.

The STM32 provides the programming interface for the FPGA, so that it may be programmed over USB using little more than a serial terminal program.

For Windows users – TeraTerm is recommended.

For Mac OS X users – CoolTerm.

For Linux users – see notes below.

Alternatively the FPGA may be programmed via a Raspberry Pi – particularly useful if the Raspberry Pi is being used to host the FPGA development toolchain.

Project IceStorm

Project IceStorm was written by Clifford Wolf with contributions from others – and it is used to program a range of FPGAs entirely using open source software.  It is a viable alternative to the proprietary toolchains provided by FPGA vendors which often consume tens of gigabytes on the hard drive. Project IceStorm is tiny by comparison.

Project IceStorm aims at reverse engineering and documenting the bitstream format of Lattice iCE40 FPGAs and providing simple tools for analyzing and creating bitstream files. The IceStorm flow (Yosys, Arachne-pnr, and IceStorm) is a fully open source Verilog-to-Bitstream flow for iCE40 FPGAs.

The focus of the project is on the iCE40 LP/HX 1K/4K/8K chips. BlackIce uses the iCE40HX4K-TQ144 part.

The typical workflow involved in creating a working FPGA design is:

  1. Create your design in the Verilog hardware description language, using a text editor.
  2. Submit the Verilog file (e.g. design.v) to the Yosys logic synthesiser.
  3. Invoke the Arachne-pnr place and route tool on the Yosys output file.
  4. Use the IceStorm tools (icepack, icebox, iceprog, icetime, chip databases) to create the bitstream file.
  5. Program the bitstream file into the FPGA.
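As a hedged sketch of how steps 2 to 4 can be scripted, the Python below builds the three command lines for Yosys, Arachne-pnr and icepack. The file names (design.v, design.pcf and so on), the function names and the device and package options are assumptions for illustration; check them against your own project and the tools' documentation before relying on them.

```python
import subprocess


def icestorm_commands(top="design", device="8k", package="tq144:4k"):
    """Return the synthesis, place-and-route and packing commands in order."""
    return [
        # Step 2: synthesise the Verilog into a BLIF netlist.
        ["yosys", "-q", "-p", "synth_ice40 -blif {}.blif".format(top),
         "{}.v".format(top)],
        # Step 3: place and route, using a pin constraints file (assumed name).
        ["arachne-pnr", "-d", device, "-P", package,
         "-p", "{}.pcf".format(top), "{}.blif".format(top),
         "-o", "{}.asc".format(top)],
        # Step 4: pack the textual result into a binary bitstream.
        ["icepack", "{}.asc".format(top), "{}.bin".format(top)],
    ]


def build(top="design"):
    """Run each stage in turn, stopping at the first failure."""
    for cmd in icestorm_commands(top):
        subprocess.run(cmd, check=True)


for cmd in icestorm_commands():
    print(" ".join(cmd))
```

Calling `build()` requires the three tools to be installed and on the path; the loop at the end only prints the commands so that they can be inspected first.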

Whilst this set of operations may seem quite complex – it can be automated either from the command line,  or by using the APIO IDE – which is a cross-platform IDE tailored to the requirements of FPGA design projects.

The STM32L433 Microcontroller

On the left hand side of the BlackIce pcb is the 64 pin microcontroller which acts as the support device for the FPGA – primarily handling the programming function.

The STM32L433 has 256K of flash memory and 64K of SRAM, and is clocked at a maximum of 80MHz. It is a powerful microcontroller in its own right – similar in capability to an Arduino Due.

20 of its GPIO lines are brought out to Arduino style headers for maximum flexibility and access to analogue and digital interfaces.

When not being used for programming, the microcontroller can be used for the user’s own application firmware – in a choice of languages including C/C++, Forth, JavaScript (Espruino https://www.espruino.com/) and MicroPython (http://micropython.org/).

The STM32 may be programmed with Arduino (1.8.3 or later) or using the mbed online compiler.  Forth enthusiasts can use Mecrisp Ice V0.9 – which has been specifically tailored to support the STM32 and the ICE40 FPGA.

The STM32 is used to buffer the bitstream file – that is the binary format file that is used to program the FPGA with its logical design.  This bitstream file is created using an open source toolchain called Project IceStorm.

Why a Microcontroller?

The STM32L433 microcontroller is a powerful 32-bit ARM Cortex-M4 device clocked at 80MHz and offering 100 DMIPS. It is however one of the ultra-low power series – so perfectly complements the iCE40, which is also a low power part.

Every FPGA needs some supporting hardware to allow the bitstream file to be programmed into its internal RAM – and this is very often done by an FTDI device, the FT2232. However the FT2232 is an expensive part, costing about $4.50, and it still needs a $0.30 flash memory to hold the bitfile. Once the FT2232 has programmed the flash chip, it can then serve as a UART to USB interface for the FPGA.

When we designed myStorm, we decided that the cost of the FTDI device was prohibitive – and we could get a lot more value from the board by spending that part of the budget on a high spec microcontroller, and using that to provide the FPGA programming function. When not actually in programming mode, it can be used to host the user’s application – which can be written in a variety of languages. The result is that you get a general purpose microcontroller development board, with access to 20 lines of digital and analogue I/O, capable of working with some of the currently most popular microcontroller environments.

The microcontroller adds support and versatility to the FPGA, and for some designs the combination of FPGA and ARM will give rise to some unique projects.

The mcu and the FPGA are connected via a Quad SPI bus – capable of running at 60MHz, which across its four data lines gives 240Mbit/s.

The mcu provides analogue peripherals including a multi-channel 5 MSPS 12-bit ADC and two 12-bit DAC channels.

BlackIce offers features not found in other comparably priced FPGA boards:

More PMODs – a total of 6 double and 2 single PMODs, giving 56 GPIO lines (6 x 8 + 2 x 4), including differential LVDS lines.

For compatibility – further GPIOs have been routed to an Olimex expansion board header.

The STM32 ARM Cortex co-processor acts as a complete support system for the FPGA – offering a convenient means of programming the FPGA.

The STM32 mcu adds twenty 5V tolerant analogue and digital I/O lines routed to Arduino style headers – including 6 analogue to digital converter inputs and 2 DAC outputs with 12 bit resolution.

The STM32 can be used as a slave set of standard peripherals to the outside world – including timers, ADC, DAC, SPI, I2C and USB.

It can be programmed using STM32Duino, mbed (as a Nucleo device) or Mecrisp Forth.

BlackIce includes a 256Kx16 SRAM closely coupled to the FPGA.

A microSD card socket on the underside of the pcb may be accessed either by the FPGA or the ARM mcu.

Using APIO to manage the Toolchain and Development Process

The APIO package is a plug-in for the open source Atom Editor environment. It was written specifically to manage projects involving the Project IceStorm toolchain and Lattice iCE40 FPGAs.

Project IceStorm consists of several stages, which are conveniently managed by the APIO IDE. APIO is a derivative of PlatformIO, tailored to the needs of the Project IceStorm toolchain, and runs as a module within the Atom Editor environment.

The advantage of using an IDE such as APIO, is that it works across all platforms, and provides a convenient way of managing the toolchain and the various modules (files) that make up the design project.

The design is coded in Verilog, and it’s then just a case of building the project.

This invokes the Yosys “Logic Synthesiser” and the Arachne-pnr “Place and Route” tools. It then takes the p&r file and converts it into a binary “Bitfile” which is used to program the FPGA.

On BlackIce, an STM32L433 ARM Cortex-M4 microcontroller is used to manage the programming of the iCE40 FPGA.

The STM32 device runs an application firmware called “Iceboot”  – and this allows the binary bitstream file to be loaded via the STM32 and then into the FPGA.

This is done via the native USB device port on the STM32 microcontroller using a terminal program such as TeraTerm, using its “Send File” option from the File menu tab. Ensure that the Binary option is checked on the file selection dialogue box. FPGA loading takes only about 2 or 3 seconds.
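The same transfer can in principle be scripted rather than done by hand in a terminal program. The Python sketch below simply streams the raw bytes of the file to anything with a `write` method; it assumes that Iceboot needs nothing more than the unmodified binary, which is what the terminal's binary “Send File” does. The function name, chunk size and file name are illustrative.

```python
def send_bitstream(path, port, chunk_size=4096):
    """Stream a binary bitstream file to an open port; return bytes sent."""
    sent = 0
    with open(path, "rb") as f:          # binary mode: no newline translation
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            port.write(chunk)
            sent += len(chunk)
    return sent


# With the third-party pyserial package, the port object could be opened
# along these lines (the device name is a placeholder for your system):
#
#   import serial
#   with serial.Serial("/dev/ttyACM0", 115200) as port:
#       send_bitstream("design.bin", port)
```

Writing in chunks keeps memory use small and makes it easy to add a progress indicator later.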

Programming the STM32

In our application the STM32 runs the programming firmware “Iceboot”. However it can be programmed to permanently retain the FPGA bitstream image – so that this is automatically loaded on power-up. This may be useful for demonstration purposes where you want the FPGA image to be non-volatile.

Additionally, the STM32 may be programmed with application firmware, provided that this does not interfere with the FPGA loading mechanism.

20 lines of GPIO, including six ADC channels are brought out to Arduino-style expansion connectors, and these are available for User experimentation.

The STM32 is readily programmed using mbed – as it can be made to look like a Nucleo device – using an ST-Link to program it. Later versions of the Arduino IDE now support the Cortex-M4 microcontroller, so this may also be used.

Alternatively it can be put into DFU (Device Firmware Upgrade) mode by removing the jumper link connected across the 7th and 8th pins of the right hand row of header pins (pins 14 and 16 of the system connector). Reflashing the firmware using DFU is an advanced topic, and is covered in the Appendix.

Arduino Headers

These headers allow access to many of the interfaces on the microcontroller including ADC inputs,  DAC outputs, SPI, I2C and UART interfaces and timer inputs and outputs.  They also access the four  “slide switch” inputs that connect to both FPGA and ARM.

The headers run east-west across the centre of the board, and break out many of the ARM microcontroller GPIO signals to a set of connectors that are compatible with Arduino shields. Note that whilst these GPIO lines are 5V tolerant, the rest of the board is 3V3 only – so extreme care should be exercised when working with mixed supply rail designs.

Shields

The Arduino Shield headers can be used to access up to 20 digital lines and 6 analogue lines.

50mm x 50mm square pcbs may be conveniently mounted  – or an extended shield 60mm x 50mm will pick up all  14 LVDS pairs from the Olimex connector.  Care should be taken however because the FPGA lines are not 5V tolerant.

Switches

There are two user push button switches which connect to both the microcontroller and the FPGA. These are useful when a logic design requires momentary intervention.

They are connected to pins  PC8 and PC9 of the microcontroller and pins 63 and 64 of the FPGA.

In addition to the push switches on the left, there are four DIP slide switches arranged in two banks of two. These also connect to both the mcu and the FPGA and are also exposed to the outside world via pins on the “Digital 3” connector of the Arduino Headers.  They are useful for setting up a mode of operation or connecting to external stimuli.

To the right of the mode switches is a Reset Button which provides a momentary active low reset signal to the microcontroller.  This is independent of the high going reset signal that the microcontroller sends to the FPGA at the time of programming.

System connector – sometimes called the “Pi Header”

This is a 2 x 13 male header on the left hand edge of the pcb which carries certain signals that allow the FPGA to be programmed from an external device such as a Raspberry Pi. It has a mix of programming pins, control pins and power.

The UART RX and TX signal lines from the FPGA also appear on pins compatible with the position of the corresponding UART pins on the Raspberry Pi expansion header. Full details of these signals are in the appendix.

Programming Jumper Link

This is a 2.54mm jumper link which normally bridges across pins 14 and 16 of the system connector.  When removed, it forces the microcontroller into DFU boot mode (Device Firmware Upgrade), which allows it to be programmed with new system firmware. Occasionally there may be a firmware upgrade posted in the myStorm repository to provide new features.

The microcontroller may additionally be programmed using a low cost “ST Link” – a small system programming device – which may be sourced cheaply from eBay. This plugs into the programming pins of the system header.

For normal operation, the jumper link must remain across pins 14 and 16.

PMODS

These are an industry standard interface connector originally devised by Digilent – a major manufacturer of FPGA development boards.

They come in 6 pin or 12 pin (2 rows of 6 pin) right angled female headers, organised on a specific pitch spacing between connectors.

They connect directly to the FPGA GPIO pins and also carry 3V3 and 0V power. Small external circuit boards, carrying a variety of expansion circuitry or devices, may be plugged directly into these headers – picking up 4 I/O pins for a single PMOD, and 8 I/O pins on a double PMOD.

On BlackIce there are two single PMOD connectors and 6 double PMOD connectors.   The double PMODs on the right hand edge of the pcb carry the fast LVDS (low voltage differential signalling) pairs of signals – and these may be used for driving the most demanding of high speed hardware.

A total of 56 GPIO lines  (including 14 pairs of LVDS lines) are brought out via the PMOD connectors.

There is a whole range of third party PMODs available from a variety of suppliers – such as ADCs, 7 segment displays, audio codecs etc. See here for further information plus a selection of what is available:  http://store.digilentinc.com/pmod-modules/

The small size of PMOD circuit boards makes them a convenient size for home construction – taking advantage of low cost pcb manufacturing services.

Olimex Expansion Connector

This is a 2 x 17  2.54mm connector that accesses 28 of the FPGA GPIO signals and is compatible with the range of expansion modules supplied by Olimex.   These low cost  modules include ADC, DAC, VGA & PS/2, and buffered digital I/O.

To take advantage of this – a 2 x 17 pin male header should be fitted in this position, or a 2×17 right angled box header fitted from the underside of the board.  More details in the appendix.

MicroSD Card

On the rear of the pcb is a microSD card socket which may be accessed from the FPGA or the microcontroller.

SRAM

Also on the rear of the pcb is a 256Kx16 fast (10ns) asynchronous SRAM – directly connected and closely coupled to the FPGA. This can be used in FPGA designs where additional external storage is required.

Other Features

BlackIce is provided with a 100MHz oscillator module from which internal clock signals for the FPGA may be derived via the internal PLL (phase locked loop) module.

Multiplexer

A small 2 way multiplexer chip allows selection between external programming of the FPGA from the system connector, or onboard programming from the ARM device. When onboard programming is selected it routes the FPGA signals P52, P53, P54 and P55 to the SPI bus of the microcontroller, whilst ordinarily these are used for driving the LEDs.

https://www.olimex.com/Products/FPGA/iCE40/

User LEDs

There are four coloured LEDs available to the user: blue, green, amber and red. These can be used to indicate certain status within the FPGA, or to create, for example, a traffic light display.

An additional STATUS LED is connected to port PC13 of the microcontroller – and can be used for user purposes if the user is developing code for the microcontroller using Arduino, mbed or similar.

A further red LED indicates that the FPGA programming has been DONE.

The white power LED shows when 5V power is connected.

Powering the Unit

The board may be powered from either microUSB socket or from the 5V power and 0V ground pins on the 2 x 13 way connector.

There are two microUSB connectors:

The one closest to the ARM chip, just below the 2×13 pin system connector, is the direct USB connection to the STM32 microcontroller – and is used primarily for sending the programming bitfile to the mcu.

The lower of the two microUSB sockets connects to a USB-UART converter IC, and is used to communicate serially with the FPGA – assuming that your logic design contains a suitable UART module. It may also be used to convey serial data from the ARM microcontroller, and on first power-up or following a reset it sends a message showing the firmware revision.  The firmware running on the STM32 connects to this com port at 115,200 baud.

Sending a Bitstream file to the BlackIce Board

There are two USB sockets:

The one closer to the middle of the board is connected to the USB interface on the microcontroller – and if you plug into this one  – it will appear as a virtual com port, capable of running at more than 1 Mbit/second.

This is the one that is primarily used to transfer the large bitstream file from the laptop, through the microcontroller and into the FPGA – very quickly – in a couple of seconds.

The microUSB port nearer the corner of the board may also be used for programming but at 115,200 baud – and it is connected to the UART lines both on the microcontroller  and on the FPGA.   It would normally be used for getting serial output from the FPGA  – provided that your logic design included a suitable UART connected to the correct pins.

If you plug into the lower port – set up a 115,200 baud terminal and press the  reset button (top right) you should get this message on the terminal screen

Mystorm Version 0.2

Setup Done

Waiting for UART or USB serial

After you have sent the bitstream file and it is correctly loaded – the message should then say

Config done

Waiting for UART or USB serial

So whilst you can program through both ports – the lower one is at a lower baudrate fixed at 115200 – and ultimately will be used for communications directly from the FPGA.

Note that the version is now 0.2 –  you may previously have been using version 0.1

The other thing to note is that you must keep the jumper link fitted across pins 14 and 16 of the “system” connector.    This is only removed when you want to update the firmware that runs on the microcontroller.

Availability

The myStorm BlackIce boards are now available in production quantities – and priced as follows:

For Customers in UK £40 + £2 postage
For Customers in EU 45 Euros + 4 Euros shipping
For Customers in US $52.50 + $7.00 shipping

For world wide customers – please contact me and ask about shipping costs.

You can PayPal me at ken dot boak at gmail dot com to place an order or discuss volume discounts.

 

Appendix 1.

How to Update the STM32 Firmware using DFU Mode

BlackIce uses an STM32L433 ARM Cortex-M4 microcontroller to act as the FPGA programming and support chip. It allows the upload of the bitstream binary file using the native USB port of the STM32.  This is done using the microUSB connector closest to the ARM processor and using a terminal emulator program such as Teraterm or CoolTerm.
Occasionally the support firmware “IceBoot” will need to be updated – and this is done using the dfu mode – or Device Firmware Upgrade mode of the STM32.  This is a factory installed bootloader which allows the firmware on the STM32 to be upgraded via USB.
In order to invoke DFU mode – the shorting link fitted between pins 14 and 16 of the “Pi Header” needs to be removed.  This allows a pin to be released from being held low and this forces the STM32 into dfu mode.
Once in dfu mode, it is possible to use the STMicroelectronics Dfuse application (for Windows) or the dfu-util program with Linux to load the new firmware into the STM32.
If you are using the Windows application, the firmware has to be first converted into “dfu format” and this is done with another application called “DFU File Manager” which is bundled with the STM software tools. This can be used to convert a binary or HEX format file into the required dfu format, before using the dfuse programmer application.
How to update the Firmware – for Linux Users

 

For Linux users this process is a little easier as this file conversion is handled automatically by the dfu-util.  Here’s Matthew Venn’s description of how to do this
Here’s how on Linux (works for Ubuntu 14).

Unplug blue jumper on pin7&8.

Plug USB cable into socket closest to ARM chip.

Use lsusb to find the device:

  lsusb
  Bus 001 Device 026: ID 0483:df11 STMicroelectronics STM Device in DFU Mode

Use dfu-util (sudo apt-get install dfu-util) to list DFUs:

  sudo dfu-util -d 0483:df11 -l
  dfu-util 0.5

  (C) 2005-2008 by Weston Schmidt, Harald Welte and OpenMoko Inc.
  (C) 2010-2011 Tormod Volden (DfuSe support)
  This program is Free Software and has ABSOLUTELY NO WARRANTY

  dfu-util does currently only support DFU version 1.0

  Filter on vendor = 0x0483 product = 0xdf11
  Found Runtime: [05ac:828f] devnum=0, cfg=1, intf=3, alt=0, name="UNDEFINED"
  Found DFU: [0483:df11] devnum=0, cfg=1, intf=0, alt=0, name="@Internal Flash  /0x08000000/0128*0002Kg"
  Found DFU: [0483:df11] devnum=0, cfg=1, intf=0, alt=1, name="@Option Bytes  /0x1FFF7800/01*040 e"
  Found DFU: [0483:df11] devnum=0, cfg=1, intf=0, alt=2, name="@OTP Memory /0x1FFF7000/01*0001Ke"
  Found DFU: [0483:df11] devnum=0, cfg=1, intf=0, alt=3, name="@Device Feature/0xFFFF0000/01*004 e"

We want the internal flash (alt=0), so now with the iceboot.dfu:

  sudo dfu-util -d 0483:df11 -D ~/Downloads/iceboot.dfu --alt 0
  dfu-util 0.5

  (C) 2005-2008 by Weston Schmidt, Harald Welte and OpenMoko Inc.
  (C) 2010-2011 Tormod Volden (DfuSe support)
  This program is Free Software and has ABSOLUTELY NO WARRANTY

  dfu-util does currently only support DFU version 1.0

  Filter on vendor = 0x0483 product = 0xdf11
  Opening DFU USB device... ID 0483:df11
  Run-time device DFU version 011a
  Found DFU: [0483:df11] devnum=0, cfg=1, intf=0, alt=0, name="@Internal Flash  /0x08000000/0128*0002Kg"
  Claiming USB DFU Interface...
  Setting Alternate Setting #0 ...
  Determining device status: state = dfuIDLE, status = 0
  dfuIDLE, continuing
  DFU mode device DFU version 011a
  Device returned transfer size 2048
  Dfu suffix version 11a
  Warning: File product ID 0000 does not match device df11
  DfuSe interface name: "Internal Flash  "
  file contains 1 DFU images
  parsing DFU image 1
  image for alternate setting 0, (1 elements, total size = 22744)
  parsing element 1, address = 0x08000000, size = 22736
  done parsing DfuSe file

Then unplug usb, put jumper back, plug into other USB socket and check dmesg:

  [108129.748055] ch341 1-1.2.3.4.3:1.0: ch341-uart converter detected
  [108129.749285] usb 1-1.2.3.4.3: ch341-uart converter now attached to ttyUSB0

Check serial output:

  miniterm /dev/ttyUSB0 115200
  --- Miniterm on /dev/ttyUSB0  115200,8,N,1 ---
  --- Quit: Ctrl+] | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---

Press reset button:

  Mystorm version 0.2
  Setup done
  Waiting for UART

Now you should be able to flash your mystorm like this:

cat chip.bin > /dev/ttyUSB0

(takes about 10 seconds).

 

Appendix 2.

 

STM32 GPIO Pin Out

Pin    Port  Primary      Analogue  Other
DIG0   PC4   USART3_TX    ADC_IN14
DIG1   PC5   USART3_RX    ADC_IN15
DIG2   PB1                ADC_IN9
DIG3   PB10  USART3_TX
DIG4   PB11  USART3_RX
DIG5   PA5                ADC_IN5
DIG6   PA8   TIM1_CH1
DIG7   PC6   TIM3_CH1
DIG8   PC7   TIM3_CH2
DIG9   PA4                ADC_IN4
DIG10  PB12  SPI2_NSS
DIG11  PB15  SPI2_MOSI
DIG12  PB14  SPI2_MISO
DIG13  PB13  SPI2_SCK
DIG14  PB9   I2C1_SDA
DIG15  PB8   I2C1_SCL
DIG16  PD2   GPIO                   SW1
DIG17  PC10  USART3/4 TX            SW2
DIG18  PC11  USART3/4 RX            SW3
DIG19  PC12  USART3/4_CK            SW4
A0     PC0                ADC_IN10
A1     PC1                ADC_IN11
A2     PC2   SPI2_MISO    ADC_IN12
A3     PC3   SPI2_MOSI    ADC_IN13
A4     PA0   USART4_TX    ADC_IN0
A5     PA1   USART4_RX    ADC_IN1

Exploring DIY CPUs

In the last couple of posts I described how I have used a TI MSP430 Launchpad to emulate the 1949 Cambridge University EDSAC machine.

This opens up the possibility of using a similar scheme to emulate virtually any DIY cpu, possibly with a novel or experimental instruction set.

The MSP430 Launchpad was used only because I had it at hand, but the cpu model and the SIMPL communications framework, being written in C, could be ported to almost any microcontroller – especially if they are supported by the ever-growing Arduino IDE, now with ARM Cortex M0, M1 and M4 extensions.

A Look at an instruction set for a MISC Processor.

As I am a proponent of MISC (minimum instruction set computers), the task of simulating the cpu is simplified to just 32 instructions – represented by a 5-bit token, and I have chosen standard ascii characters to represent my instruction primitives. EDSAC used 18 capital letters to represent its instructions, with a mnemonic flavour, but any printable character is appropriate to represent an instruction primitive.

These primitives represent various ALU instructions, data moves between the ALU and memory or registers, and various input-output instructions.

I have attempted to pick a mix of instructions that are easy to implement, useful and offer flexibility.  Whilst anyone can have aspirations about designing the ISA of a new processor, it is likely to need a minimum number of instructions to be useful.

This number was estimated to be around 25 in the 1990s – by Bill Muench, Chen-Hanson Ting, Chuck Moore and others – working on MISC and Forth cpu implementations. Previous cpus have successfully used 16 or even 8 instructions (see below).

From 25 primitive instructions, more complex instructions can be synthesised by assembling groups of these primitives into macros.

ALU Group                 NOP, ADD, SUB, AND, OR, XOR, INV,  SHL, SHR

Call/Jump Group       CALL, JMP, JEQ, JGT, JLT, RET

Memory Group          FETCH, STORE

Stack Operations      DUP, DROP, SWAP, OVER, PUSH, POP

I/O Group                    IN, OUT, EMIT, KEY     (emit is putchar and key is getchar)

Register Transfers   TO-R, FROM-R

Program Flow            FOR, NEXT

Thus we have 31 instructions that are tailored towards a stack based processor. For a more conventional register processor, or load-store architecture there would probably be more instructions for register transfers and richer addressing modes.
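
As a rough sketch of how the groups listed above might map onto 5-bit tokens, the following C fragment numbers the 31 primitives in the order they are listed. Only the grouping comes from the list; the numeric order and every identifier (misc_op, misc_op_is_valid, misc_op_group) are assumptions made purely for illustration.

```c
#include <assert.h>

/* Hypothetical numbering of the 31 primitives; only the grouping is from the text. */
enum misc_op {
    /* ALU group */
    OP_NOP, OP_ADD, OP_SUB, OP_AND, OP_OR, OP_XOR, OP_INV, OP_SHL, OP_SHR,
    /* Call/Jump group */
    OP_CALL, OP_JMP, OP_JEQ, OP_JGT, OP_JLT, OP_RET,
    /* Memory group */
    OP_FETCH, OP_STORE,
    /* Stack operations */
    OP_DUP, OP_DROP, OP_SWAP, OP_OVER, OP_PUSH, OP_POP,
    /* I/O group */
    OP_IN, OP_OUT, OP_EMIT, OP_KEY,
    /* Register transfers */
    OP_TO_R, OP_FROM_R,
    /* Program flow */
    OP_FOR, OP_NEXT,
    OP_COUNT            /* number of primitives defined above */
};

/* Returns 1 if the token names a defined primitive (and so fits in 5 bits). */
int misc_op_is_valid(unsigned token)
{
    return token < OP_COUNT;
}

/* Returns the name of the group that a valid token belongs to. */
const char *misc_op_group(unsigned token)
{
    assert(misc_op_is_valid(token));
    if (token <= OP_SHR)    return "ALU";
    if (token <= OP_RET)    return "Call/Jump";
    if (token <= OP_STORE)  return "Memory";
    if (token <= OP_POP)    return "Stack";
    if (token <= OP_KEY)    return "I/O";
    if (token <= OP_FROM_R) return "Register";
    return "Flow";
}
```

With this numbering one token value out of the 32 remains unused, which leaves room for a single extra primitive.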

It could also be argued that all of the conditional jumps might not strictly be required, as a jump if zero can implement all program control structures.  Additionally if the processor has memory mapped peripherals then IN and OUT are practically redundant.

Deciding which instructions to keep and which to lose is all part of the mental anguish of creating the ISA of a DIY cpu.

Having decided on a set of instruction primitives, and the character tokens to represent them, it’s a case of looking at the overall processor architecture.

One cycle per instruction would be good, as it is always a design aim for a simple processor. It is also worth asking whether the use of dual port RAM can allow a partial pipeline, such that the ALU is calculating the ALU result(s) whilst the instruction is still being decoded.  This is the approach James Bowman has taken in his J1 Forth Processor.

Stack Processors generally use two stacks – the data stack and the return stack – but there are other classes of stack machines – see Koopman’s Book on Stack Processors.

These stacks can be implemented in main memory, or as a closely coupled register file. Each stack has its own stack pointer, and the TO-R and FROM-R instructions are present to allow the TOP of the data stack to exchange values with R, the top of the return stack. There also needs to be a hardware mechanism to put the current Program Counter onto the return stack, before executing a CALL.
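
The two-stack arrangement can be sketched in C as below. This is only a minimal model under stated assumptions: the stack depth of 16, the struct layout and all the function names are hypothetical, and TO-R and FROM-R are given their usual Forth meaning of moving one value between the data stack and the return stack.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_DEPTH 16           /* assumed depth, chosen for illustration */

typedef struct {
    uint16_t data[STACK_DEPTH];  /* data stack */
    uint16_t ret[STACK_DEPTH];   /* return stack */
    unsigned dsp, rsp;           /* stack pointers: number of items held */
    uint16_t pc;                 /* program counter */
} stack_cpu;

static void dpush(stack_cpu *c, uint16_t v) { assert(c->dsp < STACK_DEPTH); c->data[c->dsp++] = v; }
static uint16_t dpop(stack_cpu *c)          { assert(c->dsp > 0); return c->data[--c->dsp]; }
static void rpush(stack_cpu *c, uint16_t v) { assert(c->rsp < STACK_DEPTH); c->ret[c->rsp++] = v; }
static uint16_t rpop(stack_cpu *c)          { assert(c->rsp > 0); return c->ret[--c->rsp]; }

/* TO-R: move the top of the data stack onto the return stack. */
void op_to_r(stack_cpu *c)   { rpush(c, dpop(c)); }

/* FROM-R: move the top of the return stack back onto the data stack. */
void op_from_r(stack_cpu *c) { dpush(c, rpop(c)); }

/* CALL: save the address of the following instruction, then jump. */
void op_call(stack_cpu *c, uint16_t target)
{
    rpush(c, (uint16_t)(c->pc + 1u));
    c->pc = target;
}

/* RET: resume at the address saved by the matching CALL. */
void op_ret(stack_cpu *c)    { c->pc = rpop(c); }
```

In hardware the same behaviour would come from two small register files or RAM blocks, each with its own pointer, rather than from C arrays.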

In SIMPL, there are really only two main registers x and y to hold the operands for the ALU.  There is also a loop counter k which is decremented each time around the loop until it zeros, and very occasionally there is need to access another variable.

It could be argued that 16 registers in a stack would be sufficient for most purposes, and at times four would be more than enough. As registers are cheap in FPGAs, then it may be worth considering a hybrid architecture, where the stack is a set of registers – any of which are addressable. This is a deviation from the pure TOP/NEXT architecture of a traditional stack processor – but it extends the processor functionality quite considerably, bringing it in line with many small commercial processors that have a rich set of registers.  And in the virtual machine to execute SIMPL, we need an Instruction Pointer, Program Counter, Return Stack Pointer, Data Stack Pointer, Loop Counter, Top and Next of stack – so 7 registers already.

Having more registers gives greater flexibility, but at the cost of needing more instructions to address them.  One compromise might be the idea of “zero page RAM” – where a limited area of memory has its address coded into the lower part of the instruction – say the bottom 7 bits – and this allows the ALU to operate quickly on this subset of memory without having to wait for a 2nd instruction that holds the address.

Dedicating the lower part of the instruction to an 8-bit payload allows short literals to be encoded into the instruction – useful for supplying constants to the ALU, or for comparison operations such as jump on match.
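
A possible layout for such an instruction word is sketched below. It assumes a 16-bit word with the 5-bit token in the top bits, leaving the low 8 bits for a literal or the low 7 bits for a zero page address; the word width, the field positions and the function names are all assumptions for illustration rather than a fixed design.

```c
#include <assert.h>
#include <stdint.h>

#define OPCODE_SHIFT 11u   /* assumed: 5-bit token in bits 15..11 of a 16-bit word */

/* Pack a 5-bit op-code and an 8-bit payload into one instruction word. */
uint16_t insn_encode(unsigned opcode, unsigned payload)
{
    assert(opcode < 32u && payload < 256u);
    return (uint16_t)((opcode << OPCODE_SHIFT) | payload);
}

/* The 5-bit instruction token. */
unsigned insn_opcode(uint16_t word)    { return (unsigned)word >> OPCODE_SHIFT; }

/* The 8-bit short literal held in the bottom of the word. */
unsigned insn_literal(uint16_t word)   { return word & 0xFFu; }

/* The 7-bit zero page address held in the bottom of the word. */
unsigned insn_zero_page(uint16_t word) { return word & 0x7Fu; }
```

With this layout three bits between the token and the payload are left spare, which could select between the literal and zero page interpretations.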

 

Implementing SIMPL in hardware.

The end game of looking at cpu design is to attempt to find an architecture that is not only easy to implement and light in hardware, but also optimised to execute the SIMPL language efficiently.

As SIMPL is implemented in the form of a switch-case statement or hash-table, having an efficient “jump on match” instruction would be a useful asset – where an ascii character can be read from a buffer,  multiplied up by 32 or 64 and stored into the PC causing program execution to jump to the code body defined by that ascii character.
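
The address arithmetic behind such a “jump on match” can be sketched in a few lines of C. The slot size of 32 words is one of the two figures mentioned above; the function name, and the idea that the code bodies start at address zero, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_SHIFT 5u   /* 32 words per code body; a shift of 6 would give 64 */

/* Address of the code body for a printable ascii command character. */
uint16_t jump_target(char c)
{
    assert(c >= ' ' && c <= '~');
    return (uint16_t)((unsigned)c << SLOT_SHIFT);
}
```

For example the character “&” has the ascii code 38, so its code body would start at address 38 x 32 = 1216. In hardware the multiply is simply a matter of wiring the character into the upper bits of the program counter.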

In hardware, decoding the 5-bit token into a longer instruction word, with various bit-fields to control the alu etc, would normally be done with a microcode ROM. In the simulator, this decode function is done with a series of switch-case statements.

For example typing “&” generates a 16 bit word by calculation or from a look-up table, which it loads into the next  RAM address and also prints the string AND to the terminal.

This process is repeated for all legitimate characters, and number literals, and so the RAM can be loaded with the machine code for the proposed processor.  As the 32 instructions will be a common “lingua franca” to all processors, it’s just a case of changing the “look up” to suit the machine language of the target processor.
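
A look-up table of this kind might be sketched as follows. The table contents are hypothetical: the characters, the 16-bit words and the size of the RAM array are invented for illustration, and only the behaviour (store the word in the next RAM address, echo the mnemonic) follows the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    char        key;       /* character typed at the terminal */
    uint16_t    word;      /* machine word for the target processor */
    const char *mnemonic;  /* name echoed back to the terminal */
} op_entry;

/* Hypothetical table; retargeting to another processor means changing the words. */
static const op_entry op_table[] = {
    { '+', 0x0800, "ADD" },
    { '-', 0x1000, "SUB" },
    { '&', 0x1800, "AND" },
    { '|', 0x2000, "OR"  },
    { '^', 0x2800, "XOR" },
};

#define RAM_WORDS 64u              /* assumed size of the program area */
static uint16_t ram[RAM_WORDS];    /* RAM being loaded with machine code */
static size_t   next_addr;         /* next free RAM address */

/* Assemble one character. Returns 1 on success, 0 if the character is
   not in the table or the RAM is full. */
int assemble_char(char key)
{
    size_t i;
    assert(next_addr <= RAM_WORDS);
    if (next_addr == RAM_WORDS)
        return 0;
    for (i = 0; i < sizeof op_table / sizeof op_table[0]; i++) {
        if (op_table[i].key == key) {
            ram[next_addr++] = op_table[i].word;
            puts(op_table[i].mnemonic);
            return 1;
        }
    }
    return 0;
}
```

A linear search is used here because the table is tiny; indexing an array directly by the character code would be the faster choice on a small microcontroller.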

Hardware versus Software

My interest in Minimum Instruction Set Computers (MISC) lies in particular in trying to understand where the border line is drawn between a successful instruction set, and one that is too compromised to work efficiently.  Instructions can always be synthesised from primitives such as ADD and NAND, but this costs cpu cycles. As there is likely to be a hardware XOR function included as part of the full-adder in the cpu, is it not worth the additional instruction multiplex to make this available in the instruction set?

Historical Examples of Minimalisation

The PDP-5 and the more commercially successful PDP-8 were a miracle of minimal hardware design – as when introduced in 1965, a single transistor cost the equivalent of $20 in today’s money. Transistors were used very sparingly, only where necessary to provide signal inversion (logical NOT) and amplification (fan out). The Diode Transistor Logic (DTL) provided an easy mechanism to OR inputs and to AND outputs – so much was made of this feature to minimise the transistor count.

The PDP-8 ALU and Register slices (boards R210 and R211) contained 12 and 13 transistors respectively, and about 10 times that number of diodes.  These were built at a time before ICs were available cheaply, so every last ounce of value had to be extracted from a minimum transistor count. As a result the PDP-8 had only 8 instructions, but employed a trick whereby the instruction OPR, which is 7xxx (octal), allowed individual manipulation of the ALU control signals to provide clear, complement and shifting operations. A second set of 7xxx instructions allowed various conditional jumps to be performed. As these operations could be performed in parallel by ORing the various control bits, this gave much greater flexibility to a machine with apparently only 8 instructions – as follows

  • 000 – AND – and operand with AC.
  • 001 – TAD – add operand to <L,AC> (a 13 bit value).
  • 010 – ISZ – increment operand and skip if result is zero.
  • 011 – DCA – deposit AC in memory and clear AC.
  • 100 – JMS – jump to subroutine.
  • 101 – JMP – jump.
  • 110 – IOT – input/output transfer.
  • 111 – OPR – microcoded operations.
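
The way the OPR control bits combine can be illustrated with a short C sketch. The bit values used are the standard PDP-8 group 1 codes for CLA (7200), CMA (7040) and IAC (7001); the function names are my own, and the model is deliberately incomplete: it ignores the link bit and the rotate operations.

```c
#include <assert.h>

/* Group 1 OPR micro-operations (octal). */
#define OPR_CLA 07200u   /* clear AC */
#define OPR_CMA 07040u   /* complement AC */
#define OPR_IAC 07001u   /* increment AC */

/* The op-code is the top 3 bits of the 12-bit instruction word. */
unsigned pdp8_opcode(unsigned word)
{
    return (word >> 9) & 07u;
}

/* Apply the CLA, CMA and IAC bits of a group 1 OPR word, in that order,
   to a 12-bit accumulator. Link and rotates are not modelled. */
unsigned pdp8_opr_group1(unsigned word, unsigned ac)
{
    assert(pdp8_opcode(word) == 7u);   /* must be an OPR instruction */
    assert((word & 0400u) == 0u);      /* bit 8 clear selects group 1 */
    if (word & 0200u) ac = 0u;                   /* CLA */
    if (word & 0040u) ac = ~ac;                  /* CMA */
    if (word & 0001u) ac = ac + 1u;              /* IAC */
    return ac & 07777u;                          /* keep 12 bits */
}
```

ORing CLA with CMA gives 7240, which sets the accumulator to all ones, and ORing CMA with IAC gives 7041, which forms the two's complement of the accumulator, so two useful extra operations come for free from the same decode logic.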

The PDP-8 was legendary for its minimalisation, and went on to sell more than 50,000 units over a period of 20 years, and in several hardware revisions – the last being a PDP-8 on a Harris 6120 VLSI chip based on gate arrays.  The PDP-8 has also been recreated in more modern hardware – both in discrete TTL and on an FPGA in VHDL.

A Flexible ALU

The ALU featured in the “NAND to Tetris” course uses 6 control lines to ingeniously control the inputs and outputs to an otherwise simple ADD/AND circuit. However, only 18 of the 64 combinations of instruction are actually used, and one bitslice of the ALU contains the equivalent of 35 2-input NAND gates – which rapidly escalates to 560 gates for a 16-bit ALU.
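
The behaviour of that ALU is compact enough to sketch in C. The six control lines are those of the course's Hack ALU (zx, nx, zy, ny, f, no), packed here into one 6-bit argument with zx as the most significant bit; the packing and the function name are assumptions for illustration, and the zero and negative status outputs are left out.

```c
#include <assert.h>
#include <stdint.h>

/* One evaluation of the Hack ALU. ctrl holds the bits zx nx zy ny f no,
   with zx in bit 5 and no in bit 0. */
uint16_t hack_alu(uint16_t x, uint16_t y, unsigned ctrl)
{
    uint16_t out;
    assert(ctrl < 64u);
    if (ctrl & 0x20u) x = 0;                      /* zx: zero the x input */
    if (ctrl & 0x10u) x = (uint16_t)~x;           /* nx: invert the x input */
    if (ctrl & 0x08u) y = 0;                      /* zy: zero the y input */
    if (ctrl & 0x04u) y = (uint16_t)~y;           /* ny: invert the y input */
    out = (ctrl & 0x02u) ? (uint16_t)(x + y)      /* f = 1: add */
                         : (uint16_t)(x & y);     /* f = 0: and */
    if (ctrl & 0x01u) out = (uint16_t)~out;       /* no: invert the output */
    return out;
}
```

For instance the control pattern 010011 yields x minus y, and 010101 yields x OR y, even though the core circuit only knows how to ADD and AND.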

Constraining the instruction set to just ADD or NAND halves the number of gates needed but equally shifts the problem over to firmware to synthesise the missing operations from the ADD and NAND primitives.
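
To give a feel for what the firmware would have to do, here is a minimal C sketch that rebuilds the missing operations using nothing but a 16-bit NAND and a 16-bit ADD. The function names are hypothetical; the constructions themselves are the standard gate identities.

```c
#include <assert.h>
#include <stdint.h>

/* The only two primitives assumed to exist in hardware. */
uint16_t nand16(uint16_t a, uint16_t b) { return (uint16_t)~(a & b); }
uint16_t add16(uint16_t a, uint16_t b)  { return (uint16_t)(a + b); }

/* Everything else is synthesised from them. */
uint16_t not16(uint16_t a)              { return nand16(a, a); }
uint16_t and16(uint16_t a, uint16_t b)  { return not16(nand16(a, b)); }
uint16_t or16(uint16_t a, uint16_t b)   { return nand16(not16(a), not16(b)); }

uint16_t xor16(uint16_t a, uint16_t b)
{
    uint16_t t = nand16(a, b);          /* shared intermediate term */
    return nand16(nand16(a, t), nand16(b, t));
}

/* a - b as a + NOT b + 1 (two's complement). */
uint16_t sub16(uint16_t a, uint16_t b)  { return add16(add16(a, not16(b)), 1); }

/* Compare each synthesised operation with the native C operator. */
void check_synthesis(uint16_t a, uint16_t b)
{
    assert(not16(a)    == (uint16_t)~a);
    assert(and16(a, b) == (uint16_t)(a & b));
    assert(or16(a, b)  == (uint16_t)(a | b));
    assert(xor16(a, b) == (uint16_t)(a ^ b));
    assert(sub16(a, b) == (uint16_t)(a - b));
}
```

Counting primitives shows the cost: NOT takes one NAND, AND two, OR three and XOR four, whilst SUB needs one NAND and two ADDs, each of which would be a separate instruction on an ADD/NAND only machine.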

Before embarking on any hardware implementation it would be good to have the means to simulate  and explore possible instruction set architectures (ISAs), so the technique used to simulate EDSAC can be more widely applied.

Simulating a DIY CPU using SIMPL.

As most MISC machines are likely to have less than 32 instructions, the cpu model is likely to be relatively simple. As seen with EDSAC, the processor model can be constructed in fewer than 50 lines of code, as a case statement that decodes each instruction from memory in turn.

Populating an area of RAM with the correct instructions for the chosen processor can be done with the help of a text interpreter and a look-up table. I used a modified version of SIMPL to act as an instruction loader/builder, where an instruction in the form nX was assembled into memory.  n is the address/data field of the instruction, and X is the op-code. SIMPL strips off X as a 4 or 5 bit code and shifts it up to occupy the most significant bits of the instruction word, then places n in the remaining lower bits: 12 bits with a 4-bit code, or 11 bits with a 5-bit code in a 16-bit word.
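
The nX assembly step might look something like the following C sketch. It assumes a 16-bit word with a 5-bit op-code and an 11-bit address field, and it assumes that the op-code is simply the position of the capital letter in the alphabet (A = 0, B = 1 and so on); the function name is hypothetical and this is not the actual SIMPL source.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Assemble text of the form nX, e.g. "123T", into one 16-bit word.
   Returns 1 and stores the word on success, 0 on malformed input. */
int assemble_nx(const char *text, uint16_t *word)
{
    unsigned n = 0;
    assert(text != 0 && word != 0);
    while (isdigit((unsigned char)*text)) {
        n = n * 10u + (unsigned)(*text - '0');
        if (n > 0x7FFu)
            return 0;                    /* does not fit in 11 bits */
        text++;
    }
    if (*text < 'A' || *text > 'Z')
        return 0;                        /* op-code letter missing */
    *word = (uint16_t)(((unsigned)(*text - 'A') << 11) | n);
    return 1;
}
```

Because the number comes first and the letter last, the routine can build the word in a single left to right pass, which suits the reverse Polish style of SIMPL.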

This operation was fast and reliable allowing pseudo-assembly language to be loaded directly into memory using Teraterm.

Once the RAM contains the correct machine instructions, we use the processor model to fetch the instructions one at a time from RAM,  decode them and execute the correct logical and arithmetical behaviour, updating registers, program counter and memory where necessary.

SIMPL was set up so that the “s” command could be used to single step through the instructions – giving a print out of the registers and key memory locations each step. This allowed for elementary debugging of the code.

Another SIMPL command “e” was used to preset the program counter to a given location – so that program execution could begin there.

Further commands and a more refined user interface are planned for a later date.

Great satisfaction was had having code actually run on the simulated EDSAC machine – using nothing more than a text editor, a terminal program and a MSP430 LaunchPad. Such low level computing really does not require sophisticated tools.

 

 

 

 

 


When you are strapped for resources……

In early August, I spent a few days on the Greek peninsula of Kassandra, to the south of Thessaloniki. In order to keep my mind stimulated and an excuse to keep out of the full-on daytime sun, I took along a minimum “Hacker’s Survival Kit”  –  plus my cheap laptop.

The survival kit consisted of a TI MSP430 Launchpad, a breadboard and a few components. I decided that with the “EDSAC Challenge” coming up in early September, that I really ought to try to work on an EDSAC based project – and with the minimum of resources, I reckoned that I stood a fair chance of being able to simulate the EDSAC using the LaunchPad, and possibly even write some simple programs in EDSAC machine language.

EDSAC was the first of the Cambridge University built computers that were designed for real computational work – in order to aid research. It was commissioned in spring of 1949, and ran its first demonstration program on May 6th 1949. This poster explains most of the early history of EDSAC.

EDSAC was built using about 3000 vacuum tubes (valves) and it had memory of 1024 words that was based on mercury delay lines. It performed about 650 instructions per second, using a serial ALU – but this represented a 150 times speed up from the use of desktop mechanical calculators that were commonplace at that time.

What was of interest to me was the limited instruction set of EDSAC – only 18 instructions, and how they could be put to use to provide real computation. I had read about Minimal Instruction Set Computers, from the work by Charles “Chuck” Moore, and his work on Forth processors from the 1980s and 90s – and was intrigued to see what I could achieve on a primitive machine like EDSAC.

Simulating a cpu is fairly easy in C: you just need a series of case statements to handle all the possible instructions. Each case statement updates the accumulator and program counter of the cpu model – and for a machine like EDSAC with almost no registers, the cpu model is very easily described in about 50 lines of code:

 //-------------------------------------------------------------------
 // mini-EDSAC CPU Model
 //-------------------------------------------------------------------

 #include <stdint.h>

 #define MEM_WORDS 128         // assumed size; addresses used must stay below this
 static uint16_t m[MEM_WORDS]; // program and data memory, one 16-bit word each
 static uint16_t A, R;         // accumulator and multiplier register
 static uint16_t pc;           // program counter
 void uart_putc(char c);       // serial output routine (prototype assumed), defined elsewhere

 static void execute(uint16_t instruction) // This is the EDSAC CPU model
 {
   // The Alpha code is held in the top 5 bits of the instruction
   // The bottom 11 bits hold the address
   uint16_t insn = (instruction & 0xF800) / 2048;
   uint16_t n = instruction & 0x7FF;  // n is the memory address field - lower 11 bits
   uint16_t _pc = pc + 1;             // next instruction unless a jump is taken

   switch (insn)
   {
   case 0: A += m[n] ; break;                    // ADD A
   case 1: A -= m[n] ; break;                    // SUB Subtract
   case 2: A += (m[n] & R) ; break;              // COL C (AND m[n] with R)
   case 3: m[n] = A ; break;                     // DEP Deposit D - no clear
   case 4: if (A <= 32767) {_pc = n;} break;     // JGT E - 0..32767 counts as positive
   case 5: break;                                // VER F
   case 6: if (A >= 32768) {_pc = n;} break;     // JLT G - 32768..65535 counts as negative
   case 7: R = m[n] ; break;                     // CPY H - Load R register
   case 8: break;                                // INP I
   case 9: A = A>>n ; break;                     // Right Shift
   case 10: m[n] = A; A=0 ; break;               // Transfer and clear
   case 11: A = A<<n ; break;                    // LSH L
   case 12: A += (m[n] * R) ; break;             // Multiply and ADD
   case 13: A -= m[n] * R ; break;               // Multiply and Subtract - N
   case 14: uart_putc((char)m[n]); break;        // OUT O Output a character
   case 15: break;                               // PUT P
   case 16: break;
   case 17: A = m[n]>>1 ; break;                 // RHS R
   case 18: A -= m[n] ; break;                   // SUB S
   case 19: m[n] = A; A=0 ; break;               // TRC T
   case 20: m[n] = A ; break;                    // UPD U
   case 21: A += (m[n] * R) ; break;             // ML+ V
   case 22: break;
   case 23: break;                               // NOP X
   case 24: break;                               // RND Y
   case 25: break;                               // END Z
   default: break;                               // codes 26 to 31 are unused
   }

   pc = _pc;   // the caller fetches the next instruction from m[pc]
 }
 // End of mini-EDSAC CPU model
 // ---------------------------------------------------------

As can be seen, we have a fairly conventional accumulator-based Von Neumann architecture, where most operations are performed on a memory location m[n] and the Accumulator A. There is a multiplier register R which is used for certain operations – including multiply and ADD and multiply and SUBtract.

In order to simplify the EDSAC model, it was decided to break with the conventional instruction mnemonics and make sure all of the 16 most useful instructions could fit into an instruction based on a 4-bit code.

Because of the way that integer numbers are handled by the MSP430G2553 on the LaunchPad, when defining the conditional jump instructions I had to add extra constraints to define which accumulator values count as positive and which as negative.
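
One way to express those constraints is to test the top bit of the 16-bit accumulator, as in the sketch below. It assumes the accumulator is held as an unsigned 16-bit value, with 0 to 32767 treated as positive and 32768 to 65535 as negative; the function names are hypothetical, and E and G are the two conditional jump letters used by EDSAC.

```c
#include <assert.h>
#include <stdint.h>

/* A 16-bit value is negative in two's complement when its top bit is set. */
int is_negative16(uint16_t a)
{
    return (a & 0x8000u) != 0;
}

/* Returns 1 if the conditional jump named by letter is taken:
   'E' jumps when the accumulator is zero or positive,
   'G' jumps when it is negative. */
int jump_taken(char letter, uint16_t a)
{
    assert(letter == 'E' || letter == 'G');
    return (letter == 'E') ? !is_negative16(a) : is_negative16(a);
}
```

Testing a single bit gives the same result as the two range comparisons, and it does not depend on whether the C compiler treats the variable as signed or unsigned.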

In order to program the mini-EDSAC model, it was necessary to load the instruction plus operand/address into an array of 16-bit words. I chose to do this using the SIMPL programming framework that was already running on the MSP430.

The only problem is that SIMPL is reverse Polish (post-fix) based – so needed the operand (number) first, followed by the operation.

With only a few minor changes to SIMPL I was able to load RAM with short programs and single step through the EDSAC code.

The first task was to get the EDSAC decimal integer print routine to work.  I had an example from the “Squares” program of how to print decimal integers – but despite this it still took me a day of debugging to get this to work.

Once I had the means to get numeric output from the mini-EDSAC simulator, I wrote a short program to add 123 to 456 and print out the answer as a decimal integer. This worked perfectly, and in just 272 machine cycles of the mini-EDSAC simulator, I had the correct answer, 579, printed to the terminal.

With the basics in place with the mini-EDSAC simulator, it’s time to port it to an MSP430 with more RAM – and to further explore the programming techniques of these very simple computational machines.

Of course the 16 instruction model could be extended – making use of more of the uppercase characters to act as instruction mnemonics.  Allowing a 5-bit instruction word, means 32 unique instructions – which gives a much greater flexibility of the machine.

Once I have mastered the EDSAC instruction set, it will be time to look at other architectures of Minimal Instruction Set Computers. (MISC).

 

 


							

Computer Science from the Ground Up.

 

Imagine that you have just taken a one-term course in Electronic Engineering and Computer Science and you are now facing the end of term exam:

Part 1 – Digital Logic.

1a. Using transistors, diodes and resistors, show how a 2-input NAND gate can be constructed. Show how the 2-inputs can be extended to allow for additional inputs.

1b. Using standard logic symbols show how the following gates may be synthesised from the NAND component.  AND, OR, XOR, NOR

2a. Using conventional logic symbols, show how a half adder may be created from gates devised in 1b. What is the minimum gate count?

2b. Show in a diagram how the half adder can be extended to a full adder, allowing for any additional external gates that may be required.

3a.  Show the schematic for a 2:1 multiplexer using 2 input NAND gates

3b.  Show how the multiplexer may be extended to 3 input and 4 input. Include the gate count for each of your designs.

4a. A small cpu requires an arithmetical logic unit (ALU) capable of the following operations.

AND, OR, XOR, NOT, ADD, SUBTRACT

Produce a block diagram showing how this may be achieved from conventional logic gates. Pay specific attention to the detail of how each operation is selected.

4b. Explain the term “Bitslice” and using the design for the ALU in 4a. show how the logic may be partitioned into a bitslice design.

5a. A logic design requires a clocked D-type Flip Flop (DFF) – show how this may be created from NAND gates.  Pay particular attention to the method by which the DFF is loaded and clocked.

5b. A small cpu needs a program counter capable of addressing 64K words of memory. Using the DFF from 5a, and any other external logic – show how this may be achieved.

6a. Using the functional blocks described above, plus external memory such as RAM and/or ROM, show how these elements may be interconnected to form a simple 16-bit cpu.

6b. Explain the operation of the cpu in terms of the sequence of events required to fetch, decode and execute an instruction from memory.

Now, although I studied electronic engineering in the early 1980s, I too would find some of those questions quite a challenge. Help is at hand in the online course “From NAND to Tetris”, which introduces digital logic and computer systems in a series of hierarchical layers, demystifying each layer in turn. Accompanying the course are a series of lecture notes, a text book “The Elements of Computing Systems” (TECS – available online as pdf), plus hardware and software simulators that will run on any common laptop platform.

Starting from the basic NAND gate the course shows in 6 lectures how to create a complete 16-bit computer from logic “chips” designed using a hardware description language. The computer design, called “Hack” is backed up by a laptop based simulator, and capable of running real code.

With such a structured course, it makes it possible to learn the basics needed to answer the above questions in full in just six weeks of lectures and practical study workshops.

Whilst “NAND to Tetris” is a well structured course and has been designed to allow maximum access and exposure for self-learners to the course materials, the course is based upon an artificially created hardware description language, and a series of test scripts and simulators written in Java. This makes the course viable to the widest audience, who may be of limited resources, but it does not provide hands-on experience of real hardware.

Through my work with myStorm – the open source FPGA experiment board, I hope to show how “NAND to Tetris” can be used as the basis for an FPGA learning course – using low cost open source FPGA hardware and tools. With the “BlackIce” FPGA board – it will be possible to create a real “Hack” computer – and have the means to connect it to a variety of experiments.

How to introduce Electronics and Computer Science to undergraduate students, makers and hobbyists in a practical “learning by doing” manner will be the subject of my forthcoming presentation “Computer Science From the Ground Up” for OSHCamp, at Wuthering Bytes, on Saturday 2nd September.

 
