asm2650: an assembler for the 2650 microprocessor

This is an assembler for the Signetics 2650 processor and its variants. It is based on the assembler published on https://binnie.id.au/MicroByte/ but includes a number of improvements. In short:

Download

Like the original assembler from MicroByte, asm2650 is written in Python; it should work with most Python installations. Unlike the MicroByte assembler, asm2650 is contained in a single Python file. Download the assembler here.

The file is called asm2650; you can rename it to asm2650.py if your Python installation requires this.

Usage

asm2650 [-h] [-W {org,rel,label,instr,none}] [-B] [-l «listout»] [-o «codeout»] «infile»

Process 2650 Assembler Code -- version 1.0.7

positional arguments:
  «infile»              input source code file

optional arguments:
  -h, --help            show this help message and exit
  -W, --nowarn {org,rel,label,instr,none}
                        disable specified warning
  -B, --allow2650b      enable instructions specific to 2650B-variant
  -l «listout»          output text listing file
  -o «codeout»          output binary code file

Warnings can be disabled using --nowarn:

The --nowarn option can be specified multiple times, e.g. -W rel -W instr.

The 2650B variant of the processor adds two instructions for directly loading or storing the lower Program Status Word to memory. The original 2650 does not support these instructions. If you wish to use these instructions, explicitly specify the --allow2650b option.

If no output option is specified, the assembler writes a listing to standard output. If either -l or -o is specified, nothing is written to standard output. Use -l to save a listing (the generated code together with the original source lines). Use -o to save the assembled instructions and data into a binary file. You can use both options together.
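The command line above can be mirrored with Python's argparse module. The following is only a sketch based on the usage text; it is not the code used inside asm2650, and the names build_parser and listing_goes_to_stdout are made up for this illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser that mirrors the usage text (not asm2650's own code)."""
    parser = argparse.ArgumentParser(
        prog="asm2650", description="Process 2650 Assembler Code")
    # action="append" lets -W be given several times, e.g. -W rel -W instr
    parser.add_argument("-W", "--nowarn", action="append",
                        choices=["org", "rel", "label", "instr", "none"],
                        help="disable specified warning")
    parser.add_argument("-B", "--allow2650b", action="store_true",
                        help="enable instructions specific to 2650B-variant")
    parser.add_argument("-l", dest="listout", metavar="«listout»",
                        help="output text listing file")
    parser.add_argument("-o", dest="codeout", metavar="«codeout»",
                        help="output binary code file")
    parser.add_argument("infile", metavar="«infile»",
                        help="input source code file")
    return parser


def listing_goes_to_stdout(args):
    """The listing is printed only when neither -l nor -o was given."""
    return args.listout is None and args.codeout is None


if __name__ == "__main__":
    args = build_parser().parse_args(["-W", "rel", "-W", "instr", "prog.asm"])
    print(args.nowarn, listing_goes_to_stdout(args))
```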

Code format

This section describes the valid input format recognised by the assembler.

Pseudo-ops

Pseudo-ops do not generate code for the 2650; they are instructions to the assembler. The assembler supports all known pseudo-ops, although it ignores a number of them. See the Signetics 2650 manual for a more detailed description of assembler pseudo-ops.

The pseudo-ops START, END, EJE, PRT, SPC, TITL, and PCH are accepted but ignored.

The RES pseudo-op

The binary output file only contains the content bytes. It does not contain any reference to the intended starting address in memory. This also means that the binary output cannot contain multiple sections. The source assembler file can contain multiple sections; each ORG instruction starts a new section at a new memory address. A source file with two ORG instructions would contain three sections, because the start of the source file begins a section at the implicit address 0000.

This raises the question of how the RES pseudo-op should be treated. Consider the following code:

; Scratch memory
       ORG     H'17C0'
COUNT  RES     1
       ...

; Code section
       ORG     H'1510'
START  eorz,r0
       ...
BUF    RES     D'20'
MORE   lodi,r1 5

Ideally the first section should only define the items in scratch memory but produce no output to the binary output file. The second section should produce output, including zeroes for the RES instruction.

The assembler addresses the issue in this way: if the current section already produced output bytes then the RES will also produce zeroes. But if the current section did not produce any output (as in the definitions in the scratch section above) RES will not produce output bytes.

In other words, RES will not be the first instruction in a section to produce output, but will produce output if it follows other output bytes.
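The rule can be sketched in a few lines of Python. This is an illustration of the behaviour described above, not the assembler's actual code: the function emit_binary and the tuple format for statements are assumptions, and the byte values stand in for real opcodes.

```python
def emit_binary(statements):
    """Apply the RES rule to a list of simplified statements.

    Each statement is a hypothetical tuple:
      ("ORG", address)  starts a new section
      ("BYTES", data)   code or data that always produces output
      ("RES", count)    reserves count bytes
    """
    out = bytearray()
    section_has_output = False
    for kind, value in statements:
        if kind == "ORG":
            section_has_output = False      # a new section starts empty
        elif kind == "BYTES":
            out += value
            section_has_output = True
        elif kind == "RES":
            if section_has_output:
                out += bytes(value)         # bytes(n) is n zero bytes
            # otherwise only the address advances; nothing is written
    return bytes(out)


if __name__ == "__main__":
    program = [
        ("ORG", 0x17C0), ("RES", 1),        # scratch memory: no output
        ("ORG", 0x1510),
        ("BYTES", b"\xAA"),                 # placeholder for eorz,r0
        ("RES", 20),                        # BUF: written as 20 zeroes
        ("BYTES", b"\xBB\xCC"),             # placeholder for lodi,r1 5
    ]
    print(len(emit_binary(program)))
```

In this sketch COUNT contributes nothing to the binary, while BUF contributes 20 zero bytes because it follows output in the same section.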

Expressions

Labels and symbols must start with a letter and may contain letters, digits and the underscore _. Labels are case-insensitive, so StartAddr and StartADDR refer to the same symbol. Case is preserved in the output.
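One way to get case-insensitive lookup while keeping the original spelling is to key a dictionary on the upper-cased name. The class below is a minimal sketch of that idea; SymbolTable and its methods are hypothetical names, and the choice to remember the spelling of the first definition is an assumption.

```python
import re

# A name starts with a letter, followed by letters, digits or underscores.
_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


class SymbolTable:
    """Hypothetical case-insensitive symbol table (illustration only)."""

    def __init__(self):
        self._entries = {}   # upper-cased name -> (spelling, value)

    def define(self, name, value):
        if not _NAME.fullmatch(name):
            raise ValueError(f"invalid symbol name: {name!r}")
        key = name.upper()
        if key in self._entries:
            raise KeyError(f"symbol already defined: {name}")
        self._entries[key] = (name, value)

    def lookup(self, name):
        return self._entries[name.upper()][1]

    def spelling(self, name):
        """Return the name as it was written when it was defined."""
        return self._entries[name.upper()][0]


if __name__ == "__main__":
    table = SymbolTable()
    table.define("StartAddr", 0x1510)
    print(table.lookup("StartADDR"), table.spelling("STARTADDR"))
```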

Numbers consist of (hexa)decimal digits, depending on the default base. If the base is set to hexadecimal (using the DFLT pseudo-op) then hexadecimal numbers must start with a digit 0-9 so as not to confuse them with a symbol. For example FF should be written as 0FF. To explicitly indicate the base, use the notation D'16' or H'10' (or d'16', h'10' in lowercase). Alternatively, a hexadecimal number can be prefixed with a dollar sign, e.g. $10.

Negative numbers have a minus sign in front, and positive numbers may optionally start with a plus sign.

Octal or binary constants are not supported.
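The number rules above can be summarised in a small Python function. This is a sketch written from the description in this section, not taken from the assembler; parse_number and its default_base parameter are hypothetical names.

```python
import re

_DIGITS = "0123456789abcdef"


def _digits_to_int(digits, base):
    """Convert a digit string, rejecting anything that is not a plain digit."""
    if not digits or any(c not in _DIGITS[:base] for c in digits.lower()):
        raise ValueError(f"invalid base-{base} number: {digits!r}")
    return int(digits, base)


def parse_number(token, default_base=10):
    """Parse one numeric constant; default_base is what DFLT would select."""
    sign = 1
    if token and token[0] in "+-":
        sign = -1 if token[0] == "-" else 1
        token = token[1:]
    quoted = re.fullmatch(r"([DdHh])'([0-9A-Fa-f]+)'", token)
    if quoted:                                   # D'16' or H'10'
        base = 10 if quoted.group(1) in "Dd" else 16
        digits = quoted.group(2)
    elif token.startswith("$") and len(token) > 1:
        base, digits = 16, token[1:]             # $10
    else:
        base, digits = default_base, token
        if not digits[:1].isdigit():
            # e.g. FF with a hexadecimal default base looks like a symbol
            raise ValueError(f"not a number: {token!r}")
    return sign * _digits_to_int(digits, base)


if __name__ == "__main__":
    for text in ("D'16'", "H'10'", "$10", "-5"):
        print(text, parse_number(text))
    print("0FF", parse_number("0FF", default_base=16))
```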

ASCII text strings are specified using A'Some text here' or "Some text here". When a string is specified using double quotes, the expressions \xNN (where N indicates any hexadecimal digit) are replaced with the character with ASCII code NN. For example, a line ending can be specified as "Line end\x0D\x0A", or a C-style zero-terminated string as "Some text\x00".
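The \xNN replacement can be pictured as a single regular-expression substitution. The function below is a sketch of that idea; expand_hex_escapes is a hypothetical name, and leaving malformed escapes untouched is an assumption, not documented behaviour.

```python
import re

# Backslash, the letter x, then exactly two hexadecimal digits.
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def expand_hex_escapes(text):
    """Replace every \\xNN in text with the character whose code is NN."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    # Raw strings keep the backslashes as they would appear in a source file.
    print(repr(expand_hex_escapes(r"Line end\x0D\x0A")))
    print(repr(expand_hex_escapes(r"Some text\x00")))
```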

Text using the EBCDIC encoding is not supported.

The symbol $ refers to the current location.

Expressions can include addition and subtraction, e.g.:

$+2
LoadAddress-1
Buffer+Length-1

Note that whitespace is not allowed inside expressions! The following will (silently) yield incorrect results:

LoadAddress - 1         does not work
Buffer + Length - 1     also does not work
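To make the expression rules concrete, here is a small evaluator for terms joined by + and -. It is a sketch under simplifying assumptions: evaluate is a hypothetical name, symbols are passed in as a dictionary keyed by upper-case name, and only plain decimal constants are handled. Unlike the assembler, it raises an error when it finds whitespace instead of silently continuing.

```python
import re


def evaluate(expr, symbols, location):
    """Evaluate terms joined by + and -; $ stands for the current location."""
    if not expr or re.search(r"\s", expr):
        raise ValueError("whitespace is not allowed inside an expression")
    parts = re.findall(r"([+-]?)([^+-]+)", expr)
    # Reject input such as "A++B" that the pattern above would skip over.
    if "".join(sign + term for sign, term in parts) != expr:
        raise ValueError(f"malformed expression: {expr!r}")
    total = 0
    for sign, term in parts:
        if term == "$":
            value = location
        elif term.isdigit():
            value = int(term)
        else:
            value = symbols[term.upper()]   # symbols are case-insensitive
        total += -value if sign == "-" else value
    return total


if __name__ == "__main__":
    symbols = {"BUFFER": 0x300, "LENGTH": 16}
    print(hex(evaluate("$+2", symbols, 0x100)))
    print(hex(evaluate("Buffer+Length-1", symbols, 0x100)))
```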

Many instructions accept a list of expressions. List items are separated by a comma, optionally followed by whitespace.

Comments

Full-line comments are indicated by an asterisk * or semicolon ; as the first character on the line. Otherwise, any valid instruction can be followed by one or more spaces or tabs; the remainder of the line is then taken to be a comment. E.g.:

; This is a full line comment
* This, too.
    lodi,r0  5  This is a comment as well

The starting position of the comment does not matter. The following line is therefore valid:

lodz,r1  5

The lodz command does not take an operand, so the number 5 is taken to be a comment. If you actually meant lodi,r1 5 the assembler cannot warn you about this typo.
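The comment rule can be illustrated with a simplified line splitter. This is a sketch only: split_fields is a hypothetical name, the NO_OPERAND set is a tiny illustrative subset, and labels, operand lists and the "lodz r1" notation are deliberately left out.

```python
# Illustrative subset of mnemonics that take no operand field.
NO_OPERAND = {"lodz", "eorz"}


def split_fields(line):
    """Split one source line into opcode, operand and comment."""
    stripped = line.strip()
    if not stripped or stripped[0] in "*;":
        # Blank line or full-line comment
        return {"opcode": None, "operand": None, "comment": stripped}
    fields = stripped.split(None, 1)
    opcode = fields[0]
    rest = fields[1] if len(fields) > 1 else ""
    mnemonic = opcode.split(",")[0].lower()
    if mnemonic in NO_OPERAND:
        # Everything after the opcode is a comment, even a stray number.
        return {"opcode": opcode, "operand": None, "comment": rest}
    parts = rest.split(None, 1)
    operand = parts[0] if parts else None
    comment = parts[1] if len(parts) > 1 else ""
    return {"opcode": opcode, "operand": operand, "comment": comment}


if __name__ == "__main__":
    print(split_fields("    lodi,r0  5  This is a comment as well"))
    print(split_fields("lodz,r1  5"))
```

In this sketch the 5 after lodz,r1 ends up in the comment field, which is why no warning can be given for it.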

Program Code Variants

There are two variants: as used by Signetics, and as produced by the DASMx disassembler from Conquest Consultants. See:

This assembler supports both variants, but with some improvements:

Whitespace at the end of lines is ignored.

Blank lines and full-line comments are copied into the output listing.

A line that starts with a symbol followed by a colon defines a new label. For example:

Fill_the_buffer:
    stra,r0 buff,r1
    bdrr,r1 Fill_the_buffer

Otherwise labels start at the beginning of the line and are followed by an instruction or pseudo-op:

Loop stra,r0 Buffer,r1
     bdrr,r1 Loop

Indirect addressing is specified by using the asterisk * as the first character of the operand. With some assemblers the * character has to be in a specific character position on the line; asm2650 has no such restrictions. The operand can be any valid expression. For example:

     stra,r1 *Buffer+Len

Indexed addressing is specified by adding a register and (optionally) increment/decrement to the operand. Common notations are supported:

    stra,r0 Buffer,r1    Plain indexing
    stra,r0 Buffer,r1+   Auto-increment
    stra,r0 Buffer,r1,+  Auto-increment
    stra,r0 Buffer,r1-   Auto-decrement
    stra,r0 Buffer,r1,-  Auto-decrement
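The operand notations for indirect and indexed addressing can be recognised with one regular expression. The function below is a sketch, not the assembler's parser; parse_operand is a hypothetical name, and accepting r0 to r3 as index registers is an assumption.

```python
import re

_OPERAND = re.compile(
    r"^(?P<indirect>\*?)"          # optional leading * for indirect
    r"(?P<expr>[^,]+?)"            # the address expression, no commas
    r"(?:,(?P<reg>[Rr][0-3])"      # optional index register
    r"(?:,?(?P<step>[+-]))?"       # optional + or -, with or without comma
    r")?$"
)


def parse_operand(text):
    """Split an operand into indirect flag, expression, index and step."""
    match = _OPERAND.match(text)
    if not match:
        raise ValueError(f"cannot parse operand: {text!r}")
    reg = match.group("reg")
    return {
        "indirect": match.group("indirect") == "*",
        "expr": match.group("expr"),
        "index": reg.lower() if reg else None,
        "step": match.group("step"),   # "+", "-" or None
    }


if __name__ == "__main__":
    for operand in ("Buffer,r1", "Buffer,r1+", "Buffer,r1,-", "*Buffer+Len"):
        print(operand, parse_operand(operand))
```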

For instructions that use Register addressing, two notations are in common use. The assembler accepts either:

    lodz,r1
    lodz    r1           This is the same as the previous line

The second notation is not allowed for other addressing modes. All of the following are invalid:

    lodi    r1,5         Incorrect
    lodr    r1,ADDR      Also incorrect
    loda    r1,ADDR      Same

Note that there can be no whitespace around the comma between the opcode and the register or condition code:

    lodz,r1              This is OK
    lodz, r1             This raises an error

Predefined symbols

The following common symbols are predefined.

Version history

1.0.0 -- First release.

1.0.1 -- The RES pseudo-op no longer generates output in the binary file. Previously bytes 00 were written. If you want this, use DATA or DB instead of RES. It is now OK to use multiple ORG sections, as long as only the first one actually generates binary code. Multiple sections that actually contain data will still result in a warning, which can be suppressed using --nowarn org.

1.0.2 -- Add an error for LODZ,R0: according to the Signetics manual the results of this instruction are undefined.

1.0.3 -- Better handling of large syntax errors leading to exceptions in Python.

1.0.4 -- A number of changes to support more input formats: The dollar-sign can be used to indicate a hexadecimal number (e.g. $0d).

The START pseudo-op is accepted, but ignored.

Also, full-line comments can now start with whitespace. The first non-space character must still be an asterisk * or semicolon ;, but there can be whitespace at the start of the line.

Input files are always read as UTF8 files. Incorrect UTF8 codes would crash the assembler; this is now fixed.

1.0.5 -- Labels starting with letter R followed by a number would be incorrectly interpreted as index registers. For example, label R012 would be interpreted as R0 followed by the number 12.

When errors are encountered the output files will be deleted (as they will contain invalid data). This used to be a problem when they were special files such as /dev/null. The following is now possible: asm2650 -l /dev/null Myfile.2650

1.0.6 -- Fixed: using $ to refer to the current code address stopped working since 1.0.4.

1.0.7 -- Changed the meaning of the RES pseudo-op again. See the section 'Pseudo-ops'.


Contact me

Eelco Vriezekolk

Questions? Send me an email at eelco [at-sign] ztpe [little dot] nl