asm2650 – a 2650 assembler

This is an assembler for the Signetics 2650 processor and its variants. It is based on the assembler published on https://binnie.id.au/MicroByte/ but includes a number of improvements. In short:

  • it provides extensive error checking and warnings;
  • it allows any kind of whitespace and upper and lower case;
  • syntax has been extended to provide for various formats in use;
  • the extra instructions for the 2650B-processor are supported.

Download

Like the original assembler from MicroByte, asm2650 is written in Python; it should work with most Python installations. Unlike the MicroByte assembler, asm2650 is contained in a single Python file.

The file is called asm2650; you can rename it to asm2650.py if your Python installation requires this.

Usage

asm2650 [-h] [-W {rel,label,instr,none}] [-B] [-l «listout»] [-o «codeout»] [-s «sedfile»] «infile» 

Process 2650 Assembler Code -- version 1.0.13 

positional arguments: 
  «infile» input source code file 

optional arguments: 
  -h, --help show this help message and exit 
  -W, --nowarn {rel,label,instr,none} disable specified warning 
  -B, --allow2650b enable instructions specific to 2650B-variant 
  -l «listout» output text listing file 
  -o «codeout» output binary code file 
  -s «sedfile» output sed instructions

--nowarn

Warnings can be disabled using --nowarn:

rel: no warning when a relative instruction could have been used instead of an absolute instruction. When the offset is small (between -64 and +63), the absolute instruction could be replaced by a relative one. For example:

LOOP BDRA,R1 LOOP 

Using a relative instruction instead of an absolute instruction saves one byte of memory and also executes faster on the 2650. Note that on Unix-like systems these corrections can be applied automatically using the -s option (see below).
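
The range check behind this warning can be sketched in Python. This is an illustration, not the code used by asm2650: the limits -64 and +63 come from the text above, while the function names and the assumption that the offset is counted from the byte following a two-byte relative instruction are made up for this example.

```python
# Sketch only. REL_MIN/REL_MAX come from the text above; measuring the offset
# from the address after a two-byte relative instruction is an assumption.
REL_MIN, REL_MAX = -64, 63


def relative_offset(instr_addr: int, target: int, rel_len: int = 2) -> int:
    """Offset from the end of a relative instruction at instr_addr to target."""
    return target - (instr_addr + rel_len)


def could_be_relative(instr_addr: int, target: int) -> bool:
    """True when an absolute branch could be replaced by a relative one."""
    return REL_MIN <= relative_offset(instr_addr, target) <= REL_MAX


# LOOP BDRA,R1 LOOP at a hypothetical address H'0500' branches to itself.
print(could_be_relative(0x0500, 0x0500))   # offset is -2
print(could_be_relative(0x0500, 0x0600))   # offset is 254
```

The first call reports that the branch is within range and the second that it is not, which is the situation in which the rel warning would and would not apply.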

label: no warning when a label is redefined; the last definition of the label is used. Typically each label is defined only once, and redefinitions indicate a mistake or typo.

instr: no warnings about unusual instructions. Some instructions are not forbidden but are highly unusual: for example, an indexed instruction that uses R0 both as the index register and as the (default) operand, or a RES with an operand of zero. This is not typically what you want.

none: disable all warnings.

The --nowarn option can be specified multiple times, e.g. -W rel -W instr.

Warnings can be suppressed on a specific line by including the string *NOWARN as part of the comment.

--allow2650b

The 2650B variant of the processor adds LDPL and STPL instructions for directly loading or storing the lower Program Status Word to memory. The original 2650 does not support these instructions. If you wish to use these instructions, explicitly specify the --allow2650b option.

-l / -o: output options

Without any output options specified the assembler will generate a listing on standard output. If either -l or -o is specified no output is shown on standard output. Use -l to save a listing (code together with the original input file). Use -o to save the assembled instructions and data into a binary file. You can use both.

-s: create sed-file

Some warnings can be fixed by the sed editor that is common on Unix-like systems. The sed command takes a file containing editing instructions, which the assembler generates when the -s option is used. Apply the sed-file as follows:

sed -f «sedfile» -i.bak «infile»

Code format

This section describes the valid input format recognised by the assembler.

Pseudo-ops

Pseudo-ops do not generate code for the 2650; they are instructions to the assembler. The assembler supports all known pseudo-ops, although it ignores a number of them. See the Signetics 2650 manual for a more detailed description of assembler pseudo-ops.

  • ORG sets the location counter. Typically this determines the memory address where the code will be located. Each input file should contain at least one ORG instruction. Without an ORG instruction the assembler assumes that the code starts at address 0000. This is normally not what is intended, and will generate a warning message. You should specify ORG 0000 to make the assumption explicit.
  • EQU defines a new symbol.
  • ACON or DW defines two bytes that contain an address value.
  • DATA or DB defines one or more bytes that contain the specified values.
  • RES reserves memory. In the output file these bytes are set to zero.
  • DFLT determines whether numerical constants are interpreted as decimal or hexadecimal. Values 0 or 10 indicate that numbers will be decimal; values 1 or 16 indicate that numbers will be hexadecimal. Note that you can always explicitly specify the base; e.g. sixteen can be written as D'16' or H'10'.

The pseudo-ops START, END, EJE, PRT, SPC, TITL, and PCH are accepted but ignored.

The RES pseudo-op

The binary output file only contains the content bytes. It does not contain any reference to the intended starting address in memory. This also means that the binary output cannot contain multiple sections. The source assembler file can contain multiple sections; each ORG instruction starts a new section at a new memory address. A source file with two ORG instructions would contain three sections, because the start of the source file opens a section at the implicit address 0000.

This leads to the question of how the RES pseudo-op should be treated. Consider the following code:

; Scratch memory 
       ORG      H'17C0' 
COUNT  RES      1 
       ... 

; Code section
       ORG      H'1510' 
START  eorz,r0 
       ... 
BUF    RES      D'20' 
MORE   lodi,r1  5

Ideally the first section should only define the items in scratch memory but produce no output to the binary output file. The second section should produce output, including zeroes for the RES instruction.

The assembler addresses the issue in this way: if the current section already produced output bytes then the RES will also produce zeroes. But if the current section did not produce any output (as in the definitions in the scratch section above) RES will not produce output bytes.

In other words, RES will not be the first instruction in a section to produce output, but will produce output if it follows other output bytes.
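
The rule can be modelled in a few lines of Python. This is a minimal sketch of the behaviour described above, not the assembler's own implementation; the item representation and the function name section_output are invented for the illustration, and the opcode bytes are placeholders.

```python
# Sketch of the rule described above, not the assembler's actual code.
# A section is modelled as a list of items: ("bytes", b"...") or ("res", n).

def section_output(items):
    """Return the bytes that one section contributes to the binary output."""
    out = bytearray()
    for kind, value in items:
        if kind == "bytes":
            out += value
        elif kind == "res" and out:
            # RES emits zeroes only if this section already produced output.
            out += bytes(value)
    return bytes(out)


# Scratch section: COUNT RES 1 and nothing else.
scratch = [("res", 1)]

# Code section: one instruction byte, BUF RES D'20', then a two-byte
# instruction. The byte values are placeholders, not real 2650 opcodes.
code = [("bytes", b"\xAA"), ("res", 20), ("bytes", b"\xBB\xCC")]

print(len(section_output(scratch)))   # scratch section: no output
print(len(section_output(code)))      # 1 + 20 + 2 bytes
```

In this model the scratch section contributes nothing, while the code section contributes 23 bytes, 20 of which are the zeroes for BUF.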

Expressions

Labels and Symbols must start with a letter and may contain letters, digits and the underscore _. Labels are case-insensitive, so StartAddr and StartADDR refer to the same symbol. Case is preserved in the output.
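
A case-insensitive but case-preserving symbol table can be sketched as follows. The class and method names are hypothetical and chosen for this example only; the naming rule and the "last definition wins" behaviour follow the descriptions in this document.

```python
import re

# A letter followed by letters, digits or underscores, as described above.
LABEL_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*\Z")


class SymbolTable:
    """Illustrative symbol table: lookups ignore case, spelling is preserved."""

    def __init__(self):
        self._symbols = {}   # upper-cased name -> (name as written, value)

    def define(self, name, value):
        """Store a symbol; return True if it replaced an earlier definition."""
        if not LABEL_RE.match(name):
            raise ValueError(f"invalid label: {name!r}")
        key = name.upper()
        redefined = key in self._symbols
        self._symbols[key] = (name, value)   # the last definition is used
        return redefined

    def lookup(self, name):
        """Return (name as written, value) for any spelling of the name."""
        return self._symbols[name.upper()]


table = SymbolTable()
table.define("StartAddr", 0x1510)
print(table.lookup("StartADDR"))   # finds the symbol defined as StartAddr
```

Returning a flag from define is one way a caller could decide whether to emit the label warning; asm2650 may handle this differently.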

Numbers consist of (hexa)decimal digits, depending on the default base. If the base is set to hexadecimal (using the DFLT pseudo-op) then hexadecimal numbers must start with a digit 0-9 so as not to confuse them with a symbol. For example FF should be written as 0FF. To explicitly indicate the base, use the notation D'16' or H'10' (or d'16', h'10' in lowercase). Alternatively, hexadecimal numbers can be prefixed with a dollar-sign, e.g. $10.

Negative numbers have a minus sign in front, and positive numbers may optionally start with a plus sign.

Octal or binary constants are not supported.
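
The number formats described above can be summarised in a small Python parser. It is a sketch written for this document, not an excerpt from asm2650; the function name parse_number and its default_base parameter (standing in for the effect of DFLT) are assumptions.

```python
import re

# D'..' / H'..' in either case, as described above.
EXPLICIT_RE = re.compile(r"([DdHh])'([0-9A-Fa-f]+)'\Z")


def parse_number(text: str, default_base: int = 10) -> int:
    """Parse one numeric constant; raise ValueError if it is not a number."""
    sign = 1
    if text.startswith(("+", "-")):
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    match = EXPLICIT_RE.match(text)
    if match:
        base = 10 if match.group(1).upper() == "D" else 16
        return sign * int(match.group(2), base)
    if text.startswith("$") and len(text) > 1:
        return sign * int(text[1:], 16)      # $10 style hexadecimal
    if not text[:1].isdigit():
        # In hexadecimal mode FF looks like a symbol, so it is rejected here.
        raise ValueError(f"not a number (possibly a symbol): {text!r}")
    return sign * int(text, default_base)


print(parse_number("D'16'"), parse_number("H'10'"), parse_number("$10"))
print(parse_number("0FF", default_base=16))
```

A lone $ is rejected by this sketch because it denotes the current location rather than a number.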

The high byte of a two-byte value is obtained by <Label; the low byte of a two-byte value is obtained by >Label. It is an error to use a two-byte value for immediate commands: use lodi,r0 >Addr instead of lodi,r0 Addr. Other assemblers will silently discard the high byte, but asm2650 flags this as an error to prevent mistakes. Note that the < and > operators have higher priority than addition or subtraction, so you may want to use <Addr+h'100' instead of <Addr+1 to obtain the next block after Addr.
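
The byte arithmetic behind this advice can be shown in Python. The sketch assumes that the operator ends up being applied to the value of the complete expression, which is how the advice above reads; it does not model how asm2650 parses the operators, and the helper names and the value chosen for Addr are hypothetical.

```python
def high_byte(value: int) -> int:
    """Upper eight bits of a two-byte value (what <Label yields)."""
    return (value >> 8) & 0xFF


def low_byte(value: int) -> int:
    """Lower eight bits of a two-byte value (what >Label yields)."""
    return value & 0xFF


addr = 0x17C0   # hypothetical value of Addr

print(hex(high_byte(addr)), hex(low_byte(addr)))   # 0x17 0xc0
print(hex(high_byte(addr + 1)))       # 0x17: still the same 256-byte block
print(hex(high_byte(addr + 0x100)))   # 0x18: the next 256-byte block
```

Adding 1 to the address usually leaves the high byte unchanged, whereas adding h'100' always moves to the following block.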

ASCII text strings are specified using A'Some text here' or "Some text here". When a string is specified using double quotes, the expressions \xNN (where N indicates any hexadecimal digit) are replaced with the character with ASCII code NN. For example, a line ending can be specified as "Line end\x0D\x0A", or a C-style zero-terminated string as "Some text\x00".
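
The \xNN replacement can be sketched in Python with a regular expression. This is only an illustration of the substitution described above; the function name decode_string is made up, and the choice to encode the result as Latin-1 (so that codes 80 to FF map to single bytes) is an assumption.

```python
import re

# A backslash, the letter x and exactly two hexadecimal digits.
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def decode_string(body: str) -> bytes:
    r"""Turn the text between double quotes into bytes, expanding \xNN."""
    expanded = HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), body)
    return expanded.encode("latin-1")


# Raw strings keep the backslashes exactly as they appear in a source file.
print(decode_string(r"Line end\x0D\x0A"))
print(decode_string(r"Some text\x00"))
```

The first call yields the text followed by a carriage return and a line feed; the second yields a zero-terminated string.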

Text using the EBCDIC encoding is not supported.

The symbol $ refers to the current location (not to be confused with $10 hexadecimal numbers).

Expressions can include addition and subtraction, e.g.:

$+2 
LoadAddress-1 
Buffer+Length-1

Note that whitespace is not allowed inside expressions! The following will (silently) yield incorrect results:

LoadAddress - 1         does not work 
Buffer + Length - 1     also does not work

Many instructions accept a list of expressions. List items are separated by a comma, optionally followed by whitespace.

Comments

Full-line comments are indicated by an asterisk * or semicolon ; as the first character on the line. Otherwise, any valid instruction can be followed by one or more spaces or tabs; the remainder of the line is then taken to be a comment. E.g.:

; This is a full line comment 
* This, too. 
     lodi,r0  5   This is a comment as well

The starting position of the comment does not matter. The following line is therefore confusing but valid:

     lodz,r1  5   This error cannot be detected.

The lodz command does not take an operand, so the number 5 is taken to be a comment. If you actually meant lodi,r1 5 the assembler cannot warn you about this typo.
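
A whitespace-based field splitter shows why such lines cannot be diagnosed. This Python sketch is one plausible model of the behaviour described above, not the assembler's real parser: the function name split_fields and the takes_operand flag are invented, labels are left out, and quoted strings containing spaces are not handled.

```python
def split_fields(line: str, takes_operand: bool = True):
    """Split an unlabelled instruction line into opcode, operand, comment."""
    # The operand is the next whitespace-free word; the rest is comment.
    parts = line.strip().split(None, 2 if takes_operand else 1)
    opcode = parts[0] if parts else ""
    if takes_operand:
        operand = parts[1] if len(parts) > 1 else ""
        comment = parts[2] if len(parts) > 2 else ""
    else:
        operand = ""
        comment = parts[1] if len(parts) > 1 else ""
    return opcode, operand, comment


print(split_fields("     lodi,r0  5   This is a comment as well"))
print(split_fields("     lodz,r1  5   This error cannot be detected.",
                   takes_operand=False))
print(split_fields("     loda,r0  Buffer + Length - 1"))
```

Under this model the 5 after lodz,r1 lands in the comment, and in the last line only Buffer is seen as the operand, which would be consistent with the silently incorrect results that whitespace inside an expression produces.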

Program Code Variants

There are two variants of the source format: as used by Signetics, and as produced by the DASMx disassembler from Conquest Consultants.

This assembler supports both variants, but with some improvements:

  • Upper and lower case may be used. The assembler is case-insensitive but case-preserving.
  • Labels can be any length, and may contain underscores ‘_’.
  • Constants can only be specified in decimal, hexadecimal or ASCII format. Octal, binary and EBCDIC constants are not supported.
  • Forward references may be used anywhere, including register codes, condition codes and assembler pseudo-ops.

Whitespace at the end of lines is ignored.

Blank lines and full-line comments are copied into the output listing.

A line that starts with a symbol followed by a colon defines a new label. For example:

Fill_the_buffer: 
      stra,r0  buff,r1 
      bdrr,r1  Fill_the_buffer

Otherwise labels start at the beginning of the line and are followed by an instruction or pseudo-op:

Loop stra,r0  Buffer,r1 
     bdrr,r1  Loop

Indirect addressing is specified by using the asterisk * as the first character of the operand. With some assemblers the * character has to be in a specific character position on the line; asm2650 has no such restrictions. The operand can be any valid expression. For example:

     stra,r1  *Buffer+Len

Indexed addressing is specified by adding a register and (optionally) increment or decrement to the operand. Common notations are supported:

     stra,r0  Buffer,r1    Plain indexing 
     stra,r0  Buffer,r1+   Auto-increment 
     stra,r0  Buffer,r1,+  Auto-increment 
     stra,r0  Buffer,r1-   Auto-decrement 
     stra,r0  Buffer,r1,-  Auto-decrement
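
The five notations can be recognised with a single regular expression. The following Python sketch is illustrative only and is not taken from asm2650; the pattern, the function name parse_indexed and the shape of the returned tuple are assumptions.

```python
import re

# expression , register [ optional comma ] [ + or - ]
INDEX_RE = re.compile(
    r"(?P<expr>.+?),[Rr](?P<reg>[0-3])(?:,?(?P<step>[+-]))?\Z"
)


def parse_indexed(operand: str):
    """Return (expression, index register, step) for an operand.

    The register and step are None when the operand is not indexed; step is
    "+" for auto-increment, "-" for auto-decrement and None for plain indexing.
    """
    match = INDEX_RE.match(operand)
    if not match:
        return operand, None, None
    return match.group("expr"), int(match.group("reg")), match.group("step")


for text in ("Buffer,r1", "Buffer,r1+", "Buffer,r1,+", "Buffer,r1-", "Buffer,r1,-"):
    print(text, "->", parse_indexed(text))
```

Both spellings of auto-increment give the same result, as do both spellings of auto-decrement, so later stages need to handle only three cases.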

For instructions that use Register addressing, two notations are in common use. The assembler accepts either:

     lodz,r1 
     lodz     r1           This is the same as the previous line

The second notation is not allowed for other addressing modes. All of the following are invalid:

     lodi     r1,5         Incorrect 
     lodr     r1,ADDR      Also incorrect 
     loda     r1,ADDR      Same

Note that there can be no whitespace around the comma between the opcode and the register or condition code:

     lodz,r1               This is OK 
     lodz, r1              This raises an error

Predefined symbols

The following common symbols are predefined.

  • Registers R0, R1, R2, R3
  • Condition codes EQ, LT, GT, UN, NE and Z, P, N
  • Program status word bits SENS, FLAG, II, IDC, RS, WC, OVF, LCOM, CAR

Version history

1.0.0 — First release.

1.0.1 — The RES pseudo-op no longer generates output in the binary file. Previously bytes 00 were written. If you want this, use DATA or DB instead of RES. It is now OK to use multiple ORG sections, as long as only the first one actually generates binary code. Multiple sections that actually contain data will still result in a warning, which can be suppressed using --nowarn org

1.0.2 — Add an error for LODZ,R0: according to the Signetics manual the results of this instruction are undefined.

1.0.3 — Better handling of large syntax errors leading to exceptions in Python.

1.0.4 — A number of changes to support more input formats: The dollar-sign can be used to indicate a hexadecimal number (e.g. $0d).

— The START pseudo-op is accepted, but ignored.

— Also, full-line comments can now start with whitespace. The first non-space character must still be an asterisk * or semicolon ;, but there can be whitespace at the start of the line.

— Input files are always read as UTF8 files. Incorrect UTF8 codes would crash the assembler; this is now fixed.

1.0.5 — Labels starting with letter R followed by a number would be incorrectly interpreted as index registers. For example, label R012 would be interpreted as R0 followed by the number 12.

— When errors are encountered the output files will be deleted (as they will contain invalid data). This used to be a problem when they were special files such as /dev/null. The following is now possible: 

asm2650 -l /dev/null Myfile.2650

1.0.6 — Fixed: using $ to refer to the current code address stopped working since 1.0.4.

1.0.7 — Changed the meaning of the RES pseudo-op again. See the section ‘The RES pseudo-op’.

1.0.8 — Pseudo-ops that are ignored no longer require an argument.

— The listing now shows a list of all symbols in alphabetical order. Thanks Neill!

1.0.9 — Single-letter strings are now correctly converted to their ASCII-equivalent for EQU.

— More useful error message when a line contains only a label and nothing else. Thanks again Neill!

— When an END pseudo-op line is ignored, its label is still remembered.

— Added several warnings for unusual instructions.

— Also removed the org warning, as it did not make sense anymore.

1.0.10 — Fixed: a DATA pseudo-op containing forward references to labels would lead to incorrect output.

1.0.11 — Added an error message when attempting to use absolute addressing across page boundaries.

1.0.12 — Added option to disable warnings on a specific line using the *NOWARN comment.

1.0.13 — Added the -s option to automatically fix certain warnings.