asm2650 – a 2650 assembler

This is version 2 of my assembler for the Signetics 2650 processor. It is based on the assembler published on https://binnie.id.au/MicroByte/ but has been extensively modified.

Some features:

  • Helpful. It provides extensive error checking and warnings, which can be disabled per line or per category. This helps to make sure that the generated output is exactly what you intended.
  • Free form. No rigid formatting of the source files. Use any kind of whitespace, symbols of any length, and mix upper and lower case. Use underscores in symbols if that helps you.
  • Organised. Split your code into separate files and combine them using the INCLUDE keyword.
  • Flexible. Use either the Signetics syntax or the style produced by the DASMx disassembler. asm2650 supports both.
  • Complete. The extra instructions for the 2650B-processor are supported.

Download

asm2650 is written in Python; it should work with most Python3 installations.

The file is called asm2650; you can rename it to asm2650.py if your Python installation requires this.

Manual


Usage

asm2650 [-h] [-W {rel,label,instr,none}] [-B] [--segments {single,padded}] [-Ss] [-Sp] [--include DIR] [-l LISTOUT] [-o CODEOUT]
               [-s SEDOUT] [-H] [--debug]
               infile

Process 2650 Assembler Code -- version 2.2.1

positional arguments:
  infile                input source code file

optional arguments:
  -h, --help            show this help message and exit
  -W {rel,label,instr,none}, --nowarn {rel,label,instr,none}
                        disable specified warning
  -B, --allow2650b      enable instructions specific to 2650B-variant
  --segments {single,padded}
                        how to handle multiple code segments (default=padded)
  -Ss                   same as --segments single
  -Sp                   same as --segments padded
  --include DIR, -I DIR directory for INCLUDE
  -l LISTOUT            output text listing file
  -o CODEOUT            output binary code file
  -s SEDOUT             output sed instructions
  -H, --hex             unprefixed numerical constants are hexadecimal
  --debug, -d           enable debugging

--nowarn

Categories of warnings can be disabled using --nowarn, as described below.

--nowarn rel: no warning when a relative instruction could have been used instead of an absolute instruction. When the offset to the target address is small (between -64 and +63), the absolute instruction could be replaced by a relative instruction. For example in:

LOOP BDRA,R1 LOOP 

Using a relative instruction instead of an absolute instruction saves one byte in memory and can also execute faster on the 2650. Note that on Unix-like systems these corrections can be applied automatically using the -s option (see below).
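As an illustration of the range check behind this warning, here is a small Python sketch (not asm2650's own code). It assumes, as on the 2650 itself, that a relative displacement is counted from the address that follows the two-byte relative instruction, and it ignores wraparound at memory-page boundaries; the function name relative_displacement is hypothetical.

```python
def relative_displacement(address, target):
    """Return the displacement byte for a relative instruction at `address`
    that refers to `target`, or None if the target is out of reach.

    Assumption: the offset is counted from the address that follows the
    two-byte relative instruction; page wraparound is ignored.
    """
    offset = target - (address + 2)
    if not -64 <= offset <= 63:
        return None            # keep the absolute instruction
    return offset & 0x7F       # 7-bit two's complement; bit 7 (indirect) left clear
```

For the LOOP example above, a BDRR at the same address that branches to itself has an offset of -2, which this sketch encodes as h'7E'.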

--nowarn label: no warning when a label is redefined; the last definition of the label is used. Typically each label is defined only once, and a redefinition usually indicates a mistake or typo.

--nowarn instr: no warnings about unusual instructions. Some instructions are not forbidden, but highly unusual. Examples are an indexed instruction that uses R0 both as the index register and as the (default) operand, or a RES with an operand of zero.

--nowarn none: disable all warnings.

The --nowarn option can be specified multiple times, e.g. -W rel -W instr.

Note: warnings can be suppressed on a specific line by including the string “NOWARN” in uppercase as part of the comment.

--allow2650b

The 2650B variant of the processor adds the LDPL and STPL instructions, which load the lower Program Status Word directly from memory or store it directly to memory. The original 2650 does not support these instructions. If you wish to use them, explicitly specify the --allow2650b option.

--segments

Each ORG pseudo-op starts a new code segment, but the output file is a plain sequence of bytes and can therefore only contain a single segment. The --segments option determines how asm2650 handles multiple segments.

--segments single: only a single segment can produce output. All other segments may only contain definitions that produce no output, such as EQU. To make this slightly more useful, a RES pseudo-op will not produce output unless some other instruction or pseudo-op in the same segment created output before it.

--segments padded: multiple segments are allowed, and the address space between them is filled with zeroes. The segments are not re-ordered; the starting address of a new segment must be higher in memory than the ending address of the previous segment.

In version 1 of asm2650 the default option was ‘single’. From version 2 onward the option defaults to ‘padded’.

--include

The INCLUDE keyword can be used to insert other input files. Use the --include option to define a directory in which asm2650 will look for included files. The option can be used multiple times, to specify multiple directories.

Any variables in the option will be expanded: $VAR or ${VAR} in Unix-like systems, or %VAR% in Windows; ~ or ~user will be replaced with the home directory of the current or named user respectively.

The default path is taken from the ASM2650INC environment variable. This variable can contain multiple directories, separated by a colon.

-l / -o: output options

Use -l to save the listing into the specified file. Use -o to save the generated code into the specified file. You can specify either or both. When neither -l nor -o is specified, the default is to display the listing on standard output and not to generate code. If either -l or -o is specified, nothing is displayed on standard output. However, you can pass the special filename - to -l to send the listing to standard output.

asm2650 myfile.2650
Listing on standard output, no output code (same as 'asm2650 -l - myfile.2650')

asm2650 -o myfile.bin myfile.2650
No listing, output code in myfile.bin

asm2650 -l - -o myfile.bin myfile.2650
Listing on standard output, output code in myfile.bin

-s: create sed-file

Some warnings can be fixed automatically with sed, the stream editor that is common on Unix-like systems. sed takes a file containing editing instructions; asm2650 generates such a file when the -s option is used. Apply the sed-file as follows:

sed -f «sedfile» -i.bak «infile»

--hex: default to hexadecimal

By default numerical constants are written as decimal numbers, so that 10, 11, 12 mean ten, eleven, twelve. The option --hex changes this default to hexadecimal, so that the same constants mean sixteen, seventeen, eighteen. See Constants for details.

Input format

This section describes the valid input format recognised by the assembler. In general, each input line looks like this:

label    instruction  operand     comment

If a label is used, there must be no whitespace before it. If the label ends in a colon, the instruction and operand must be omitted. The instruction can be either a 2650 opcode or one of the pseudo-ops. The comment is always optional. Any amount of whitespace can be used between the elements; instruction, operand and comment do not have to start at fixed positions.

Any whitespace at the end of lines is ignored. Blank lines are ignored but copied literally into the output listing.

Labels, instructions and operands are fully case-insensitive, so StartAddr and StartADDR refer to the same symbol. Case is preserved in the output listing.

There are some exceptions to this general format, such as the INCLUDE keyword or full-line comments. These will be explained in the sections below.
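A rough Python sketch of how a line breaks down into these fields follows. It is not asm2650's parser: it assumes that every instruction takes an operand (so it cannot handle lodz r1, or the lodz,r1 5 pitfall described under Comments), and it ignores strings and the whitespace allowed after commas in lists. The name split_line is hypothetical.

```python
def split_line(line):
    """Split a source line into label, instruction, operand and comment.

    Hypothetical sketch of the free-form layout; it assumes every
    instruction takes an operand and ignores strings and list whitespace.
    """
    result = {"label": None, "instruction": None, "operand": None, "comment": None}
    line = line.rstrip()                    # trailing whitespace is ignored
    body = line.lstrip()
    if not body:
        return result                       # blank line
    if body[0] in "*;":
        result["comment"] = body            # full-line comment
        return result
    if not line[0].isspace():               # a label starts at the left margin
        parts = body.split(None, 1)
        result["label"] = parts[0]
        body = parts[1] if len(parts) > 1 else ""
        if result["label"].endswith(":"):   # full-line label: the rest is a comment
            result["label"] = result["label"][:-1]
            result["comment"] = body or None
            return result
    # Whitespace separates instruction, operand and comment.
    for key, value in zip(("instruction", "operand", "comment"), body.split(None, 2)):
        result[key] = value
    return result


print(split_line("Loop stra,r0  Buffer,r1   Fill the buffer with r0"))
print(split_line("Label:  loda,r1  Addr   Oops!"))
```

The second call shows why a colon label must stand alone: everything after it ends up in the comment.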

Basic elements

Symbols

Symbols (such as code labels or EQU definitions) consist of any sequence of letters, digits and the underscore _, except that their first character cannot be a digit. In the Signetics assemblers symbols are limited to 6 (uppercase) characters, but asm2650 sets no limit on the length of symbols, which greatly enhances code readability. Some symbols are predefined.

Valid symbols                 Invalid symbols
MyAddr                        2Label   (starts with a digit)
_Reference3                   Label#2  (invalid character)
This_2_is_a_symbol
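
The symbol rules can be expressed as a regular expression. Below is a Python sketch, assuming that only the ASCII letters A-Z and a-z count as letters; the function names are hypothetical.

```python
import re

# A letter or underscore, followed by letters, digits or underscores.
SYMBOL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_symbol(text):
    return SYMBOL.fullmatch(text) is not None


def symbol_key(text):
    # Symbols are case-insensitive, so normalise them before a table lookup.
    return text.upper()


for name in ("MyAddr", "_Reference3", "This_2_is_a_symbol", "2Label", "Label#2"):
    print(name, is_valid_symbol(name))
```
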

Constants

Numbers consist of decimal or hexadecimal digits, depending on the default base. If the base is set to hexadecimal (using the DFLT pseudo-op or the --hex command line option), hexadecimal numbers must start with a digit 0-9 so that they are not confused with a symbol. For example, FF should be written as 0FF. Alternatively, hexadecimal numbers can be prefixed with a dollar sign, e.g. $10. Negative numbers have a minus sign in front, and positive numbers may optionally start with a plus sign.
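A Python sketch of these rules for plain numbers follows (the based-numbers described next are not handled). It assumes that the $ prefix always means hexadecimal, whatever the default base; parse_number is a hypothetical name.

```python
def parse_number(token, default_base=10):
    """Convert a plain numeric constant such as 42, 0FF or $1F to an integer.

    Hypothetical sketch; the $ prefix is assumed to mean hexadecimal
    regardless of the default base.
    """
    sign = 1
    if token.startswith(("+", "-")):
        sign = -1 if token[0] == "-" else 1
        token = token[1:]
    if token.startswith("$") and len(token) > 1:
        return sign * int(token[1:], 16)
    if not token[:1].isdigit():
        # FF would be read as a symbol; write 0FF instead
        raise ValueError(f"{token!r} does not start with a digit")
    return sign * int(token, default_base)


print(parse_number("10"), parse_number("10", 16), parse_number("0FF", 16))
```
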

Based-numbers explicitly indicate their base: use the notation D'16' for decimal, O'137' for octal, B'1001011' for binary, or H'3F' for hexadecimal. In base notation you can specify a list of numbers, e.g. d'10,20,30' or h'0A,14,1E'. Lists are only supported in a few places, such as the DATA and ACON pseudo-ops.
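Below is a Python sketch of based-number parsing, including lists. The regular expression and the name parse_based are illustrative assumptions, and details such as signs inside the quotes are not handled.

```python
import re

BASES = {"D": 10, "O": 8, "B": 2, "H": 16}
BASED = re.compile(r"([DOBH])'([^']*)'", re.IGNORECASE)


def parse_based(token):
    """Return the list of values in a based-number such as H'3F' or d'10,20,30'."""
    match = BASED.fullmatch(token)
    if match is None:
        raise ValueError(f"not a based-number: {token!r}")
    base = BASES[match.group(1).upper()]
    return [int(item, base) for item in match.group(2).split(",")]


print(parse_based("O'137'"), parse_based("h'0A,14,1E'"))
```
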

The current address is represented by the symbol $. This is sometimes useful in address calculations. Note that the dollar sign is also used as a prefix for hexadecimal numbers: $+2 refers to the current address, but $10 is a hexadecimal number.

Strings can be specified using either "my string" or A'my string'. In the first notation, a literal double quote within the string must be escaped by doubling it, e.g. """" is a string consisting of a single double-quote character. Likewise, in the second notation a literal single quote is written as two single quotes, e.g. A'it''s'.

In the first notation, any ASCII character can be included by inserting \xBB, where BB is the hexadecimal value of the character. For example, a line ending can be specified as "Line end\x0D\x0A", or a C-style zero-terminated string as "Some text\x00". Because strings are lists, these examples can also be written as "Line end",$0D,$0A and "Some text",0.
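The quoting and escape rules can be sketched in Python as follows. This is a hypothetical illustration, not asm2650's code: it assumes the A prefix is case-insensitive like the rest of the syntax, and it does not validate malformed escapes.

```python
def parse_string(literal):
    """Return the character codes of a "..." or A'...' string literal.

    A doubled quote stands for one quote character; only the "..." form
    recognises \\xBB escapes (BB = two hexadecimal digits).
    """
    if len(literal) >= 2 and literal[0] == '"' and literal[-1] == '"':
        body, quote, escapes = literal[1:-1], '"', True
    elif len(literal) >= 3 and literal[:2].upper() == "A'" and literal[-1] == "'":
        body, quote, escapes = literal[2:-1], "'", False
    else:
        raise ValueError(f"not a string literal: {literal!r}")
    body = body.replace(quote * 2, quote)           # "" -> " and '' -> '
    codes, i = [], 0
    while i < len(body):
        if escapes and body.startswith("\\x", i):
            codes.append(int(body[i + 2:i + 4], 16))    # \xBB -> byte BB
            i += 4
        else:
            codes.append(ord(body[i]))
            i += 1
    return codes


print(parse_string('"Line end\\x0D\\x0A"'))
```
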

EBCDIC strings are not supported. I trust you’re not disappointed.

Operators

Version 2 of asm2650 includes extensive support for operators. In order of precedence, from highest to lowest:

Operator          Meaning
* and / and %     multiplication, integer division, modulo
<< and >>         shift left, shift right
+ and -           addition, subtraction
&                 logical AND
| and ^           logical OR, logical XOR
,                 list concatenation (comma)
< and >           most significant byte, least significant byte

Most and least significant byte operators transform a two-byte value into a single-byte value. These operators are useful to load addresses into registers. Since asm2650 does not automatically truncate 16-bit values in places where an 8-bit value is needed, the > operator may be required: use lodi,r0 >Addr instead of lodi,r0 Addr.

    lodi,r2   <Buffer
    lodi,r3   >Buffer
    ; R2R3 now contain the address of the buffer
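The < and > operators, and the refusal to truncate immediate operands described under Operands, can be modelled in a few lines of Python. This is a hypothetical sketch: the function names are invented, the -128 to +255 range comes from the Operands section, and storing negative values as two's complement is an assumption.

```python
def msb(value):
    """The < operator: most significant byte of a 16-bit value."""
    return (value >> 8) & 0xFF


def lsb(value):
    """The > operator: least significant byte of a 16-bit value."""
    return value & 0xFF


def immediate_byte(value):
    """Check an immediate operand instead of silently truncating it."""
    if not -128 <= value <= 255:
        raise ValueError(f"{value} does not fit in one byte; use > for the low byte")
    return value & 0xFF   # assumed: negative values are stored as two's complement


Buffer = 0x1234  # hypothetical address
print(hex(immediate_byte(msb(Buffer))), hex(immediate_byte(lsb(Buffer))))  # 0x12 0x34
```
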

Multiplication is self-evident, but be aware that the asterisk * is also used to mark a full-line comment and to indicate indirect addressing. Division is an integer division, so 7 / 3 equals 2. To obtain the remainder after division (the modulo) use the % operator: 7 % 3 equals 1. When the second value is negative, results may not be what you expect: 7 % -3 equals -2, for example.

Addition and subtraction are self-evident.

All numerical operators (multiplication, division, modulo, addition and subtraction) also work on single-character strings, using the ASCII value of the character. For example, "0"+1 equals "1".

Shift left and shift right are bitwise operators that shift a number a given number of steps to the left or right. E.g. h'40'>>2 equals h'10', and h'01'<<4 equals h'10'. Shifting left by n steps is equivalent to multiplication by 2 to the power n; shifting right by n steps is equivalent to integer division by 2 to the power n.

Logical AND with the & operator also operates on the bits of its operands: h'33' & h'1f' equals h'13'. Logical OR and logical XOR (exclusive OR) are similar.
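asm2650 is written in Python, and the results quoted above match Python's own integer operators, with / corresponding to Python's //. Assuming the assembler relies on those semantics (the text does not say so explicitly), the following assertions summarise the behaviour of the arithmetic and bitwise operators.

```python
# Integer division and modulo: Python floors the quotient, which explains
# the result for a negative divisor (7 // -3 is -3, and -3 * -3 + -2 == 7).
assert 7 // 3 == 2
assert 7 % 3 == 1
assert 7 % -3 == -2

# Single-character strings take part through their ASCII value.
assert ord("0") + 1 == ord("1")            # "0"+1 equals "1"

# Shifts multiply or divide by 2 to the power of the shift count.
assert (0x40 >> 2) == 0x10 == 0x40 // 2**2  # h'40'>>2
assert (0x01 << 4) == 0x10 == 0x01 * 2**4   # h'01'<<4

# AND, OR and XOR work bit by bit.
assert (0x33 & 0x1F) == 0x13                # h'33' & h'1f'
assert (0x33 | 0x1F) == 0x3F
assert (0x33 ^ 0x1F) == 0x2C
```
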

List concatenation is performed using a comma. Based-numbers and strings can already be lists. Add further elements to a list with the comma operator.

Len   EQU    5 
      DATA   h'10,20', Len, "abc"
results in:
10 20 05 61 62 63
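List concatenation can be pictured as flattening: a single value contributes one element, a based-number list contributes all its values, and a string contributes the ASCII code of each character. A hypothetical Python sketch that reproduces the DATA example above:

```python
def flatten(*items):
    """Concatenate list elements; strings contribute one value per character."""
    values = []
    for item in items:
        if isinstance(item, str):
            values.extend(ord(ch) for ch in item)
        elif isinstance(item, (list, tuple)):
            values.extend(item)
        else:
            values.append(item)
    return values


Len = 5
data = flatten([0x10, 0x20], Len, "abc")    # DATA h'10,20', Len, "abc"
print(" ".join(f"{v:02X}" for v in data))   # 10 20 05 61 62 63
```
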

Operator precedence applies as in the table above. Use parentheses if necessary. The following are some examples of valid expressions.

$+2 
Load_Address-1 
Buffer+2*(Length-1)
Val<<2|b'11'
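The precedence table can be turned into a small precedence-climbing evaluator. The Python sketch below is an illustration, not asm2650's implementation: it handles decimal numbers, symbols, $, parentheses, unary minus and the binary operators from the table, but omits based-numbers, strings, comma lists and the prefix < and > operators. Treating / as floor division is an assumption.

```python
import operator
import re

# Binary operators from the table above, highest precedence first.
LEVELS = [("*", "/", "%"), ("<<", ">>"), ("+", "-"), ("&",), ("|", "^")]
PRECEDENCE = {op: len(LEVELS) - i for i, ops in enumerate(LEVELS) for op in ops}
APPLY = {
    "*": operator.mul, "/": operator.floordiv, "%": operator.mod,
    "<<": operator.lshift, ">>": operator.rshift,
    "+": operator.add, "-": operator.sub,
    "&": operator.and_, "|": operator.or_, "^": operator.xor,
}
TOKEN = re.compile(r"<<|>>|[-+*/%&|^()$]|[A-Za-z_]\w*|\d+|\S")


def evaluate(expr, symbols, here=0):
    """Evaluate an expression; `symbols` maps upper-case names to values."""
    tokens = TOKEN.findall(expr)
    pos = 0

    def primary():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = binary(1)
            if tokens[pos] != ")":
                raise ValueError("missing )")
            pos += 1
            return value
        if token == "-":
            return -primary()           # unary minus
        if token == "$":
            return here                 # current address
        if token.isdigit():
            return int(token)
        return symbols[token.upper()]   # symbols are case-insensitive

    def binary(min_prec):
        nonlocal pos
        left = primary()
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            # A higher minimum for the right operand makes operators left-associative.
            left = APPLY[op](left, binary(PRECEDENCE[op] + 1))
        return left

    value = binary(1)
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return value


symbols = {"BUFFER": 0x100, "LENGTH": 5}   # hypothetical values
print(hex(evaluate("Buffer+2*(Length-1)", symbols)))   # 0x108
```

Because << and >> bind more tightly than + and - (since version 2.2.1), evaluate("1+2<<3", {}) gives 17 rather than 24.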

Whitespace (a tab or space) is not allowed within expressions. The one exception is that in lists there can be whitespace after a comma; this is to maintain compatibility with code produced by DASMx. Otherwise, the appearance of whitespace indicates the start of a comment.

Note: the notation used by the TWIN assembler is not fully supported and may need rewriting; for example, write & instead of .AND. and >> instead of .SHR. when converting TWIN sources. The comparison operators and the pseudo-ops for conditional assembly are not supported by asm2650.

Predefined symbols

The following common symbols are predefined.

Program status word bits SENS, FLAG, II, IDC, RS, WC, OVF, LCOM, CAR

Registers R0, R1, R2, R3

Condition codes EQ, LT, and GT for comparisons, UN for unconditional branching, NE for bit tests, and Z, P, N for sign-comparisons (zero, positive, negative).

Labels

A label defines a new symbol for the current address value. There must be no whitespace to the left of the label; it must start at the left margin.

Full-line labels are followed by a colon. If there is whitespace and some text after the label, that text is taken as a comment. There cannot be an instruction after a full-line label; use a normal label for that. For example:

Fill_the_buffer:          This procedure fills the buffer with r0
      stra,r0  buff,r1 
      bdrr,r1  Fill_the_buffer

Normal labels are followed by an instruction or pseudo-op. Normal labels do not end in a colon.

Loop stra,r0  Buffer,r1   Fill the buffer with r0
     bdrr,r1  Loop

A common mistake is to reuse an existing label that was defined previously. asm2650 will generate a warning when a symbol is redefined.

Instructions

After the label there will be an instruction. If no label is used, there must be whitespace to the left of the instruction. The instruction must be a valid 2650 opcode or one of the special pseudo-ops.

For instructions that use Register addressing, two notations are in common use. The assembler accepts either:

     lodz,r1 
     lodz     r1           This is the same as the previous line

The second notation is not allowed for other addressing modes. All of the following are invalid:

     lodi     r1,5         Incorrect 
     lodr     r1,ADDR      Also incorrect 
     loda     r1,ADDR      Same

There can be no whitespace around the comma between the opcode and the register or condition code:

     lodz,r1               This is OK 
     lodz, r1              This raises an error

The BXA and BSXA instructions require that R3 is explicitly specified as the index register.

Invalid:
     bxa,un  Dest          cannot use a condition code
     bxa,r3  Dest          cannot use a register
     bxa     Dest,r2       index register must be r3
Valid:
     bxa     Dest,r3

The instructions LDPL and STPL are specific to the 2650B variant, and are only recognised when the --allow2650b command line argument has been specified.

Operands

Many opcodes and pseudo-ops require a further operand.

Indirect addressing is specified by using the asterisk * as the first character of the operand. With some assemblers the * character has to be in a specific character position on the line; asm2650 has no such restrictions. The operand can be any valid expression. For example:

     stra,r1  *Buffer+Len

Indexed addressing is specified by adding a register and (optionally) increment or decrement to the operand. Common notations are supported:

     stra,r0  Buffer,r1    Plain indexing 
     stra,r0  Buffer,r1+   Auto-increment 
     stra,r0  Buffer,r1,+  Auto-increment 
     stra,r0  Buffer,r1-   Auto-decrement 
     stra,r0  Buffer,r1,-  Auto-decrement

For opcodes that use Immediate addressing (e.g. LODI), the operand must be between -128 and +255. asm2650 will not silently take the least significant byte of two-byte values; you must explicitly use the > operator for that. This helps in finding programming errors.

Comments

Full-line comments are indicated by an asterisk * or semicolon ; as the first non-whitespace character on the line. The entire line is ignored, and copied verbatim into the output listing. Note that in many assemblers the asterisk has to be the first character on the line. With asm2650 this is not required.

Otherwise, any text to the right of an instruction (separated by some whitespace) is taken as an instruction comment. (This is the reason why whitespace is generally not allowed within expressions.)

* This is a full line comment
    * This is also a full line comment
    ; this too is a full line comment
LongLabel:        some documentation here
     lodi,r0  5   This is an instruction comment

The starting position of an instruction comment is not important, as long as there is some whitespace before it. The following line is therefore confusing but valid:

     lodz,r1  5   This error cannot be detected.

The lodz command does not take an operand, so the number 5 is taken to be a comment. If you actually meant lodi,r1 5 the assembler cannot warn you about this mistake. In the output listing comments are always preceded by a semicolon and aligned at column 73, to assist in identifying such mistakes.

Labels with a colon can also cause unexpected problems with comments:

Label:  loda,r1  Addr   Oops!

The entire text after Label: is taken as a comment, so the loda instruction is ignored. Again, the assembler cannot warn you about this typo (but the output listing makes the error easy to identify).

If a comment contains the literal string NOWARN in uppercase, then all warnings on that line are suppressed. Use this to avoid cluttering the output with unnecessary warnings.

Include

Version 2 of asm2650 allows the input to be split over separate input files. Use the special keyword INCLUDE at the start of a line to name a file that will be read at that point. The contents of that file will be read and processed as if they had been inserted in the main file at the point of the INCLUDE. Afterwards, processing continues with the next line from the main file.

Includes can be nested: a file that is included can itself include another file. There is no set limit, but asm2650 does not detect circular references. If file A includes file B, and file B includes file A, an infinite loop will start.

* The main file
INCLUDE ../common.2650
INCLUDE datadefs.2650
  :
remainder of the main file here

The include file will be searched for in the following locations:

  1. First in the same directory as the main file (the file from which it was included).
  2. If it cannot be found there, the directories specified by the --include command line arguments are tried in order.
  3. If it cannot be found there either, the directories specified in the environment variable ASM2650INC are tried.

If the file cannot be located in any of these directories, an error message is printed and the assembler stops.
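The search order above can be sketched in Python as follows. This is a hypothetical illustration, not asm2650's code: find_include is an invented name, variables and ~ are expanded only in the --include directories, and ASM2650INC is split on os.pathsep, which is the colon mentioned earlier on Unix-like systems (what asm2650 uses on Windows is not stated). Note that os.path.expandvars only understands %VAR% on Windows.

```python
import os


def find_include(name, including_file, include_dirs):
    """Return the path of an INCLUDE file, or None if it cannot be found."""
    # 1. The directory of the file that contains the INCLUDE line.
    candidates = [os.path.dirname(os.path.abspath(including_file))]
    # 2. The --include/-I directories, in order, with variables and ~ expanded.
    candidates += [os.path.expanduser(os.path.expandvars(d)) for d in include_dirs]
    # 3. The directories listed in the ASM2650INC environment variable.
    env = os.environ.get("ASM2650INC", "")
    candidates += [d for d in env.split(os.pathsep) if d]
    for directory in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None   # the assembler would report an error and stop here
```
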

Pseudo-ops

Pseudo-ops do not generate code for the 2650; they are instructions to the assembler. The assembler supports all known pseudo-ops, although it ignores a number of them.

ORG

ORG (“origin”) sets the location counter. Typically this determines the memory address where the code will be located. Each input file should contain at least one ORG instruction. Without an ORG instruction the assembler assumes that the code starts at address zero. This is normally not what is intended, and will generate a warning message. You can specify ORG 0 to make the assumption explicit.

Start     ORG    Address+h'100'

The label is optional; if present its value will be the operand. The operand must be a single value (not a list).

Each ORG pseudo-op defines a new code segment. There is a default code segment before the first ORG pseudo-op. The command line argument --segments determines how the segments are combined into an output file.

EQU

EQU (“equals”) defines a new symbol. A normal label must be specified; a full-line label is not valid. The label becomes a new symbol, and the expression defines the value of that symbol. The operand must be a single value; it cannot be a list.

Symbol    EQU    expression

Forward references are possible: you can use a symbol in an expression and define that symbol further on using EQU.

ACON

ACON (“address constant”) defines two-byte values that can be used as an address, a counter, or for any other purpose. The operand can be a single value or a list of values. Each item will be stored as two bytes in the output file. The value cannot be negative and must not exceed h'ffff'.

Label     ACON   h'0100', h'0100'+Len, MyAddr

The label is optional; if present it will be defined as the address of the first byte.

The pseudo-op DW is a synonym for ACON.
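A Python sketch of how ACON operands might be turned into bytes. The byte order is an assumption: most significant byte first, the same order the 2650 uses for the address bytes of absolute instructions. The name acon and the value of Len are hypothetical.

```python
def acon(*values):
    """Encode ACON/DW operands as two bytes each (MSB first, assumed)."""
    out = bytearray()
    for value in values:
        if not 0 <= value <= 0xFFFF:
            raise ValueError(f"ACON value out of range: {value}")
        out += value.to_bytes(2, "big")
    return bytes(out)


Len = 5  # hypothetical value
print(acon(0x0100, 0x0100 + Len).hex(" "))   # 01 00 01 05
```
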

DATA

DATA defines single byte values that will be inserted in the output file. The operand can be a single value or a list of values. Each value must be between -127 and 255, so that it can fit in a single byte.

Label     DATA   h'10,20,30',0,AddrB-AddrA

The label is optional; if present it will be defined as the address of the first byte.

The pseudo-op DB is a synonym for DATA.
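The DATA range check can be sketched in Python as below. The bounds are the ones documented above; encoding negative values as two's complement bytes is an assumption, and the name data is hypothetical.

```python
def data(*values):
    """Encode DATA/DB operands as one byte each."""
    out = bytearray()
    for value in values:
        if not -127 <= value <= 255:      # range as documented above
            raise ValueError(f"DATA value out of range: {value}")
        out.append(value & 0xFF)          # assumed: negatives as two's complement
    return bytes(out)


print(data(0x10, 0x20, 0x30, 0, -1).hex(" "))   # 10 20 30 00 ff
```
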

RES

RES reserves memory. In the Signetics assemblers the contents of the memory locations are not initialised, but asm2650 fills these bytes with zeroes. The operand indicates the number of bytes to reserve; if it is zero a warning will be issued. It is an error for the operand to be negative.

Label     RES     Len+1

The label is optional; if present it will be defined as the address of the first byte.

Note that with --segments single, RES will only emit bytes into the output if the current segment has already generated some output before it.

DFLT

DFLT (“default base”) determines whether numerical constants are interpreted as decimal or hexadecimal. Values 0 or 10 indicate that numbers will be decimal; values 1 or 16 indicate that numbers will be hexadecimal. Note that this base does not apply to based-numbers such as h'20'. The default base cannot be set to 8; let me know if you think this is a bug.

The initial default base is decimal. The --hex command line option changes this to hexadecimal.

Other pseudo-ops

The pseudo-ops START, END, EJE, PAG, PRT, SPC, TITL and PCH are accepted but ignored. The pseudo-ops IF, ELSE and ENDIF of the TWIN assembler are not supported, as they cannot safely be ignored.

Code segments

The binary output file only contains the content bytes, in a single unstructured stream. It does not contain any reference to the intended starting address in memory. This also means that the binary output cannot contain multiple sections. The source file, however, can contain multiple sections; each ORG pseudo-op starts a new code segment at a new memory address. A source file with two ORG instructions would contain three segments (the start of the source file is a segment at the implicit address 0000).

This raises the question of how the RES pseudo-op should be treated. Consider the following code:

; Scratch memory 
       ORG      H'17C0' 
COUNT  RES      1 
       ... 

; Code section
       ORG      H'1510' 
START  eorz,r0 
       ... 
BUF    RES      D'20' 
MORE   lodi,r1  5

Ideally the first section should only define the items in scratch memory but produce no output to the binary output file. The second section should produce output, including zeroes for the RES instruction.

The assembler addresses the issue in one of two possible ways, depending on the --segments command line argument.

Single. If the current section already produced output bytes then the RES will also produce zeroes. But if the current section did not produce any output (as in the definitions in the scratch section above) RES will not produce output bytes. In other words, RES will not be the first instruction in a section to produce output, but will produce output if it follows other output bytes.

With this option, there can be many segments but only one segment should produce output. The starting addresses of the segments can be in any order.

Padded. RES will always produce output, and if multiple segments are used they will all be emitted to the output code. Any space between segments will be filled with zeroes. However, the segments must appear in ascending address order, without overlapping.

Version 1 of asm2650 always used the ‘single’ option. Version 2 supports both options, but defaults to ‘padded’.
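The padded mode can be sketched in Python as below. This is a hypothetical illustration, not asm2650's code. It assumes that segments without output (such as the implicit segment at 0000 when it is empty) are skipped and that the output starts at the first segment that produces bytes. Under these rules the example above, with ORG H'17C0' before ORG H'1510', would be rejected in padded mode because its RES produces output.

```python
def combine_padded(segments):
    """Join (start_address, content) pairs in source order, padding gaps with zeroes."""
    out = bytearray()
    end = None                                  # address after the last emitted byte
    for start, content in segments:
        if not content:
            continue                            # assumed: empty segments are skipped
        if end is not None:
            if start < end:
                raise ValueError(f"segment at {start:04X} is out of order")
            out += bytes(start - end)           # zero padding between segments
        out += content
        end = start + len(content)
    return bytes(out)


print(combine_padded([(0x0000, b""), (0x0100, b"\x01\x02"), (0x0104, b"\x03")]).hex(" "))
```
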

Compatibility

asm2650 should accept any valid input file without generating errors, although it may raise warnings. Please let me know (through the link at the bottom of this page) if you have a valid input file that is not processed properly by asm2650.

Some assemblers insert an implicit > operator when an 8-bit value is expected, e.g:

      lodi,r0  h'1234'
      ; on some assemblers r0 would now contain h'34'

asm2650 does not do this, and will raise an error message instead.


Version history

2.2.1 — Priority of << and >> is now above that of addition/subtraction.
Exit code is the number of errors (so zero on success only).
Do not duplicate number of errors and warnings when the listing is sent to the console.
Fix listing with long labels before an instruction.

2.2 — Added --hex option to default to (unprefixed) hexadecimal constants.
Include errors and warnings in listing; do not remove the listing when errors were encountered.
Use tabs instead of spaces in output listing. This reduced the size of the listing by 40%. Tabsize is 8.
Accept and ignore the PAG pseudo-op. (Alternative to EJE).

2.1 — Spaces are no longer accepted within expressions, to restore compatibility with valid input files with comments that start with an operator-character (such as * or a comma).
Precedence of < and > is reduced to lowest, for compatibility with previous versions and other assemblers.
The instruction lodz,r0 now gives a warning instead of an error. See the discussion on this pattern.
Alignment of comments in the output listing has been improved. Comments in a listing start with a semicolon for extra clarity.

2.0.4 — Ignore STX, ETX and Ctrl-Z characters in input. These characters are often used in the encoding of text files. (Thanks, Ron!)

Correct the warning when relative addressing could have been used.

Omit printing the symbol table when errors have been encountered.

2.0.3 — Removed deprecated Python syntax. Now works with Python 3.13.

2.0.2 — Fixed a bug with negative numbers.
Change the listing for EQU symbol definitions, to distinguish them clearly from labels (which are addresses).

2.0.1 — Fixed a bug that would produce incorrect code output with DATA or ACON pseudo-ops, when using certain calculations on forward references.

2.0.0 — Extensive rewriting and some breaking syntax changes. All for the best, I hope.
Insert and nest files using the INCLUDE keyword.
Expressions can use many new operators. Octal and binary constants are now possible too.
Two options for handling multiple code segments. Padding is the new default.

1.0.0 to 1.0.15 — Version one. The various updates corrected issues or added minor features.