This is version 2 of my assembler for the Signetics 2650 processor. It is based on the assembler published on https://binnie.id.au/MicroByte/ but has been extensively modified.
Some features:
- Helpful. It provides extensive error checking and warnings. Warnings can be disabled, on a line-by-line basis or per category. Makes sure that your produced output is absolutely correct.
- Free form. No rigid formatting of the source files. Use any kind of whitespace, symbols of any length, and mix upper and lower case. Use underscores in symbols if that helps you.
- Organised. Separate your code into separate files and combine them using the INCLUDE keyword.
- Flexible. Use either the Signetics syntax or the style produced by the DASMx disassembler. asm2650 supports both.
- Complete. The extra instructions for the 2650B processor are supported.
Download
asm2650 is written in Python; it should work with most Python 3 installations.
The file is called asm2650; you can rename it to asm2650.py if your Python installation requires this.
Manual
Table of contents
- Usage
- --nowarn: disable warnings
- --allow2650b: enable 2650B-only instructions
- --segments: interpret ORG pseudo-ops
- --include: directories to search
- output options: produce listing and/or binary output
- create sed file: optimize your code
- --hex: default to hexadecimal
- Input format: the syntax of input files
- Basic elements
- Symbols: labels or symbolic names for constants
- Constants: numbers, based-numbers, current address, and strings.
- Operators: numerical and bit-wise calculations
- Predefined symbols: the most common ones are included
- Labels: symbolic references to code addresses
- Instructions: 2650 opcodes
- Operands: notations and some pitfalls
- Comments: to document your code, and some pitfalls
- Include: organise your input files
- Pseudo-ops: control the assembler
- Code segments
- Compatibility
- Version history
Usage
asm2650 [-h] [-W {rel,label,instr,none}] [-B] [--segments {single,padded}] [-Ss] [-Sp] [--include DIR] [-l LISTOUT] [-o CODEOUT]
[-s SEDOUT] [--debug]
infile
Process 2650 Assembler Code -- version 2.2.1
positional arguments:
infile input source code file
optional arguments:
-h, --help show this help message and exit
-W {rel,label,instr,none}, --nowarn {rel,label,instr,none}
disable specified warning
-B, --allow2650b enable instructions specific to 2650B-variant
--segments {single,padded}
how to handle multiple code segments (default=padded)
-Ss same as --segments single
-Sp same as --segments padded
--include DIR, -I DIR directory for INCLUDE
-l LISTOUT output text listing file
-o CODEOUT output binary code file
-s SEDOUT output sed instructions
-H, --hex unprefixed numerical constants are hexadecimal
--debug, -d enable debugging
--nowarn
Categories of warnings can be disabled using --nowarn, as described below.
--nowarn rel: no warning when a relative instruction could have been used instead of an absolute instruction. When the branch offset is small (between -64 and +63) the instruction could be replaced by a relative instruction. For example in:
LOOP BDRA,R1 LOOP
Using a relative instruction instead of an absolute instruction saves one byte of memory, and the 2650 also executes it faster. Note that on Unix-like systems these corrections can be applied automatically using the -s option (see below).
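The range check behind this warning can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and it assumes the offset is counted from the byte that follows the two-byte relative instruction. The manual does not show how asm2650 itself computes it.

```python
def relative_offset(instr_address, target_address, relative_size=2):
    """Offset a relative branch would need.

    Assumption: the offset is measured from the address directly after
    the (two-byte) relative instruction.
    """
    return target_address - (instr_address + relative_size)


def could_be_relative(instr_address, target_address):
    """True when the offset fits the -64..+63 range mentioned above."""
    return -64 <= relative_offset(instr_address, target_address) <= 63
```

For the `LOOP BDRA,R1 LOOP` example the target is the instruction's own address, so under this assumption the offset is -2 and a relative branch would fit.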
--nowarn label: no warning when a label is redefined; the last definition of the label is used. Typically each label is defined only once, and redefinitions indicate a mistake or typo.
--nowarn instr: no warnings about unusual instructions. Some instructions are not forbidden, but highly unusual. Examples are an indexed instruction that uses R0 both as the index register and as the (default) operand, or a RES with an operand of zero.
--nowarn none: disable all warnings.
The --nowarn option can be specified multiple times, e.g. -W rel -W instr.
Note: warnings can be suppressed on a specific line by including the string "NOWARN" in uppercase as part of the comment.
--allow2650b
The 2650B variant of the processor adds LDPL and STPL instructions for directly loading or storing the lower Program Status Word to memory. The original 2650 does not support these instructions. If you wish to use these instructions, explicitly specify the --allow2650b option.
--segments
Each ORG pseudo-op starts a new code segment, but the output file is a plain sequence of bytes and can therefore only contain a single segment. The --segments option determines how asm2650 handles multiple segments.
--segments single: only a single segment can produce output. All other segments can only contain EQU definitions. To make this slightly more useful, RES pseudo-ops will not produce output unless some other instruction or pseudo-op created output before.
--segments padded: multiple segments are allowed, and the address space between them is filled with zeroes. The segments are not re-ordered; the starting address of a new segment must be higher in memory than the ending address of the previous segment.
In version 1 of asm2650 the default option was ‘single’. From version 2 onward the option defaults to ‘padded’.
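To picture what 'padded' means, here is a rough Python sketch. It is not the assembler's actual code: `pad_segments` is a hypothetical helper, the output is assumed to start at the first segment's address, and a segment that starts exactly at the previous end is accepted.

```python
def pad_segments(segments):
    """Combine (start_address, data) pairs into one byte stream.

    Gaps between segments are filled with zeroes. Segments are taken in
    source order and must not start below the end of the previous one.
    """
    output = bytearray()
    base = None                        # address of the first output byte
    for start, data in segments:
        if base is None:
            base = start
        end = base + len(output)       # first free address so far
        if start < end:
            raise ValueError(f"segment at {start:#06x} starts below {end:#06x}")
        output.extend(bytes(start - end))   # zero padding
        output.extend(data)
    return bytes(output)
```

Two segments at h'0100' (two bytes) and h'0105' (one byte) would give six output bytes, with three zeroes in the middle.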
--include
The INCLUDE keyword can be used to insert other input files. Use the --include option to define a directory in which asm2650 will look for included files. The option can be used multiple times, to specify multiple directories.
Any variables in the option will be expanded: $VAR or ${VAR} in Unix-like systems, or %VAR% in Windows; ~ or ~user will be replaced with the home directory of the current or named user respectively.
The default path is taken from the ASM2650INC environment variable. This variable can contain multiple directories, separated by a colon.
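The expansion rules described here match what Python's standard library offers, so a sketch could look as follows. The helper names are mine, and whether asm2650 is implemented this way is an assumption.

```python
import os


def expand_dir(path):
    """Expand environment variables and a leading ~ or ~user."""
    return os.path.expanduser(os.path.expandvars(path))


def default_include_dirs(environ=os.environ):
    """Split ASM2650INC on colons, skipping empty entries."""
    value = environ.get("ASM2650INC", "")
    return [d for d in value.split(":") if d]
```

With `ASM2650INC=/usr/share/2650:/home/me/inc` (example paths only), `default_include_dirs()` would return those two directories in that order.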
-l / -o: output options
Use -l to save the listing into the specified file. Use -o to save the generated code into the specified file. You can specify either or both. When neither -l nor -o is specified, the default is to display the listing on standard output and not to generate code. If either -l or -o is specified, nothing is displayed on standard output. However, you can pass the special filename - to -l to send the listing to standard output.
asm2650 myfile.2650
Listing on standard output, no output code (same as 'asm2650 -l - myfile.2650')
asm2650 -o myfile.bin myfile.2650
No listing, output code in myfile.bin
asm2650 -l - -o myfile.bin myfile.2650
Listing on standard output, output code in myfile.bin
-s: create sed-file
Some warnings can be fixed by the sed editor that is common on Unix-like systems. The sed command takes a file containing editing instructions, which the assembler generates when the -s option is used. Apply the sed-file as follows:
sed -f «sedfile» -i.bak «infile»
--hex: default to hexadecimal
By default numerical constants are written as decimal numbers, so that 10, 11, 12 mean ten, eleven, twelve. The option --hex changes this default to hexadecimal (“sixteen, seventeen, eighteen”). See Constants for details.
Input format
This section describes the valid input format recognised by the assembler. In general, each input line looks like this:
label instruction operand comment
If the label is used, there must be no whitespace before it. If the label ends in a colon, the instruction and operand must be omitted. An instruction can be either a 2650 opcode or one of the pseudo-ops. The comment is always optional. Any amount of whitespace can be used between the elements; instruction, operand and comment do not have to start at fixed positions.
Any whitespace at the end of lines is ignored. Blank lines are ignored but copied literally into the output listing.
Labels, instructions and operands are fully case-insensitive, so StartAddr and StartADDR refer to the same symbol. Case is preserved in the output listing.
There are some exceptions to this general format, such as the INCLUDE keyword or full-line comments. These will be explained in the sections below.
Basic elements
Symbols
Symbols (such as code labels or EQU definitions) consist of any sequence of letters, digits and the underscore _, except that their first character cannot be a digit. In the Signetics assemblers symbols are limited to 6 (uppercase) characters but there is no set limit to the length of symbols in asm2650, which greatly enhances code readability. Some symbols are predefined.
Valid symbols         Invalid symbols
MyAddr                2Label (starts with a digit)
_Reference3           Label#2 (invalid character)
This_2_is_a_symbol
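The symbol rules fit in one regular expression. The Python sketch below is an illustration with made-up names, not the pattern asm2650 actually uses; case-insensitivity is modelled by comparing upper-cased names.

```python
import re

# A letter or underscore first, then letters, digits or underscores.
SYMBOL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_symbol(name):
    """True when the whole name follows the symbol rules above."""
    return SYMBOL.fullmatch(name) is not None


def same_symbol(a, b):
    """Symbols are case-insensitive, so compare them upper-cased."""
    return a.upper() == b.upper()
```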
Constants
Numbers consist of (hexa)decimal digits, depending on the default base. If the base is set to hexadecimal (using the DFLT pseudo-op, or the --hex command line option) then hexadecimal numbers must start with a digit 0-9 so as not to confuse them with a symbol. For example FF should be written as 0FF. Alternatively, hexadecimal numbers can be prefixed with a dollar-sign, e.g. $10. Negative numbers have a minus sign in front, and positive numbers may optionally start with a plus sign.
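As an illustration of these rules, here is a small Python sketch of a number parser. `parse_number` and its `default_base` parameter are hypothetical; the real parser may well accept or reject combinations (such as a sign before a `$`) differently.

```python
def parse_number(text, default_base=10):
    """Parse an unprefixed or $-prefixed number with an optional sign."""
    sign = 1
    if text[0] in "+-":
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    if text.startswith("$"):
        return sign * int(text[1:], 16)      # $10 is always hexadecimal
    if not text[0].isdigit():
        # FF would be read as a symbol; write 0FF instead
        raise ValueError(f"number must start with a digit: {text!r}")
    return sign * int(text, default_base)
```

So `parse_number("10")` gives ten, while `parse_number("10", default_base=16)` gives sixteen.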
Based-numbers explicitly indicate their base. Use the notation D'16' for decimal, O'137' for octal, B'1001011' for binary, or H'3F' for hexadecimal. In base-notation, you can specify a list of numbers, e.g. d'10,20,30' or h'0A,14,1E'. Lists are only supported in a few places, such as the DATA and ACON pseudo-ops.
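A based-number can be decoded along these lines. This is a sketch only; `parse_based` is a made-up name and error handling is minimal.

```python
BASES = {"D": 10, "O": 8, "B": 2, "H": 16}


def parse_based(text):
    """Turn a based-number such as h'0A,14,1E' into a list of integers."""
    prefix, body = text[0].upper(), text[1:]
    if prefix not in BASES or len(body) < 2 or body[0] != "'" or body[-1] != "'":
        raise ValueError(f"not a based-number: {text!r}")
    return [int(item, BASES[prefix]) for item in body[1:-1].split(",")]
```

The two list examples above describe the same values: both `d'10,20,30'` and `h'0A,14,1E'` decode to 10, 20, 30.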
The current address is represented by the symbol $. This is sometimes useful in address calculations. Note that the same character is also used as the prefix of a hexadecimal number: $ on its own is the current address, while $10 is the number sixteen.
Strings can be specified using either "my string" or A'my string'. With the first notation, a literal double-quote within the string must be escaped by doubling it. E.g. """" is the single double-quote character. The same applies vice versa for literal quotes in the second notation.
In the first notation, any ASCII character can be included by inserting \xBB, where BB is the hexadecimal value of the character. For example, a line ending can be specified as "Line end\x0D\x0A", or a C-style zero-terminated string as "Some text\x00". Because strings are lists, these examples can also be written as "Line end",$0D,$0A and "Some text",0.
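The two escape rules for double-quoted strings can be sketched in Python as follows. `decode_string` is a hypothetical helper, not part of asm2650, and it handles only the first notation.

```python
import re


def decode_string(literal):
    """Decode a "..." literal into a list of ASCII codes."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("expected a double-quoted string")
    body = literal[1:-1].replace('""', '"')   # doubled quote -> one quote
    # \xBB -> the character with hexadecimal code BB
    body = re.sub(r"\\x([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), body)
    return [ord(ch) for ch in body]
```

Decoding `""""` yields the single value 34 (the double-quote character), and `"Some text\x00"` ends in a zero byte.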
EBCDIC strings are not supported. I trust you’re not disappointed.
Operators
Version 2 of asm2650 includes extensive support for operators. In order of precedence, from highest to lowest:
Operator        Meaning
* / %           multiplication, integer division, modulo
<< >>           shift left, shift right
+ -             addition, subtraction
&               logical AND
| ^             logical OR, logical XOR
,               list concatenation (comma)
< >             most significant byte, least significant byte
Most and least significant byte operators transform a two-byte value into a single-byte value. These operators are useful to load addresses into registers. Since asm2650 does not automatically truncate 16-bit values in places where an 8-bit value is needed, the > operator may be required: use lodi,r0 >Addr instead of lodi,r0 Addr.
lodi,r2 <Buffer
lodi,r3 >Buffer
; R2R3 now contain the address of the buffer
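In terms of plain arithmetic, the two operators amount to the following (a Python sketch; `msb` and `lsb` are just illustrative names for < and >):

```python
def msb(value):
    """The < operator: most significant byte of a two-byte value."""
    return (value >> 8) & 0xFF


def lsb(value):
    """The > operator: least significant byte of a two-byte value."""
    return value & 0xFF
```

For a buffer at h'1234', `msb` gives h'12' and `lsb` gives h'34'.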
Multiplication is self-evident, but be aware that the asterisk * is also used to mark a full-line comment and to indicate indirect addressing. Division is an integer division, so 7 / 3 equals 2. To obtain the remainder after division (the modulo) use the % operator: 7 % 3 equals 1. When the second value is negative, results may not be what you expect: 7 % -3 equals -2, for example.
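These examples behave exactly like Python's own floor division and modulo, which is probably no coincidence given that the assembler is written in Python. The snippet below only reproduces the three cases from the text; how other negative operands (say -7 / 3) are treated is not stated here, so check before relying on it.

```python
quotient = 7 // 3         # integer division
remainder = 7 % 3         # modulo
negative_case = 7 % -3    # in Python the result takes the sign of the divisor

print(quotient, remainder, negative_case)   # prints: 2 1 -2
```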
Addition and subtraction are self-evident.
All numerical operators (multiplication, division, modulo, addition and subtraction) work on single-character strings. The ASCII value of the character is used. E.g. "0"+1 equals "1".
Shift left and shift right are bitwise operators that shift a number a given number of steps left or right. E.g. h'40'>>2 equals h'10', and h'01'<<4 equals h'10'. Each step of a shift left is equivalent to multiplication by two; each step of a shift right is equivalent to integer division by two.
Logical AND with the & operator also operates on the bits of its operands. h'33' & h'1f' equals h'13'. Logical OR and logical XOR (exclusive OR) are similar.
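The same calculations written in Python, for anyone who wants to check a value by hand. The OR and XOR results are my own additions for the same two operands, not taken from the manual.

```python
shift_right = 0x40 >> 2     # h'40'>>2
shift_left = 0x01 << 4      # h'01'<<4
and_result = 0x33 & 0x1F    # h'33' & h'1f'
or_result = 0x33 | 0x1F     # same operands with OR
xor_result = 0x33 ^ 0x1F    # same operands with XOR

print(f"{shift_right:02X} {shift_left:02X} {and_result:02X} "
      f"{or_result:02X} {xor_result:02X}")   # prints: 10 10 13 3F 2C
```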
List concatenation is performed using a comma. Based-numbers and strings can already be lists. Add further elements to a list with the comma operator.
Len EQU 5
DATA h'10,20', Len, "abc"
results in:
10 20 05 61 62 63
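The flattening that happens here can be mimicked in Python. This is a sketch with a made-up `flatten` helper: based-number lists are modelled as Python lists and strings contribute their ASCII codes.

```python
def flatten(*items):
    """Concatenate single values, lists and strings into one byte list."""
    result = []
    for item in items:
        if isinstance(item, str):
            result.extend(ord(ch) for ch in item)   # string -> ASCII codes
        elif isinstance(item, (list, tuple)):
            result.extend(item)                     # list -> its elements
        else:
            result.append(item)                     # single value
    return result


Len = 5
data = flatten([0x10, 0x20], Len, "abc")
print(" ".join(f"{b:02X}" for b in data))   # prints: 10 20 05 61 62 63
```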
Operator precedence is applied as in the table above. Use parentheses if necessary. The following are some examples of valid expressions.
$+2
Load_Address-1
Buffer+2*(Length-1)
Val<<2|b'11'
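One thing to watch: in this table the shifts bind tighter than addition and subtraction, which is the opposite of Python and C. The small evaluator below is a sketch of precedence climbing over the binary operators of the table; it only understands decimal integers and parentheses, and every name in it is hypothetical rather than taken from asm2650.

```python
import re

# Binary operators from highest to lowest precedence, as in the table.
LEVELS = [("*", "/", "%"), ("<<", ">>"), ("+", "-"), ("&",), ("|", "^")]
PRECEDENCE = {op: len(LEVELS) - i for i, ops in enumerate(LEVELS) for op in ops}

APPLY = {
    "*": lambda a, b: a * b, "/": lambda a, b: a // b, "%": lambda a, b: a % b,
    "<<": lambda a, b: a << b, ">>": lambda a, b: a >> b,
    "+": lambda a, b: a + b, "-": lambda a, b: a - b,
    "&": lambda a, b: a & b, "|": lambda a, b: a | b, "^": lambda a, b: a ^ b,
}

TOKEN = re.compile(r"\d+|<<|>>|[-+*/%&|^()]")


def evaluate(text):
    """Evaluate an expression using the precedence table above."""
    tokens = TOKEN.findall(text)
    pos = 0

    def primary():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = expression(1)
            pos += 1                  # skip the closing parenthesis
            return value
        return int(tok)

    def expression(min_prec):
        nonlocal pos
        left = primary()
        # Keep absorbing operators that bind at least as tightly as min_prec.
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            right = expression(PRECEDENCE[op] + 1)   # left-associative
            left = APPLY[op](left, right)
        return left

    return expression(1)
```

With this table `evaluate("1+2<<3")` is 1+(2<<3) = 17, whereas Python itself reads `1+2<<3` as (1+2)<<3 = 24. When in doubt, parentheses make the intent explicit.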
Whitespace (a tab or space) is not allowed within expressions. The one exception is that in lists there can be whitespace after a comma; this is to maintain compatibility with code produced by DASMx. Otherwise, the appearance of whitespace indicates the start of a comment.
Note: The notation used by the TWIN assembler is not fully supported and may need rewriting; e.g. use & instead of .AND. and >> instead of .SHR. The comparison operators and the pseudo-ops for conditional assembly are not supported by asm2650.
Predefined symbols
The following common symbols are predefined.
Program status word bits: SENS, FLAG, II, IDC, RS, WC, OVF, LCOM, CAR
Registers: R0, R1, R2, R3
Condition codes: EQ, LT, and GT for comparisons, UN for unconditional branching, NE for bit tests, and Z, P, N for sign-comparisons (zero, positive, negative).
Labels
A label defines a new symbol for the current address value. There must be no whitespace to the left of the label; it must start at the left margin.
Full-line labels are followed by a colon. If there is whitespace and some text after the label, that text is taken as a comment. There cannot be an instruction after a full-line label; use a normal label for that. For example:
Fill_the_buffer: This procedure fills the buffer with r0
stra,r0 buff,r1
bdrr,r1 Fill_the_buffer
Normal labels are followed by an instruction or pseudo-op. Normal labels do not end in a colon.
Loop stra,r0 Buffer,r1 Fill the buffer with r0
bdrr,r1 Loop
A common mistake is to reuse an existing label that was defined previously. asm2650 will generate a warning when a symbol is redefined.
Instructions
After the label there will be an instruction. If no label is used, there must be whitespace to the left of the instruction. The instruction must be a valid 2650 opcode or one of the special pseudo-ops.
For instructions that use Register addressing, two notations are in common use. The assembler accepts either:
lodz,r1
lodz r1 This is the same as the previous line
The second notation is not allowed for other addressing modes. All of the following are invalid:
lodi r1,5 Incorrect
lodr r1,ADDR Also incorrect
loda r1,ADDR Same
There can be no whitespace around the comma between the opcode and the register or condition code:
lodz,r1 This is OK
lodz, r1 This raises an error
The BXA and BSXA instructions require that R3 is explicitly specified as the index register.
Invalid:
bxa,un Dest cannot use a condition code
bxa,r3 Dest cannot use a register
bxa Dest,r2 index register must be r3
Valid:
bxa Dest,r3
The instructions LDPL and STPL are specific to the 2650B variant, and are only recognised when the --allow2650b command line argument has been specified.
Operands
Many opcodes and pseudo-ops require a further operand.
Indirect addressing is specified by using the asterisk * as the first character of the operand. With some assemblers the * character has to be in a specific character position on the line; asm2650 has no such restrictions. The operand can be any valid expression. For example:
stra,r1 *Buffer+Len
Indexed addressing is specified by adding a register and (optionally) increment or decrement to the operand. Common notations are supported:
stra,r0 Buffer,r1 Plain indexing
stra,r0 Buffer,r1+ Auto-increment
stra,r0 Buffer,r1,+ Auto-increment
stra,r0 Buffer,r1- Auto-decrement
stra,r0 Buffer,r1,- Auto-decrement
For opcodes that use Immediate addressing (e.g. LODI), the operand must be between -128 and +255. asm2650 will not silently take the least significant byte of two-byte values; you must explicitly use the > operator for that. This helps in finding programming errors.
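The check can be pictured like this in Python. It is a sketch, not the assembler's source; `immediate_byte` is a made-up name, and negative values are assumed to be stored in two's complement.

```python
def immediate_byte(value):
    """Return the byte for an immediate operand, or fail if out of range."""
    if not -128 <= value <= 255:
        raise ValueError(f"immediate operand out of range: {value}")
    return value & 0xFF      # -1 is stored as h'FF'
```

Passing h'1234' raises an error instead of quietly producing h'34'; taking the low byte first, as the > operator does, passes the check.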
Comments
Full-line comments are indicated by an asterisk * or semicolon ; as the first non-whitespace character on the line. The entire line is ignored, and copied verbatim into the output listing. Note that in many assemblers the asterisk has to be the first character on the line. With asm2650 this is not required.
Otherwise, any text to the right of an instruction (separated by some whitespace) is taken as an instruction comment. (This is the reason why whitespace is generally not allowed within expressions.)
* This is a full line comment
* This is also a full line comment
; this too is a full line comment
LongLabel: some documentation here
lodi,r0 5 This is an instruction comment
The starting position of an instruction comment is not important, as long as there is some whitespace before it. The following line is therefore confusing but valid:
lodz,r1 5 This error cannot be detected.
The lodz command does not take an operand, so the number 5 is taken to be a comment. If you actually meant lodi,r1 5 the assembler cannot warn you about this mistake. In the output listing comments are always preceded by a semicolon and aligned at column 73, to assist in identifying such mistakes.
Labels with a colon can also cause unexpected problems with comments:
Label: loda,r1 Addr Oops!
The entire text after Label: is taken as a comment, so the loda instruction is ignored. Again, the assembler cannot warn you about this typo (but the output listing makes the error easy to identify).
If a comment contains the literal string NOWARN in uppercase, then all warnings on that line are suppressed. Use this to avoid cluttering the output with unnecessary warnings.
Include
Version 2 of asm2650 allows the input to be split over separate input files. Use the special keyword INCLUDE at the start of a line to name a file that will be read at that point. The contents of that file will be read and processed as if they had been inserted in the main file at the point of the INCLUDE. Afterwards, processing continues with the next line from the main file.
Includes can be nested: a file that is included can itself include another file. There is no set limit, but asm2650 does not detect circular references. If file A includes file B, and file B includes file A, an infinite loop will start.
* The main file
INCLUDE ../common.2650
INCLUDE datadefs.2650
:
remainder of the main file here
The include file will be searched for in the following locations:
- First in the same directory as the main file (the file from which it was included).
- If it cannot be found there, the directories specified by the --include command line arguments are tried in order.
- If it cannot be found there either, the directories specified in the environment variable ASM2650INC are tried.
If the file cannot be located in any of these directories, an error message is printed and the assembler stops.
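The search order can be summarised in a short Python sketch. `find_include` and its parameters are hypothetical; the `is_file` argument exists only so the lookup can be tried without touching the disk.

```python
import os


def find_include(name, including_dir, include_dirs=(), env_dirs=(),
                 is_file=os.path.isfile):
    """Return the first existing candidate path for an INCLUDE file.

    Order: directory of the including file, then the --include
    directories, then the directories from ASM2650INC.
    """
    for directory in [including_dir, *include_dirs, *env_dirs]:
        candidate = os.path.join(directory, name)
        if is_file(candidate):
            return candidate
    raise FileNotFoundError(f"cannot locate include file {name!r}")
```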
Pseudo-ops
Pseudo-ops do not generate code for the 2650; they are instructions to the assembler. The assembler supports all known pseudo-ops, although it ignores a number of them.
ORG
ORG (“origin”) sets the location counter. Typically this determines the memory address where the code will be located. Each input file should contain at least one ORG instruction. Without an ORG instruction the assembler assumes that the code starts at address zero. This is normally not what is intended, and will generate a warning message. You can specify ORG 0 to make the assumption explicit.
Start ORG Address+h'100'
The label is optional; if present its value will be the operand. The operand must be a single value (not a list).
Each ORG pseudo-op defines a new code segment. There is a default code segment before the first ORG pseudo-op. The command line argument --segments determines how the segments are combined into an output file.
EQU
EQU (“equals”) defines a new symbol. A normal label must be specified; a full-line label is not valid. The label becomes a new symbol, and the expression defines the value of that symbol. The operand must be a single value; it cannot be a list.
Symbol EQU expression
Forward references are possible: you can use a symbol in an expression and define that symbol further on using EQU.
ACON
ACON (“address constant”) defines two-byte values that can be used as an address, a counter, or for any other purpose. The operand can be a single value or a list of values. Each item will be stored as two bytes in the output file. The value cannot be negative and must not exceed h'ffff'.
Label ACON h'0100', h'0100'+Len, MyAddr
The label is optional; if present it will be defined as the address of the first byte.
The pseudo-op DW is a synonym for ACON.
DATA
DATA defines single byte values that will be inserted in the output file. The operand can be a single value or a list of values. Each value must be between -127 and 255, so that it can fit in a single byte.
Label DATA h'10,20,30',0,AddrB-AddrA
The label is optional; if present it will be defined as the address of the first byte.
The pseudo-op DB is a synonym for DATA.
RES
RES reserves memory. In the Signetics assemblers the contents of the memory locations are not initialised, but asm2650 pre-fills these bytes to zero. The operand indicates the number of bytes to reserve; if it is zero a warning will be issued. It is an error for the operand to be negative.
Label RES Len+1
The label is optional; if present it will be defined as the address of the first byte.
Note that if --segments is set to 'single', RES will only emit bytes into the output if some output had already been generated before.
DFLT
DFLT (“default base”) determines whether numerical constants are interpreted as decimal or hexadecimal. Values 0 or 10 indicate that numbers will be decimal; values 1 or 16 indicate that numbers will be hexadecimal. Note that this base does not apply to based-numbers such as h'20'. The default base cannot be set to 8; let me know if you think this is a bug.
The initial default base is decimal. The --hex command line option changes this to hexadecimal.
Other pseudo-ops
The pseudo-ops START, END, EJE, PAG, PRT, SPC, TITL, and PCH are accepted but ignored. The pseudo-ops IF, ELSE and ENDIF in the TWIN assembler are not supported, as they cannot safely be ignored.
Code segments
The binary output file only contains the content bytes, in a single unstructured stream. It does not contain any reference to the intended starting address in memory. This also means that the binary output cannot contain multiple sections. The source file can contain multiple sections; each ORG pseudo-op starts a new code segment at a new memory address. A source file with two ORG instructions would contain three segments. (Note that the start of the source file contains a segment at the implicit address 0000.)
This leads to the question of how the RES pseudo-op should be treated. Consider the following code:
; Scratch memory
ORG H'17C0'
COUNT RES 1
...
; Code section
ORG H'1510'
START eorz,r0
...
BUF RES D'20'
MORE lodi,r1 5
Ideally the first section should only define the items in scratch memory but produce no output to the binary output file. The second section should produce output, including zeroes for the RES instruction.
The assembler addresses the issue in one of two possible ways, depending on the --segments command line argument.
Single. If the current section already produced output bytes then the RES will also produce zeroes. But if the current section did not produce any output (as in the definitions in the scratch section above) RES will not produce output bytes. In other words, RES will not be the first instruction in a section to produce output, but will produce output if it follows other output bytes.
With this option, there can be many segments but only one segment should produce output. The starting addresses of the segments can be in any order.
Padded. RES will always produce output, and if multiple segments are used they will all be emitted to the output code. Any space between segments will be filled with zeroes. However, segments must be in ascending address order: each segment must start above the end of the previous one.
Version 1 of asm2650 always used the ‘single’ option. Version 2 supports both options, but defaults to ‘padded’.
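The different treatment of RES in the two modes boils down to one condition. Here is a Python sketch of it; `emit_res` is a hypothetical helper and `segment_output` stands for the bytes produced so far by the current segment.

```python
def emit_res(segment_output, count, mode="padded"):
    """Append the bytes that RES contributes to the current segment."""
    if count < 0:
        raise ValueError("RES operand must not be negative")
    if mode == "single" and not segment_output:
        return                      # nothing emitted yet: reserve addresses only
    segment_output.extend(bytes(count))   # zero-filled reservation
```

In the example above, `COUNT RES 1` in the scratch segment adds nothing in 'single' mode, while `BUF RES D'20'` follows code and therefore adds twenty zero bytes.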
Compatibility
asm2650 should accept any valid input file without generating errors, although it may raise warnings. Please let me know (through the link at the bottom of this page) if you have a valid input file that is not processed properly by asm2650.
Some assemblers insert an implicit > operator when an 8-bit value is expected, e.g.:
lodi,r0 h'1234'
; on some assemblers r0 would now contain h'34'
asm2650 does not do this, and will raise an error message instead.
Version history
2.2.1 — Priority of << and >> is now above that of addition/subtraction.
Exit code is the number of errors (so zero on success only).
Do not duplicate number of errors and warnings when the listing is sent to the console.
Fix listing with long labels before an instruction.
2.2 — Added –hex option to default to (unprefixed) hexadecimal constants.
Include errors and warnings in listing; do not remove the listing when errors were encountered.
Use tabs instead of spaces in output listing. This reduced the size of the listing by 40%. Tabsize is 8.
Accept and ignore the PAG pseudo-op. (Alternative to EJE).
2.1 — Spaces are no longer accepted within expressions, to restore compatibility with valid input files with comments that start with an operator-character (such as * or a comma).
Precedence of < and > is reduced to lowest, for compatibility with previous versions and other assemblers.
The instruction lodz,r0 now gives a warning instead of an error. See the discussion on this pattern.
Alignment of comments in the output listing has been improved. Comments in a listing start with a semicolon for extra clarity.
2.0.4 — Ignore STX, ETX and Ctrl-Z characters in input. These characters are often used in the encoding of text files. (Thanks, Ron!)
Correct the warning when relative addressing could have been used.
Omit printing the symbol table when errors have been encountered.
2.0.3 — Removed deprecated Python syntax. Now works with Python 3.13.
2.0.2 — Fixed a bug with negative numbers.
Change the listing for EQU symbol definitions, to distinguish them clearly from labels (which are addresses).
2.0.1 — Fixed a bug that would produce incorrect code output with DATA
or ACON pseudo-ops, when using certain calculations on forward references.
2.0.0 — Extensive rewriting and some breaking syntax changes. All for the best, I hope.
Insert and nest files using the INCLUDE keyword.
Expressions can use many new operators. Octal and binary constants are now possible too.
Two options for handling multiple code segments. Padding is the new default.
1.0.0 to 1.0.15 — Version one. The various updates corrected issues or added minor features.