This is an assembler for the Signetics 2650 processor and its variants. It is based on the assembler published on https://binnie.id.au/MicroByte/ but includes a number of improvements:
- it provides extensive error checking and warnings;
- it allows any kind of whitespace and upper and lower case;
- syntax has been extended to provide for various formats in use;
- the extra instructions for the 2650B-processor are supported.
Download
Like the original assembler from MicroByte, asm2650 is written in Python; it should work with most Python installations. Unlike the MicroByte assembler, asm2650 is contained in a single Python file.
The file is called asm2650; you can rename it to asm2650.py if your Python installation requires this.
Usage
asm2650 [-h] [-W {rel,label,instr,none}] [-B] [-l «listout»] [-o «codeout»] [-s «sedfile»] «infile»
Process 2650 Assembler Code -- version 1.0.15
positional arguments:
«infile» input source code file
optional arguments:
-h, --help show this help message and exit
-W, --nowarn {rel,label,instr,none} disable specified warning
-B, --allow2650b enable instructions specific to 2650B-variant
-l «listout» output text listing file
-o «codeout» output binary code file
-s «sedfile» output sed instructions
--nowarn
Warnings can be disabled using --nowarn:
rel: no warning when a relative instruction could have been used instead of an absolute instruction. When the offset is small (between -64 and +63) the instruction could be replaced by a relative instruction. For example:
LOOP BDRA,R1 LOOP
Using a relative instruction instead of an absolute instruction saves one byte of memory, and the 2650 also executes it faster. Note that on Unix-like systems these corrections can be applied automatically using the -s option (see below).
label: no warning when a label is redefined; the last definition of the label is used. Typically each label is defined only once, and redefinitions indicate a mistake or typo.
instr: no warnings about unusual instructions. Some instructions are not forbidden, but highly unusual. Examples are an indexed instruction that uses R0 both as the index register and as the (default) operand, or a RES with an operand of zero. This is not typically what you want.
none: disable all warnings.
The --nowarn option can be specified multiple times, e.g. -W rel -W instr.
Warnings can be suppressed on a specific line by including the string *NOWARN as part of the comment.
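The range check behind the rel warning can be sketched in a few lines of Python. This is only an illustration, not the code used by asm2650: the function name `fits_relative` is hypothetical, and the sketch assumes that the offset is measured from the address following a two-byte relative instruction.

```python
def fits_relative(instr_addr: int, target: int, instr_len: int = 2) -> bool:
    """Return True if `target` is reachable with a relative offset.

    Assumption: the offset counts from the address after the relative
    instruction (instr_len bytes long). The range -64..+63 is taken
    from the description of the rel warning.
    """
    offset = target - (instr_addr + instr_len)
    return -64 <= offset <= 63


# A branch back to its own address, as in the BDRA example.
print(fits_relative(0x1510, 0x1510))  # offset -2: a relative branch would fit
print(fits_relative(0x1510, 0x1600))  # offset 238: absolute addressing is needed
```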
--allow2650b
The 2650B variant of the processor adds LDPL and STPL instructions for directly loading or storing the lower Program Status Word to memory. The original 2650 does not support these instructions. If you wish to use these instructions, explicitly specify the --allow2650b option.
-l / -o: output options
Without any output options specified the assembler will generate a listing on standard output. If either -l or -o is specified no output is shown on standard output. Use -l to save a listing (code together with the original input file). Use -o to save the assembled instructions and data into a binary file. You can use both.
-s: create sed-file
Some warnings can be fixed by the sed editor that is common on Unix-like systems. The sed command takes a file containing editing instructions, which is generated by the assembler when the -s option is used. Apply the sed-file as follows:
sed -f «sedfile» -i.bak «infile»
Code format
This section describes the valid input format recognised by the assembler.
Pseudo-ops
Pseudo-ops do not generate code for the 2650; they are instructions to the assembler. The assembler supports all known pseudo-ops, although it ignores a number of them. See the Signetics 2650 manual for a more detailed description of assembler pseudo-ops.
- ORG sets the location counter. Typically this determines the memory address where the code will be located. Each input file should contain at least one ORG instruction. Without an ORG instruction the assembler assumes that the code starts at address 0000. This is normally not what is intended, and will generate a warning message. You should specify ORG 0000 to make the assumption explicit.
- EQU defines a new symbol.
- ACON or DW defines two bytes that contain an address value.
- DATA or DB defines one or more bytes that contain the specified values.
- RES reserves memory. In the output file these bytes are set to zero.
- DFLT determines whether numerical constants are interpreted as decimal or hexadecimal. Values 0 or 10 indicate that numbers will be decimal; values 1 or 16 indicate that numbers will be hexadecimal. Note that you can always explicitly specify the base; e.g. sixteen can be written as D'16' or H'10'.
The pseudo-ops START, END, EJE, PRT, SPC, TITL, and PCH are accepted but ignored.
The RES pseudo-op
The binary output file only contains the content bytes. It does not contain any reference to the intended starting address in memory. This also means that the binary output cannot contain multiple sections. The source assembler file can contain multiple sections; each ORG instruction starts a new section at a new memory address. A source file with two ORG instructions would contain three sections. (Note that the start of the source file contains a section at the implicit address 0000.)
This leads to the question of how the RES pseudo-op should be treated. Consider the following code:
; Scratch memory
ORG H'17C0'
COUNT RES 1
...
; Code section
ORG H'1510'
START eorz,r0
...
BUF RES D'20'
MORE lodi,r1 5
Ideally the first section should only define the items in scratch memory but produce no output to the binary output file. The second section should produce output, including zeroes for the RES instruction.
The assembler addresses the issue in this way: if the current section already produced output bytes then the RES will also produce zeroes. But if the current section did not produce any output (as in the definitions in the scratch section above) RES will not produce output bytes.
In other words, RES will not be the first instruction in a section to produce output, but will produce output if it follows other output bytes.
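The rule can be illustrated with a small Python sketch. The `Section` class and its method names are hypothetical and are not taken from the asm2650 source; the sketch only models the behaviour described above.

```python
class Section:
    """Collects the output bytes of one ORG section (hypothetical helper)."""

    def __init__(self, origin: int) -> None:
        self.origin = origin
        self.location = origin     # location counter
        self.output = bytearray()  # bytes destined for the binary file

    def emit(self, data: bytes) -> None:
        """Add instruction or data bytes to the section."""
        self.output += data
        self.location += len(data)

    def reserve(self, count: int) -> None:
        """Model RES: always advance the location counter, but write
        zeroes only if the section has already produced output."""
        self.location += count
        if self.output:
            self.output += bytes(count)


scratch = Section(0x17C0)
scratch.reserve(1)          # COUNT RES 1: nothing is written
code = Section(0x1510)
code.emit(b"\x20")          # placeholder byte standing in for an instruction
code.reserve(20)            # BUF RES D'20': twenty zero bytes are written
print(len(scratch.output), len(code.output))
```

With this rule the scratch section stays empty, so only the code section contributes bytes to the binary file.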
Expressions
Labels and symbols must start with a letter and may contain letters, digits and the underscore _. Labels are case-insensitive, so StartAddr and StartADDR refer to the same symbol. Case is preserved in the output.
Numbers consist of (hexa)decimal digits, depending on the default base. If the base is set to hexadecimal (using the DFLT pseudo-op) then hexadecimal numbers must start with a digit 0-9 so as not to confuse them with a symbol. For example FF should be written as 0FF. To explicitly indicate the base, use the notation D'16' or H'10' (or d'16', h'10' in lowercase). Alternatively, hexadecimal numbers can be prefixed with a dollar-sign, e.g. $10.
Negative numbers have a minus sign in front, and positive numbers may optionally start with a plus sign.
Octal or binary constants are not supported.
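As an illustration of these notations, here is a minimal Python sketch of a number parser. The function `parse_number` is hypothetical and not part of asm2650. It assumes that a sign may precede any of the notations, which the text above does not state explicitly.

```python
import re


def parse_number(text: str, default_base: int = 10) -> int:
    """Parse a numeric constant written in one of the supported notations."""
    sign = 1
    if text and text[0] in "+-":
        sign = -1 if text[0] == "-" else 1
        text = text[1:]

    explicit = re.fullmatch(r"([DdHh])'([0-9A-Fa-f]+)'", text)
    dollar = re.fullmatch(r"\$([0-9A-Fa-f]+)", text)
    bare = re.fullmatch(r"[0-9][0-9A-Fa-f]*", text)

    if explicit:
        # D'16' or H'10': the base is given explicitly.
        base = 10 if explicit.group(1) in "Dd" else 16
        digits = explicit.group(2)
    elif dollar:
        # $10: always hexadecimal.
        base, digits = 16, dollar.group(1)
    elif bare:
        # Bare numbers start with a digit and use the DFLT base.
        base, digits = default_base, text
    else:
        raise ValueError(f"not a number: {text!r}")
    return sign * int(digits, base)


print(parse_number("D'16'"), parse_number("H'10'"), parse_number("$10"))
print(parse_number("0FF", default_base=16))
```

In this sketch FF without a leading zero is rejected as a number, which mirrors the requirement that such a token be read as a symbol.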
The high byte of a two-byte value is obtained by <Label; the low byte of a two-byte value is obtained by >Label. It is an error to use a two-byte value for immediate commands: use lodi,r0 >Addr instead of lodi,r0 Addr. Other assemblers will silently discard the high byte, but asm2650 flags this as an error to prevent mistakes. Note that the < and > operators have higher priority than addition or subtraction, so you may want to use <Addr+h'100' instead of <Addr+1 to obtain the next block after Addr.
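In Python terms, the two byte selectors and the check on immediate values might look like the sketch below. All three function names are hypothetical, and the accepted range of -128 to 255 for a one-byte value is an assumption rather than documented behaviour.

```python
def high_byte(value: int) -> int:
    """Counterpart of <Label: the upper eight bits of a two-byte value."""
    return (value >> 8) & 0xFF


def low_byte(value: int) -> int:
    """Counterpart of >Label: the lower eight bits of a two-byte value."""
    return value & 0xFF


def check_immediate(value: int) -> int:
    """Raise an error instead of silently discarding the high byte."""
    if not -128 <= value <= 255:  # assumed range for a one-byte operand
        raise ValueError(f"value {value:#x} does not fit in one byte")
    return value & 0xFF


addr = 0x17C0
print(hex(high_byte(addr)), hex(low_byte(addr)))
print(check_immediate(low_byte(addr)))  # fine: the low byte fits
```

Calling `check_immediate(addr)` directly raises an error in this sketch, which corresponds to using lodi,r0 Addr without a byte selector.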
ASCII text strings are specified using A'Some text here' or "Some text here". When a string is specified using double quotes, the expressions \xNN (where N indicates any hexadecimal digit) are replaced with the character with ASCII code NN. For example, a line ending can be specified as "Line end\x0D\x0A", or a C-style zero-terminated string as "Some text\x00".
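The \xNN replacement can be sketched in Python with a regular expression. The helper `decode_string` is hypothetical; it takes the text between the double quotes and assumes that every remaining character fits in a single byte.

```python
import re


def decode_string(body: str) -> bytes:
    """Convert the text between double quotes to bytes, expanding hex escapes."""
    def replace(match: re.Match) -> str:
        # Two hexadecimal digits become the character with that code.
        return chr(int(match.group(1), 16))

    expanded = re.sub(r"\\x([0-9A-Fa-f]{2})", replace, body)
    # latin-1 maps code points 0..255 one-to-one onto byte values.
    return expanded.encode("latin-1")


print(decode_string(r"Line end\x0D\x0A"))
print(decode_string(r"Some text\x00"))
```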
Text using the EBCDIC encoding is not supported.
The symbol $ refers to the current location (not to be confused with $10 hexadecimal numbers).
Expressions can include addition and subtraction, e.g.:
$+2
LoadAddress-1
Buffer+Length-1
Note that whitespace is not allowed inside expressions! The following will (silently) yield incorrect results:
LoadAddress - 1 does not work
Buffer + Length - 1 also does not work
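The following Python sketch shows one way such expressions could be evaluated and why embedded whitespace goes wrong. It is a simplified model, not the asm2650 implementation: the function `evaluate` is hypothetical, it handles only decimal numbers, symbols and $, and it assumes that the operand ends at the first whitespace, with the rest of the line treated as a comment.

```python
import re


def evaluate(expr: str, symbols: dict, location: int) -> int:
    """Evaluate a whitespace-free expression of terms joined by + and -."""
    if not re.fullmatch(r"[+-]?[^+\-\s]+(?:[+-][^+\-\s]+)*", expr):
        raise ValueError(f"malformed expression: {expr!r}")
    total = 0
    for sign, term in re.findall(r"([+-]?)([^+\-\s]+)", expr):
        if term == "$":
            value = location                # current location counter
        elif term.isdigit():
            value = int(term)               # decimal literal only, for brevity
        else:
            value = symbols[term.upper()]   # symbols are case-insensitive
        total += -value if sign == "-" else value
    return total


symbols = {"LOADADDRESS": 0x1510, "BUFFER": 0x1600, "LENGTH": 20}
print(hex(evaluate("$+2", symbols, 0x1510)))
print(hex(evaluate("Buffer+Length-1", symbols, 0x1510)))

# With whitespace, only the part before the first space is the operand.
operand = "LoadAddress - 1".split(None, 1)[0]
print(hex(evaluate(operand, symbols, 0x1510)))  # the "- 1" is lost
```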
Many instructions accept a list of expressions. List items are separated by a comma, optionally followed by whitespace.
Comments
Full-line comments are indicated by an asterisk * or semicolon ; as the first character on the line. Otherwise, any valid instruction can be followed by one or more spaces or tabs; the remainder of the line is then taken to be a comment. E.g.:
; This is a full line comment
* This, too.
lodi,r0 5 This is a comment as well
The starting position of the comment does not matter. The following line is therefore confusing but valid:
lodz,r1 5 This error cannot be detected.
The lodz command does not take an operand, so the number 5 is taken to be a comment. If you actually meant lodi,r1 5 the assembler cannot warn you about this typo.
Program Code Variants
There are two variants: as used by Signetics, and as produced by the DASMx disassembler from Conquest Consultants. See:
- The Signetics 2650 CPU manual (for example https://frank.pocnet.net/other/sos/Philips_2650/Philips_(Signetics)_2650.pdf)
- https://www.oocities.org/pclareuk/DASMx/
This assembler supports both variants, but with some improvements:
- Upper and lower case may be used. The assembler is case-insensitive but case-preserving.
- Labels can be any length, and may contain underscores ‘_’.
- Constants can only be specified in decimal, hexadecimal or ASCII format. Octal, binary and EBCDIC constants are not supported.
- Forward references may be used anywhere, including register codes, condition codes and assembler pseudo-ops.
Whitespace at the end of lines is ignored.
Blank lines and full-line comments are copied into the output listing.
A line that starts with a symbol followed by a colon defines a new label. For example:
Fill_the_buffer:
stra,r0 buff,r1
bdrr,r1 Fill_the_buffer
Otherwise labels start at the beginning of the line and are followed by an instruction or pseudo-op:
Loop stra,r0 Buffer,r1
bdrr,r1 Loop
Indirect addressing is specified by using the asterisk * as the first character of the operand. With some assemblers the * character has to be in a specific character position on the line; asm2650 has no such restrictions. The operand can be any valid expression. For example:
stra,r1 *Buffer+Len
Indexed addressing is specified by adding a register and (optionally) increment or decrement to the operand. Common notations are supported:
stra,r0 Buffer,r1 Plain indexing
stra,r0 Buffer,r1+ Auto-increment
stra,r0 Buffer,r1,+ Auto-increment
stra,r0 Buffer,r1- Auto-decrement
stra,r0 Buffer,r1,- Auto-decrement
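The five notations above can be recognised with a single regular expression, as in this Python sketch. The function `parse_indexed` is hypothetical, and for simplicity it only recognises the literal register names r0 to r3, not symbols that stand for a register.

```python
import re

# expression, comma, register, then an optional + or - with an optional comma
INDEXED = re.compile(r"(?P<expr>[^,]+),(?P<reg>[Rr][0-3])(?:,?(?P<step>[+-]))?")


def parse_indexed(operand: str):
    """Split an operand into (expression, index register, step).

    Register and step are None when the operand is not indexed.
    """
    match = INDEXED.fullmatch(operand)
    if match is None:
        return operand, None, None
    return match.group("expr"), match.group("reg").upper(), match.group("step")


for text in ("Buffer,r1", "Buffer,r1+", "Buffer,r1,+", "Buffer,r1-", "Buffer,r1,-"):
    print(text, "->", parse_indexed(text))
```

Both spellings of auto-increment, and both spellings of auto-decrement, yield the same result tuple in this sketch.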
For instructions that use Register addressing, two notations are in common use. The assembler accepts either:
lodz,r1
lodz r1 This is the same as the previous line
The second notation is not allowed for other addressing modes. All of the following are invalid:
lodi r1,5 Incorrect
lodr r1,ADDR Also incorrect
loda r1,ADDR Same
Note that there can be no whitespace around the comma between the opcode and the register or condition code:
lodz,r1 This is OK
lodz, r1 This raises an error
Predefined symbols
The following common symbols are predefined.
- Registers: R0, R1, R2, R3
- Condition codes: EQ, LT, GT, UN, NE and Z, P, N
- Program status word bits: SENS, FLAG, II, IDC, RS, WC, OVF, LCOM, CAR
Version history
1.0.0 — First release.
1.0.1 — The RES pseudo-op no longer generates output in the binary file. Previously bytes 00 were written. If you want this, use DATA or DB instead of RES. It is now OK to use multiple ORG sections, as long as only the first one actually generates binary code. Multiple sections that actually contain data will still result in a warning, which can be suppressed using --nowarn org.
1.0.2 — Add an error for LODZ,R0: according to the Signetics manual the results of this instruction are undefined.
1.0.3 — Better handling of large syntax errors leading to exceptions in Python.
1.0.4 — A number of changes to support more input formats: The dollar-sign can be used to indicate a hexadecimal number (e.g. $0d).
— The START pseudo-op is accepted, but ignored.
— Also, full-line comments can now start with whitespace. The first non-space character must still be an asterisk * or semicolon ;, but there can be whitespace at the start of the line.
— Input files are always read as UTF8 files. Incorrect UTF8 codes would crash the assembler; this is now fixed.
1.0.5 — Labels starting with letter R followed by a number would be incorrectly interpreted as index registers. For example, label R012 would be interpreted as R0 followed by the number 12.
— When errors are encountered the output files will be deleted (as they will contain invalid data). This used to be a problem when they were special files such as /dev/null. The following is now possible:
asm2650 -l /dev/null Myfile.2650
1.0.6 — Fixed: using $ to refer to the current code address stopped working since 1.0.4.
1.0.7 — Changed the meaning of the RES pseudo-op again. See the section ‘Pseudo-ops’.
1.0.8 — Pseudo-ops that are ignored no longer require an argument.
— The listing now shows a list of all symbols in alphabetical order. Thanks Neill!
1.0.9 — Single-letter strings are now correctly converted to their ASCII-equivalent for EQU.
— More useful error message when a line contains only a label and nothing else. Thanks again Neill!
— When ignoring a pseudo line END, do remember its label.
— Added several warnings for unusual instructions.
— Also removed the org warning, as it did not make sense anymore.
1.0.10 — Fixed: a DATA pseudo containing forward references to labels would lead to incorrect output.
1.0.11 — Added an error message when attempting to use absolute addressing across page boundaries.
1.0.12 — Added option to disable warnings on a specific line using the *NOWARN comment.
1.0.13 — Added the -s option to automatically fix certain warnings.
1.0.14 — Fixed a regression with the BXA instruction. It always requires an index register which must be R3, and it does not take a condition code or register.
Invalid:
bxa,un Dest ; cannot use a condition code
bxa,r3 Dest ; cannot use a register
bxa Dest,r2 ; index register must be r3
Valid:
bxa Dest,r3
1.0.15 — Allow *NOWARN anywhere in the comment, not just at the start.
Show a warning when using a double quote within an ASCII string, to prevent confusion between comments and string contents.
Show an error when a label contains invalid characters, or does not start with a letter or underscore.