This is an assembler for the Signetics 2650 processor and its variants. It is based on the assembler published on https://binnie.id.au/MicroByte/ but includes a number of improvements. In short:
it provides extensive error checking and warnings;
it allows any kind of whitespace and upper and lower case;
syntax has been extended to provide for various formats in use;
the extra instructions for the 2650B-processor are supported.
Like the original MicroByte assembler, asm2650 is written in Python; it should work with most Python installations. Unlike the MicroByte assembler, asm2650 is contained in a single Python file. Download the assembler here.
The file is called asm2650; you can rename it to asm2650.py if your Python installation requires this.
asm2650 [-h] [-W {rel,label,instr,none}] [-B] [-l «listout»] [-o «codeout»] «infile»

Process 2650 Assembler Code -- version 1.0.9

positional arguments:
  «infile»              input source code file

optional arguments:
  -h, --help            show this help message and exit
  -W, --nowarn {rel,label,instr,none}
                        disable specified warning
  -B, --allow2650b      enable instructions specific to 2650B-variant
  -l «listout»          output text listing file
  -o «codeout»          output binary code file
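The help text above maps naturally onto Python's argparse module. The following is a minimal sketch of such a command-line definition, reconstructed only from the help text; it is an assumption about how the options could be declared, not the actual source of asm2650. The function name build_parser and the attribute names listout and codeout are hypothetical.

```python
import argparse

def build_parser():
    # Hypothetical reconstruction from the help text; not asm2650's real code.
    parser = argparse.ArgumentParser(
        prog="asm2650", description="Process 2650 Assembler Code")
    # action="append" lets -W be given several times, e.g. -W rel -W instr.
    parser.add_argument("-W", "--nowarn", action="append", default=[],
                        choices=["rel", "label", "instr", "none"],
                        help="disable specified warning")
    parser.add_argument("-B", "--allow2650b", action="store_true",
                        help="enable instructions specific to 2650B-variant")
    parser.add_argument("-l", dest="listout", metavar="«listout»",
                        help="output text listing file")
    parser.add_argument("-o", dest="codeout", metavar="«codeout»",
                        help="output binary code file")
    parser.add_argument("infile", metavar="«infile»",
                        help="input source code file")
    return parser

# Example invocation with an explicit argument list (file names are made up).
args = build_parser().parse_args(
    ["-W", "rel", "-W", "instr", "-B", "-o", "out.bin", "prog.asm"])
print(args.nowarn, args.allow2650b, args.listout, args.codeout, args.infile)
```

Collecting the -W values in a list keeps each occurrence, which matches the documented ability to repeat the option.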
Warnings can be disabled using --nowarn:
rel: no warning when a relative instruction could have been used instead of an absolute instruction. When the offset is small (between -64 and +63) the absolute instruction could be replaced by a relative one. For example:
LOOP    BDRA,R1 LOOP
Using a relative instruction instead of an absolute instruction saves one byte of memory, and the 2650 executes it faster.
label: no warning when a label is redefined; the last definition of the label is used. Typically each label is defined only once, and redefinitions indicate a mistake or typo.
instr: no warnings about unusual instructions. Some instructions are not forbidden, but highly unusual. For example, an indexed instruction can use R0 both as the index register and as the (default) operand, or a RES can have an operand of zero. This is not typically what you want.
none: disable all warnings.
The --nowarn option can be specified multiple times, e.g. -W rel -W instr.
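The warning rules above can be illustrated with a small Python sketch. It only restates what the text says: a relative instruction is possible when the offset lies between -64 and +63, and a warning category is silenced when it, or none, has been passed to --nowarn. The function names are hypothetical, and how the assembler measures the offset is not specified here, so the offset is taken as an input.

```python
def fits_relative(offset):
    """Return True if the offset is in the range quoted for relative instructions."""
    return -64 <= offset <= 63

def warning_enabled(kind, disabled):
    """Return True if warnings of this kind should still be shown.

    disabled is the list of values passed to --nowarn (an assumption
    about how the option values are collected).
    """
    return "none" not in disabled and kind not in disabled

disabled = ["rel", "instr"]              # as in: -W rel -W instr
print(warning_enabled("label", disabled))  # label warnings remain active
print(warning_enabled("rel", disabled))    # rel warnings are silenced
print(fits_relative(-3))                   # a short backward branch qualifies
```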
The 2650B variant of the processor adds two instructions for directly loading or storing the lower Program Status Word to memory. The original 2650 does not support these instructions. If you wish to use these instructions, explicitly specify the --allow2650b option.
Without any output options specified the assembler will generate a listing on standard output. If either -l or -o is specified no output is shown on standard output. Use -l to save a listing (code together with the original input file). Use -o to save the assembled instructions and data into a binary file. You can use both.
This section describes the valid input format recognised by the assembler.
Pseudo-ops do not generate code for the 2650; they are instructions to the assembler. The assembler supports all known pseudo-ops, although it ignores a number of them. See the Signetics 2650 manual for a more detailed description of assembler pseudo-ops.
ORG sets the location counter. Typically this determines the memory address where the code will be located. Each input file should contain at least one ORG instruction. Without an ORG instruction the assembler assumes that the code starts at address 0000. This is normally not what is intended, and will generate a warning message. You should specify ORG 0000 to make the assumption explicit.
EQU defines a new symbol.
ACON or DW defines two bytes that contain an address value.
DATA or DB defines one or more bytes that contain the specified values.
RES reserves memory. In the output file these bytes are set to zero.
DFLT determines whether numerical constants are interpreted as decimal or hexadecimal. Values 0 or 10 indicate that numbers will be decimal; values 1 or 16 indicate that numbers will be hexadecimal. Note that you can always explicitly specify the base; e.g. sixteen can be written as D'16' or H'10'.
The pseudo-ops START, END, EJE, PRT, SPC, TITL, and PCH are accepted but ignored.
The binary output file only contains the content bytes. It does not contain any reference to the intended starting address in memory. This also means that the binary output cannot contain multiple sections. The source assembler file can contain multiple sections; each ORG instruction starts a new section at a new memory address. A source file with two ORG instructions would contain three sections. (Note that the start of the source file contains a section at the implicit address 0000.)
This leads to the question of how the RES pseudo-op should be treated. Consider the following code:
; Scratch memory
        ORG     H'17C0'
COUNT   RES     1
        ...
; Code section
        ORG     H'1510'
START   eorz,r0
        ...
BUF     RES     D'20'
MORE    lodi,r1 5
Ideally the first section should only define the items in scratch memory but produce no output to the binary output file. The second section should produce output, including zeroes for the RES instruction.
The assembler addresses the issue in this way: if the current section already produced output bytes then the RES will also produce zeroes. But if the current section did not produce any output (as in the definitions in the scratch section above) RES will not produce output bytes.
In other words, RES will not be the first instruction in a section to produce output, but will produce output if it follows other output bytes.
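The RES rule can be sketched in a few lines of Python. This is an illustration of the rule as described above, not the assembler's actual code: the item format, the function name emit_sections and the placeholder code bytes are all assumptions made for the example.

```python
def emit_sections(items):
    """Build the binary output from a simplified list of assembled items.

    Each item is ("ORG", address), ("DATA", bytes) or ("RES", count).
    This item format is hypothetical and exists only for this sketch.
    """
    out = bytearray()
    section_has_output = False
    for kind, value in items:
        if kind == "ORG":
            # A new section starts; it has not produced output yet.
            section_has_output = False
        elif kind == "DATA":
            out += value
            section_has_output = True
        elif kind == "RES":
            # Zeroes are written only if the section already produced output.
            if section_has_output:
                out += bytes(value)
    return bytes(out)

# Mirrors the example above; b"\xAA" and b"\xBB\xCC" are placeholder code bytes.
program = [
    ("ORG", 0x17C0), ("RES", 1),          # scratch section: no output
    ("ORG", 0x1510), ("DATA", b"\xAA"),   # START
    ("RES", 20),                          # BUF: 20 zero bytes
    ("DATA", b"\xBB\xCC"),                # MORE
]
print(len(emit_sections(program)))
```

With this input the scratch RES contributes nothing, while BUF contributes 20 zero bytes between the two pieces of code.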
Labels and symbols must start with a letter and may contain letters, digits and the underscore _. Labels are case-insensitive, so StartAddr and StartADDR refer to the same symbol. Case is preserved in the output.
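A case-insensitive but case-preserving symbol table can be sketched as follows. The class and its methods are hypothetical; keeping the spelling of the first definition is an assumption, since the text only states that case is preserved in the output.

```python
import re

# A letter followed by letters, digits or underscores, as described above.
LABEL_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

class SymbolTable:
    def __init__(self):
        self._values = {}     # upper-cased name -> value
        self._spelling = {}   # upper-cased name -> spelling as first written

    def define(self, name, value):
        if not LABEL_RE.fullmatch(name):
            raise ValueError("invalid label: " + name)
        key = name.upper()
        self._spelling.setdefault(key, name)
        self._values[key] = value   # a later definition replaces the value

    def lookup(self, name):
        return self._values[name.upper()]

    def spelling(self, name):
        return self._spelling[name.upper()]

table = SymbolTable()
table.define("StartAddr", 0x1510)
print(hex(table.lookup("StartADDR")), table.spelling("STARTADDR"))
```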
Numbers consist of (hexa)decimal digits, depending on the default base. If the base is set to hexadecimal (using the DFLT pseudo-op) then hexadecimal numbers must start with a digit 0-9 so as not to confuse them with a symbol. For example FF should be written as 0FF. To explicitly indicate the base, use the notation D'16' or H'10' (or d'16', h'10' in lowercase). Alternatively, hexadecimal numbers can be prefixed with a dollar-sign, e.g. $10.
Negative numbers have a minus sign in front, and positive numbers may optionally start with a plus sign.
Octal or binary constants are not supported.
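The number formats described above can be summarised in a short Python sketch. The function parse_number and its default_base parameter are hypothetical, and accepting a sign in front of the D'..', H'..' and $ forms is an assumption; the sketch is meant to illustrate the rules, not to reproduce the assembler's parser.

```python
import re

def parse_number(text, default_base=10):
    """Parse a numeric constant; default_base is 10 or 16 (as set by DFLT)."""
    sign = 1
    if text.startswith(("+", "-")):
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    # Explicit base: D'16' or H'10', in upper or lower case.
    m = re.fullmatch(r"([DdHh])'([0-9A-Fa-f]+)'", text)
    if m:
        base = 10 if m.group(1) in "Dd" else 16
        return sign * int(m.group(2), base)
    # Dollar prefix: always hexadecimal.
    m = re.fullmatch(r"\$([0-9A-Fa-f]+)", text)
    if m:
        return sign * int(m.group(1), 16)
    # Bare number: must start with a digit 0-9, otherwise it reads as a symbol.
    if re.fullmatch(r"[0-9][0-9A-Fa-f]*", text):
        return sign * int(text, default_base)
    raise ValueError("not a number: " + text)

for sample in ("D'16'", "H'10'", "$10", "16"):
    print(sample, parse_number(sample))
print("0FF", parse_number("0FF", default_base=16))
```

A bare FF raises ValueError in this sketch, which corresponds to the requirement to write 0FF when the default base is hexadecimal.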
ASCII text strings are specified using A'Some text here' or "Some text here". When a string is specified using double quotes, the expressions \xNN (where N indicates any hexadecimal digit) are replaced with the character with ASCII code NN. For example, a line ending can be specified as "Line end\x0D\x0A", or a C-style zero-terminated string as "Some text\x00".
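The \xNN replacement can be sketched with a regular expression. The function decode_string is hypothetical and assumes that escapes are only expanded inside double-quoted strings, as the text describes; accepting a lowercase a'...' prefix is also an assumption.

```python
import re

def decode_string(literal):
    """Convert a string constant from the source text into bytes."""
    if len(literal) >= 3 and literal[:2] in ("A'", "a'") and literal.endswith("'"):
        # A'...' form: taken literally, no escape processing.
        return literal[2:-1].encode("ascii")
    if len(literal) >= 2 and literal[0] == '"' and literal[-1] == '"':
        # "..." form: replace each \xNN by the character with code NN.
        body = re.sub(r"\\x([0-9A-Fa-f]{2})",
                      lambda m: chr(int(m.group(1), 16)),
                      literal[1:-1])
        return body.encode("latin-1")
    raise ValueError("not a string constant: " + literal)

print(decode_string(r'"Line end\x0D\x0A"'))
print(decode_string("A'Some text here'"))
```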
Text using the EBCDIC encoding is not supported.
The symbol $ refers to the current location (not to be confused with $10 hexadecimal numbers).
Expressions can include addition and subtraction, e.g.:
        $+2
        LoadAddress-1
        Buffer+Length-1
Note that whitespace is not allowed inside expressions! The following will (silently) yield incorrect results:
        LoadAddress - 1          does not work
        Buffer + Length - 1      also does not work
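Evaluating such expressions amounts to summing signed terms. The sketch below is a simplified illustration with hypothetical names; it handles only decimal numbers, $ for the current location, $-prefixed hexadecimal numbers and symbols. Unlike the assembler, which silently yields an incorrect result, this sketch raises an error when the expression contains whitespace.

```python
import re

# An optional sign followed by a term that contains no '+' or '-'.
TERM_RE = re.compile(r"([+-]?)([^+-]+)")

def term_value(term, symbols, location):
    if term == "$":
        return location                 # current location
    if term.startswith("$"):
        return int(term[1:], 16)        # $10 style hexadecimal number
    if term[0].isdigit():
        return int(term, 10)            # decimal number (simplification)
    return symbols[term.upper()]        # symbols are case-insensitive

def evaluate(expr, symbols, location):
    if not expr or any(ch.isspace() for ch in expr):
        raise ValueError("whitespace is not allowed inside expressions")
    total, pos = 0, 0
    while pos < len(expr):
        m = TERM_RE.match(expr, pos)
        if m is None:
            raise ValueError("malformed expression: " + expr)
        sign, term = m.groups()
        value = term_value(term, symbols, location)
        total += -value if sign == "-" else value
        pos = m.end()
    return total

# Symbol values below are made up for the example.
symbols = {"LOADADDRESS": 0x1510, "BUFFER": 0x1700, "LENGTH": 0x20}
print(hex(evaluate("$+2", symbols, 0x1510)))
print(hex(evaluate("Buffer+Length-1", symbols, 0x1510)))
```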
Many instructions accept a list of expressions. List items are separated by a comma, optionally followed by whitespace.
Full-line comments are indicated by an asterisk * or semicolon ; as the first character on the line. Otherwise, any valid instruction can be followed by one or more spaces or tabs; the remainder of the line is then taken to be a comment. E.g.:
; This is a full line comment
* This, too.
        lodi,r0 5       This is a comment as well
The starting position of the comment does not matter. The following line is therefore confusing but valid:
lodz,r1 5
The lodz command does not take an operand, so the number 5 is taken to be a comment. If you actually meant lodi,r1 5 the assembler cannot warn you about this typo.
There are two variants: as used by Signetics, and as produced by the DASMx disassembler from Conquest Consultants. See:
The Signetics 2650 CPU manual (for example https://frank.pocnet.net/other/sos/Philips_2650/Philips_(Signetics)_2650.pdf)
This assembler supports both variants, but with some improvements:
Upper and lower case may be used. The assembler is case-insensitive but case-preserving.
Labels can be any length, and may contain underscores '_'.
Constants can only be specified in decimal, hexadecimal or ASCII format. Octal, binary and EBCDIC constants are not supported.
Forward references may be used anywhere, including register codes, condition codes and assembler pseudo-ops.
Whitespace at the end of lines is ignored.
Blank lines and full-line comments are copied into the output listing.
A line that starts with a symbol followed by a colon defines a new label. For example:
Fill_the_buffer: stra,r0 buff,r1
                 bdrr,r1 Fill_the_buffer
Otherwise labels start at the beginning of the line and are followed by an instruction or pseudo-op:
Loop    stra,r0 Buffer,r1
        bdrr,r1 Loop
Indirect addressing is specified by using the asterisk * as the first character of the operand. With some assemblers the * character has to be in a specific character position on the line; asm2650 has no such restrictions. The operand can be any valid expression. For example:
        stra,r1 *Buffer+Len
Indexed addressing is specified by adding a register and (optionally) increment/decrement to the operand. Common notations are supported:
        stra,r0 Buffer,r1       Plain indexing
        stra,r0 Buffer,r1+      Auto-increment
        stra,r0 Buffer,r1,+     Auto-increment
        stra,r0 Buffer,r1-      Auto-decrement
        stra,r0 Buffer,r1,-     Auto-decrement
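The indirect and indexed operand notations can be recognised with a single regular expression. The following sketch uses a hypothetical function parse_operand and returns a simple tuple; it illustrates the accepted notations and makes no claim about how asm2650 parses operands internally.

```python
import re

# expression , register , optional '+' or '-' (with or without a comma before it)
INDEXED_RE = re.compile(r"(?P<expr>[^,]+),(?P<reg>[Rr][0-3])(?:,?(?P<mode>[+-]))?")

def parse_operand(operand):
    """Return (expression, index_register, mode, indirect).

    mode is "+" for auto-increment, "-" for auto-decrement, "" otherwise.
    """
    indirect = operand.startswith("*")
    if indirect:
        operand = operand[1:]
    m = INDEXED_RE.fullmatch(operand)
    if m:
        return m.group("expr"), m.group("reg").upper(), m.group("mode") or "", indirect
    if "," in operand:
        raise ValueError("unrecognised index notation: " + operand)
    return operand, None, "", indirect

for text in ("Buffer,r1", "Buffer,r1+", "Buffer,r1,-", "*Buffer+Len"):
    print(text, parse_operand(text))
```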
For instructions that use Register addressing, two notations are in common use. The assembler accepts either:
        lodz,r1
        lodz r1         This is the same as the previous line
The second notation is not allowed for other addressing modes. All of the following are invalid:
        lodi r1,5       Incorrect
        lodr r1,ADDR    Also incorrect
        loda r1,ADDR    Same
Note that there can be no whitespace around the comma between the opcode and the register or condition code:
        lodz,r1         This is OK
        lodz, r1        This raises an error
The following common symbols are predefined.
Registers: R0, R1, R2, R3
Condition codes: EQ, LT, GT, UN, NE and Z, P, N
Program status word bits: SENS, FLAG, II, IDC, RS, WC, OVF, LCOM, CAR
1.0.0 -- First release.
1.0.1 -- The RES pseudo-op no longer generates output in the binary file. Previously bytes 00 were written. If you want this, use DATA or DB instead of RES. It is now OK to use multiple ORG sections, as long as only the first one actually generates binary code. Multiple sections that actually contain data will still result in a warning, which can be suppressed using --nowarn org.
1.0.2 -- Add an error for LODZ,R0: according to the Signetics manual the results of this instruction are undefined.
1.0.3 -- Better handling of large syntax errors leading to exceptions in Python.
1.0.4 -- A number of changes to support more input formats: The dollar-sign can be used to indicate a hexadecimal number (e.g. $0d).
-- The START pseudo-op is accepted, but ignored.
-- Also, full-line comments can now start with whitespace. The first non-space character must still be an asterisk * or semicolon ;, but there can be whitespace at the start of the line.
-- Input files are always read as UTF8 files. Incorrect UTF8 codes would crash the assembler; this is now fixed.
1.0.5 -- Labels starting with letter R followed by a number would be incorrectly interpreted as index registers. For example, label R012 would be interpreted as R0 followed by the number 12.
-- When errors are encountered the output files will be deleted (as they will contain invalid data). This used to be a problem when they were special files such as /dev/null. The following is now possible:
asm2650 -l /dev/null Myfile.2650
1.0.6 -- Fixed: using $ to refer to the current code address stopped working since 1.0.4.
1.0.7 -- Changed the meaning of the RES pseudo-op again. See the section 'Pseudo-ops'.
1.0.8 -- Pseudo-ops that are ignored no longer require an argument.
The listing now shows a list of all symbols in alphabetical order. Thanks Neill!
1.0.9 -- Single-letter strings are now correctly converted to their ASCII-equivalent for EQU.
-- More useful error message when a line contains only a label and nothing else. Thanks again Neill!
-- When ignoring a pseudo line END, do remember its label.
-- Added several warnings for unusual instructions.
-- Also removed the org warning, as it did not make sense anymore.
Eelco Vriezekolk
Questions? Send me an email on
eelco [at-sign] ztpe [little dot] nl