A pattern is a reusable solution to a commonly occurring problem. The following patterns are useful to save memory and execution cycles. Note that many patterns come at the expense of some code-readability.
- Register addressing with R0
- Setting the cc status bits
- Reuse an absolute address
- Indirection for repeated branches
- Increment or decrement an 8-bit value
- Increment a 16-bit memory value
- Decrement a 16-bit memory value
- Chain and fall-through subroutine calls
- Use indirection to access data in other pages
- Returning from an interrupt
Where patterns are ‘good’, useful solutions, anti-patterns are ‘bad’ or inefficient solutions that should not be used. Some anti-patterns:
- Don’t use instruction lodz,r0
- Don’t use register bank 1
Register addressing with R0
These instructions are 1 byte and 2 cycles (1 cycle on the 2650B) and make register zero operate on itself. They have interesting side-effects.
Opcode | Instruction | Effects |
---|---|---|
00 | lodz,r0 | See separate section below. |
20 | eorz,r0 | Clears r0, and clears both condition code bits in PSL. Saves one byte compared to lodi,r0 0 . |
40 | andz,r0 | Do not use. Opcode is used for the HALT instruction. |
60 | iorz,r0 | Sets condition code bits to the value of r0, without changing r0. |
80 | addz,r0 | Doubles r0 (with carry, if set). Similar to rrl,r0 . |
a0 | subz,r0 | Same result as eorz,r0 when WC is 0. |
c0 | strz,r0 | Do not use. Opcode is used for the NOP instruction. Note that this instruction uses 2 cycles even on the 2650B. |
e0 | comz,r0 | Clears both condition code bits, without changing R0. |
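A minimal sketch of where iorz,r0 earns its byte, assuming a hypothetical subroutine GetKey that returns a key code in r0 (zero meaning no key) and leaves the condition code in an unknown state. GetKey and NoKey are illustrative names, not routines from any particular monitor.

```
        bsta,un GetKey      ; hypothetical: result in r0, CC unknown
        iorz,r0             ; 1 byte: CC now reflects r0, r0 unchanged
        bctr,eq NoKey       ; taken when r0 is zero
```

The usual alternative, comi,r0 0, takes two bytes and its result depends on the COM bit.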
Setting the cc status bits
The base method for setting the two status bits to a known value is by using CPSL and PPSL. This is inefficient.
* Inefficient
SetEq cpsl CC1+CC0 ; 2 bytes 3 cycles
SetGt cpsl CC1
ppsl CC0 ; 4 bytes 6 cycles
SetLt ppsl CC1
cpsl CC0 ; 4 bytes 6 cycles
Note that it is entirely legal to set both CC1 and CC0 to 1, although the 2650 will never do so itself.
A better way to clear the two cc status bits is to use register addressing on register 0.
SetEq comz,r0 ; 1 byte 2 cycles
SetGt comz,r0
ppsl CC0 ; 3 bytes 5 cycles
SetLt comz,r0
ppsl CC1 ; 3 bytes 5 cycles
If the contents of some register rx do not have to be retained, you can use this:
SetGt lodi,rx 1 ; 2 bytes 2 cycles
SetLt lodi,rx -1 ; 2 bytes 2 cycles
To set the status bits to the current value of r1..r3:
comi,rx 0 ; 2 bytes 2 cycles
When COM is 1 (logical compare) the condition code bits will be set to Zero or Positive; Negative can only happen when COM is 0 (arithmetic compare).
Reuse an absolute address
Use a relative indirect instruction to reuse a nearby absolute address. Using relative addressing saves one execution cycle, but the indirection costs an extra two cycles. In effect, you spend one extra execution cycle to save one byte of memory.
code loda,r0 addr
:
strr,r0 *code+1 ; 2 bytes 5 cycles
instead of:
code loda,r0 addr
:
stra,r0 addr ; 3 bytes 4 cycles
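As a concrete sketch, the fragment below bumps a counter byte and writes it back through the address already embedded in the load. Count is a hypothetical label. It assumes that WC is 0 and that the code and Count both live in page 0: a non-branch absolute instruction appears to encode only the 13-bit in-page address, so reading those two bytes as an indirect address would point into page 0.

```
Get     loda,r0 Count       ; 3 bytes 4 cycles; Get+1 and Get+2 hold the address of Count
        addi,r0 1           ; 2 bytes 2 cycles; assumes WC = 0
        strr,r0 *Get+1      ; 2 bytes 5 cycles; stores through the address embedded in the loda
```

The label Get must stay within relative range of the strr for this to assemble.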
Indirection for repeated branches
If execution speed is not a concern, then you can save memory when code makes repeated branches to the same address.
Err ACON Error
: some code
bcfr,eq *Err
: some code
bcfr,gt *Err
: some code
bcfr,lt *Err
: some code
bctr,un *Err
The relative branches use two bytes instead of three for an absolute branch, but the indirection costs two extra cycles. If you have more than two branches to the same address you start saving memory.
This pattern can be combined with Reuse an absolute address, to save an extra byte.
: some code
Err EQU $+1
bcfa,eq Error
: some code
bcfr,gt *Err
: some code
bcfr,lt *Err
: some code
bctr,un *Err
Increment or decrement an 8-bit value
The base method to increment or decrement a register:
addi,r1 1 ; 2 bytes 2 cycles
subi,r2 1 ; 2 bytes 2 cycles
This is subject to the state of the WC bit. The CC bits are set to the new register value. In TWIN TOS the following pattern is used.
birr,r1 $+2 ; 2 bytes 3 cycles
bdrr,r2 $+2 ; 2 bytes 3 cycles
This takes an extra cycle, but may be useful when the state of the CC bits needs to be preserved, or when the state of WC cannot be assumed. Otherwise a direct addi or subi is faster and more readable.
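A sketch of the case where preserving the CC bits matters, assuming r2 is a free counter and Done is a hypothetical label within relative range. The compare result survives the increment, so the branch can come afterwards.

```
        comi,r0 H'0D'       ; CC := result of comparing r0 with H'0D'
        birr,r2 $+2         ; r2 := r2 + 1; CC untouched, continues at the next instruction either way
        bctr,eq Done        ; still tests the outcome of the comi
```

With addi,r2 1 in place of the birr, the branch would test the new value of r2 instead.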
The base method to increment a value in memory:
loda,r0 addr ; 3 bytes 4 cycles
addi,r0 1 ; 2 bytes 2 cycles
stra,r0 addr ; 3 bytes 4 cycles
This only works when WC is set to 0, which is normally the case, or if CAR is known to be 0. Otherwise the counter may inadvertently be increased by two.
The following pattern is used in VHS DOS. It saves one byte of memory without a cycle penalty and is independent of the WC bit.
eorz,r0 ; 1 byte 2 cycles
adda,r0 addr-1,r0+ ; 3 bytes 4 cycles
stra,r0 addr ; 3 bytes 4 cycles
Register r0 is first incremented, and becomes 1. It is then added (as the index register) to addr-1, yielding addr. The contents at that address are added to r0 (which contains 1), to increment the value.
Especially useful if you need the zero in r0 anyway to initialise other registers or memory bytes.
When decrementing, the following pattern is inspired by the “VHS DOS-pattern”.
eorz,r0
adda,r0 addr-h'ff',r0-
stra,r0 addr
Register r0 is first decremented, and becomes -1 (h'ff'). It is then added (as the index register) to addr-h'ff', yielding addr. The contents at that address are added to r0 (which contains -1).
Increment a 16-bit memory value
This base method is quite inefficient with 20 bytes and 26 cycles.
loda,r0 addr+1 ; 3 bytes 4 cycles
addi,r0 1 ; 2 bytes 2 cycles
stra,r0 addr+1 ; 3 bytes 4 cycles
ppsl WC ; 2 bytes 3 cycles
loda,r0 addr ; 3 bytes 4 cycles
addi,r0 0 ; 2 bytes 2 cycles
stra,r0 addr ; 3 bytes 4 cycles
cpsl WC ; 2 bytes 3 cycles
It can be done without using the program status word in 18 bytes and 13 cycles most of the time, 23 cycles worst case. This only works when WC is set to 0.
loda,r0 addr+1 ; 3 bytes 4 cycles
addi,r0 1 ; 2 bytes 2 cycles
stra,r0 addr+1 ; 3 bytes 4 cycles
brnr,r0 skip ; 2 bytes 3 cycles
loda,r0 addr ; 3 bytes 4 cycles
addi,r0 1 ; 2 bytes 2 cycles
stra,r0 addr ; 3 bytes 4 cycles
skip :
This can be combined with the pattern for incrementing an 8-bit value, reducing it to 16 bytes and 13 cycles (23 cycles worst case).
eorz,r0 ; 1 byte 2 cycles
adda,r0 addr,r0+ ; 3 bytes 4 cycles
stra,r0 addr+1 ; 3 bytes 4 cycles
brnr,r0 skip ; 2 bytes 3 cycles
eorz,r0 ; 1 byte 2 cycles
adda,r0 addr-1,r0+ ; 3 bytes 4 cycles
stra,r0 addr ; 3 bytes 4 cycles
skip :
If you can use two registers, this neat variant is 16 bytes and 11 cycles (22 cycles worst case). It is also independent of the state of the WC bit.
loda,r1 addr+1 ; 3 bytes 4 cycles
birr,r1 skip2 ; 2 bytes 3 cycles
loda,r0 addr ; 3 bytes 4 cycles
birr,r0 skip1 ; 2 bytes 3 cycles
skip1 stra,r0 addr ; 3 bytes 4 cycles
skip2 stra,r1 addr+1 ; 3 bytes 4 cycles
If no registers are available, avoid the bank-1-antipattern: switching to bank 1 and back again raises the cost to 20 bytes and 28 cycles worst case.
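To show the two-register variant in context, here is a sketch of a subroutine that fetches the byte a 16-bit pointer refers to and then advances the pointer. NextCh, NoWrap and Ptr are hypothetical names. Ptr is assumed to be a two-byte RAM location in the same page as the code, stored high byte first so that it can double as an indirect address. The routine clobbers r1 and r2.

```
* Ptr is assumed to be defined elsewhere as:  Ptr RES 2
NextCh  loda,r0 *Ptr        ; fetch the byte Ptr points to
        strz,r2             ; park it in r2 while r0 and r1 are reused
        loda,r1 Ptr+1       ; low byte of the pointer
        birr,r1 NoWrap      ; low := low + 1; branch unless it wrapped to zero
        loda,r0 Ptr         ; high byte of the pointer
        birr,r0 $+2         ; high := high + 1; continues at the next instruction either way
        stra,r0 Ptr
NoWrap  stra,r1 Ptr+1
        lodz,r2             ; fetched byte back in r0, CC set from it
        retc,un
```

Because birr does not depend on WC or the carry bit, the routine can be called without knowing the caller's PSL settings.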
Decrement a 16-bit memory value
This base method uses 20 bytes and 26 cycles.
loda,r0 addr+1 ; 3 bytes 4 cycles
subi,r0 1 ; 2 bytes 2 cycles
stra,r0 addr+1 ; 3 bytes 4 cycles
ppsl WC ; 2 bytes 3 cycles
loda,r0 addr ; 3 bytes 4 cycles
subi,r0 0 ; 2 bytes 2 cycles
stra,r0 addr ; 3 bytes 4 cycles
cpsl WC ; 2 bytes 3 cycles
A simple alternative without the With Carry bit uses the same memory but reduces the typical case to 15 cycles (25 cycles worst case):
loda,r0 addr+1 ; 3 bytes 4 cycles
subi,r0 1 ; 2 bytes 2 cycles
stra,r0 addr+1 ; 3 bytes 4 cycles
comi,r0 h'ff' ; 2 bytes 2 cycles
bcfr,eq skip ; 2 bytes 3 cycles
loda,r0 addr ; 3 bytes 4 cycles
subi,r0 1 ; 2 bytes 2 cycles
stra,r0 addr ; 3 bytes 4 cycles
skip :
The Overflow bit can be used to detect the wrap-around. Memory-usage is still the same with 20 bytes, and the typical case takes 16 cycles (26 cycles worst case):
loda,r0 addr+1 ; 3 bytes 4 cycles
subi,r0 1 ; 2 bytes 2 cycles
stra,r0 addr+1 ; 3 bytes 4 cycles
tpsl OVF ; 2 bytes 3 cycles
bcfr,eq skip ; 2 bytes 3 cycles
loda,r0 addr ; 3 bytes 4 cycles
subi,r0 1 ; 2 bytes 2 cycles
stra,r0 addr ; 3 bytes 4 cycles
skip :
Chain and fall-through subroutine calls
Chain subroutine calls at the end of a subroutine. Instead of:
: part of first sub
bsta,un SUB ; 3 bytes 3 cycles
retc,un ; 1 byte 3 cycles
use:
: part of first sub
bcta,un SUB ** chain ; 3 bytes 3 cycles
This saves one byte and three cycles. It is good practice to mark this branch with “** chain” as a comment, to warn about the hidden return instruction.
When the subroutine immediately follows the code, you can omit the branch. It is good practice to mark this with “** fall-through” as a comment, to warn about the implicit branch instruction.
: part of first sub
** fall-through
SUB :start of new sub
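Both ideas together, as a sketch. PutCh is a hypothetical routine that writes the character in r0 to the data port with wrtd; the port and the control codes are assumptions chosen for illustration.

```
* Print a space
Space   lodi,r0 H'20'
        bcta,un PutCh       ; ** chain: PutCh returns to the caller of Space
*
* Print carriage return and line feed
CrLf    lodi,r0 H'0D'
        bstr,un PutCh       ; ordinary call, returns here
        lodi,r0 H'0A'
** fall-through
PutCh   wrtd,r0             ; hypothetical output port
        retc,un             ; also serves as the return of Space and CrLf
```

Only the first call in CrLf needs a real subroutine branch; the second character is printed by falling into PutCh.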
Use indirection to access data in other pages
The 2650 divides its 32K address space into four pages of 8K each. While absolute branch instructions can jump to any address in the 32K address space, absolute data instructions such as LODA and ADDA are restricted to their page.
In order to load or store into an address location in another page, use indirection.
ORG H'2000' Code lives in page 1
Other ACON H'6100' vector into page 3
loda,r0 *Other Fetch byte from another page
:
stra,r0 *Other Store data to another page
Sometimes you need to access a 16-bit value in another page. For example, your code wants to access the location of the cursor. The Central Data computer stores the cursor position at H'17FE' (high byte) and H'17FF' (low byte). Set this by combining indexing and indirection. Remember: indexing is applied to the result of indirection; adding the index is the last step in determining the effective address.
ORG H'2000' Code lives in page 1
Curs ACON H'17FE' vector to 16-bit value in page 0
* Set the cursor at location H'1234'
lodi,r0 H'12'
stra,r0 *Curs Set the high byte of the cursor
lodi,r0 H'34'
lodi,r1 1
stra,r0 *Curs,r1 Set the low byte of the cursor
Alternatively, you can define two vectors. This saves two cycles, without an increase in memory.
ORG H'2000' Code lives in page 1
Curs ACON H'17FE' vector into 16-bit value in page 0
CursLo ACON H'17FF' vector into low byte
* Set the cursor at location H'1234'
lodi,r0 H'12'
stra,r0 *Curs Set the high byte of the cursor
lodi,r0 H'34'
stra,r0 *CursLo Set the low byte of the cursor
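Reading works the same way. A sketch, assuming the same Central Data cursor location and that r1, r2 and r3 are free to use:

```
Curs    ACON H'17FE'        ; vector to the 16-bit value in page 0
* Read the cursor position into r2 (high byte) and r3 (low byte)
        loda,r0 *Curs       ; high byte
        strz,r2
        lodi,r1 1
        loda,r0 *Curs,r1    ; low byte: the index is added after the indirection
        strz,r3
```

With indexing, the register named after the address is the index and the data always travels through r0, hence the strz instructions.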
Note that asm2650 will generate an error when an absolute data instruction accesses data in another page. Most other assemblers will silently generate incorrect code.
Returning from an interrupt
Interrupts can occur anytime between two instructions. In order not to upset the running program, the interrupt handler must ensure that the processor state is not changed in any unexpected way. This applies especially to the Lower Program Status Word (PSL), as its bits are set and cleared by the instructions in the interrupt handler. It is therefore essential for the interrupt handler to save the PSL, and restore it at the end of the interrupt handler. The instructions spsl and lpsl can be used for this. Since these transfer the PSL through register zero, R0 has to be saved and restored as well.
* Incorrect interrupt handler
Handler stra,r0 SavR0
spsl Stores PSL into R0
stra,r0 SavPSL
*
* handler code here, restore any changes to other registers
*
loda,r0 SavPSL
lpsl Restores PSL
loda,r0 SavR0 Changes condition code!
rete,un
SavPSL RES 1
SavR0 RES 1
The problem here is that the last loda instruction changes the PSL again. To work around this issue, use the following pattern.
* Correct interrupt handler, 2650 and 2650A
Handler stra,r0 SavR0
spsl Stores PSL into R0
stra,r0 SavPSL
*
* handler code here, restore any changes to other registers
*
loda,r0 SavPSL
lpsl Restores PSL
bctr,gt RetGT
bctr,lt RetLT
*
RetZ loda,r0 SavR0
comz,r0 CC = EQ
rete,un
*
RetGT loda,r0 SavR0
comz,r0
ppsl CC0 CC = GT
rete,un
*
RetLT loda,r0 SavR0
comz,r0
ppsl CC1 CC = LT
rete,un
SavPSL RES 1
SavR0 RES 1
The above works when running from ROM (e.g. from a game cartridge). When running from RAM it is possible to modify the code at runtime:
* Correct interrupt handler for RAM, 2650 and 2650A
Handler stra,r0 SavR0+1
spsl Stores PSL into R0
stra,r0 SavPSL+1
*
* handler code here, restore any changes to other registers
*
SavR0 lodi,r0 00 Will be overwritten with actual value
cpsl h'ff'
SavPSL ppsl 00 Will be overwritten with actual value
rete,un
Now compare this to how this is done using the 2650B microprocessor, using the two instructions that were added in this variant.
* Correct interrupt handler, 2650B only
Handler stpl SavPSL
*
* handler code here, restore any changes to registers
*
ldpl SavPSL
rete,un
SavPSL RES 1
Don’t use instruction lodz,r0
There seems to be conflicting information on whether the lodz,r0 instruction is legal or not. The hardware manuals on the 2650 microprocessor by Signetics and Philips (even later ones describing the 2650B) are very clear:
When the specified register, r, equals 0, the operation code is changed to 60₁₆ by the assembler. The instruction, 00000000, yields indeterminate results.
Signetics 2650 Microprocessor manual.
Opcode 60 stands for iorz,r0. Arguably both instructions should yield the same result: the contents of r0 are unchanged but the Condition Code bits in the Program Status Word are set to either 00 (zero), 01 (positive) or 10 (negative) according to the value in r0. However, lodz,r0 does not reliably work (“yields indeterminate results”), and iorz,r0 should be used instead.
Problem solved? Well, several programs, including a lot of software written by Central Data, make use of lodz,r0. One conclusion is that they apparently used a different assembler than the one from Signetics, but more importantly: the lodz,r0 instruction appears to work fine in practice. Furthermore, the official manual to the Instructor 50 mentions this:
When the specified register, r, equals 0, the operation code is changed to 60₁₆ (IORZ) by the assembler. However, the processor will execute the instruction 00₁₆ correctly.
Introduction to the Instructor 50 Desktop Computer.
To avoid issues (and discussions) it is best to avoid lodz,r0 and use iorz,r0 instead. The asm2650 assembler issues a warning for it, but does not silently change it to iorz,r0.
Don’t use register bank 1
Normally bank 0 is selected, and instructions operate on R0..R3. A switch to bank 1 is done only at the very beginning of certain subroutines, to avoid modifying R1..R3 in bank 0. With only three registers holding data (R0 is used as a general accumulator), it becomes necessary to store data in memory. One might think that it is a waste not to use 3 scarce registers in bank 1 during normal operations, but bank switching is expensive and often not worth the effort.
For example, scratch space can be used to control a loop like this (14 bytes, 18 cycles):
TEMP RES 1 ; 1 byte
eorz,r0 ; 1 byte 2 cycles
strr,r0 TEMP ; 2 bytes 3 cycles
Loop :
:
lodr,r0 TEMP ; 2 bytes 3 cycles
addi,r0 1 ; 2 bytes 2 cycles
strr,r0 TEMP ; 2 bytes 3 cycles
comi,r0 MAX ; 2 bytes 2 cycles
bcfr,eq Loop ; 2 bytes 3 cycles
This is more efficient than using a register in bank 1. The following anti-pattern uses 16 bytes and 23 cycles:
eorz,r0 ; 1 byte 2 cycles
ppsl RS ; 2 bytes 3 cycles
strz,r4 ; 1 byte 2 cycles
cpsl RS ; 2 bytes 3 cycles
Loop :
:
ppsl RS ; 2 bytes 3 cycles
addi,r4 1 ; 2 bytes 2 cycles
comi,r4 MAX ; 2 bytes 2 cycles
cpsl RS ; 2 bytes 3 cycles
bcfr,eq Loop ; 2 bytes 3 cycles
Perhaps this anti-pattern is a specific example of the more generic anti-pattern of using the xPSL / xPSU instructions. Working with the Program Status Word is expensive.
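Returning to the scratch-space loop: when the loop body does not need an ascending count, counting down is one way to make it cheaper still. The sketch below drops the comi, on the assumption that strr does not alter the condition code, so the branch can test the result of the subtraction. It also assumes WC is 0 and MAX is at least 1. Counted the same way as above, it comes to 13 bytes and 16 cycles.

```
TEMP    RES  1              ; 1 byte
        lodi,r0 MAX         ; 2 bytes 2 cycles
        strr,r0 TEMP        ; 2 bytes 3 cycles
Loop    :
        :
        lodr,r0 TEMP        ; 2 bytes 3 cycles
        subi,r0 1           ; 2 bytes 2 cycles; assumes WC = 0
        strr,r0 TEMP        ; 2 bytes 3 cycles; assumed to leave CC unchanged
        bcfr,eq Loop        ; 2 bytes 3 cycles; repeat until the count reaches zero
```

As in the listings above, the lines containing only a colon stand for the loop body.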