2650B - The 2650 Restoration Project

Documentation from Signetics after 1978 mentions the 2650B variant of the 2650 microprocessor. The 2650B brings useful speed and usability improvements, at the expense of minor incompatibilities in software and hardware. The 2650B seems to be extremely rare. I have not found a photo of one anywhere on the internet, and I have never come across a hardware design made specifically for the 2650B. Luckily, I received a 2650B from a fellow enthusiast for testing.

The 2650B brings four improvements over the 2650A:

Adds two instructions, to store (STPL) and load (LDPL) the lower Program Status Word to and from memory directly.
Replaces the two input pins ADREN and DBUSEN with one input pin BEN and one output pin CYLAST. The 2650B is therefore not pin-compatible with the 2650/2650A, although it is often trivial to modify boards to accommodate both variants.
Speeds up instructions that use register addressing; e.g. LODZ,r1 takes a single cycle instead of two.
Allows the two previously unused bits in the upper Program Status Word to be set and cleared as desired.

Two new instructions

With the 2650A it is possible to load the contents of the Program Status Word from memory, by first loading from memory into register zero and then using LPSU/LPSL to load from that register. These instructions are rare, and tend to occur only at the end of interrupt routines, when the registers and PSW are restored to their pre-interrupt state. However, this two-step action has a major drawback: afterwards the contents of register zero need to be restored, which changes the condition code bits that were just restored. This leads to complicated restore-from-interrupt routines.

The two new instructions make returning from interrupt handlers a lot easier.
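
As a minimal sketch of the difference, compare the exit of an interrupt handler on both variants. The save locations SAVR0 and SAVPSL and their addresses are hypothetical, and the operand syntax for STPL and LDPL (a single absolute address) is an assumption; only the mnemonics come from the Signetics documentation.

* Hypothetical save area (names and addresses are illustrative)
SAVR0	EQU	h'0400'
SAVPSL	EQU	h'0401'

* 2650A exit: the saved PSL has to travel through r0
	loda,r0	SAVPSL		; saved PSL into r0
	lpsl			; restore PSL, including the condition code
	loda,r0	SAVR0		; restore r0, but this load sets CC again
	rete,un			; CC now reflects r0, not the saved value

* 2650B entry: PSL goes straight to memory, r0 is not needed
	stpl	SAVPSL		; assumed syntax: absolute address operand
	stra,r0	SAVR0

* 2650B exit: restore r0 first, then the PSL as the very last step
	loda,r0	SAVR0		; CC is clobbered here, which no longer matters
	ldpl	SAVPSL		; restores the PSL, condition code included
	rete,un

On the 2650A the naive exit shown here returns with the wrong condition code, which is why real handlers need a more elaborate sequence. On the 2650B the order of the two loads is simply reversed and the problem disappears.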

New pin signals

The 2650B changes the meaning of two hardware pins on the processor.

2650A
pin 15 (input): ADREN = Address Bus Enable, active low.
pin 25 (input): DBUSEN = Data Bus Enable, active low. Used with ADREN to release the busses during DMA.

2650B
pin 15 (input): BEN = Bus Enable, active low. Combines ADREN and DBUSEN into a single signal. In addition, when the busses are released the control signals WRP, R/W, M/IO and OPREQ are also placed in a high-impedance state.
pin 25 (output): CYLAST = Cycle Last. Indicates that the last cycle of the current instruction is being executed.

Most systems don’t use DMA and simply tie pins 15 and 25 to ground. These systems can be made 2650B-compatible by placing a 10 or 20 kΩ resistor between pin 25 and ground. Thus, when using a 2650A, DBUSEN on pin 25 is still pulled low, while a 2650B sees a reasonable load on its output pin CYLAST.

One as yet unanswered question is how the CYLAST signal functions precisely. It supposedly marks the last cycle of instructions, which are 1 to 6 cycles in duration. According to the data sheets, the CYLAST signal is raised at most 450 ns after the start of the cycle, and lowered after a similar delay after the start of the next cycle.

If this does not strike you as odd, consider that the instruction fetch from memory happens during the second and third clock periods of the first cycle.

In other words, for single-cycle instructions the CYLAST signal is raised before the instruction has been fetched from memory, and before its length could be determined. Clearly the 2650 is not clairvoyant. It cannot predict the future.

After some tinkering with an oscilloscope, visualizing CYLAST against OPREQ, R/W, M/IO and the clock, I came to the conclusion that CYLAST actually marks the start of an instruction.

CYLAST is actually a FETCH signal

CYLAST is raised shortly after the start of the first cycle of an instruction, when a new instruction is fetched from memory. It is dropped at the start of the next cycle, and re-raised immediately if the instruction took a single cycle. When single-cycle instructions are executed in succession, CYLAST remains high, with very short low pulses at the start of each instruction.

This is actually not too surprising when you know the history of the slave card in the Signetics TWIN. Details can be read on the page about debugging, but the short of it is that TWIN uses a special, Signetics-only version of the 2650A in which the FETCH signal is multiplexed on the RUN/WAIT signal. The 2650B simply exposes this signal on a dedicated pin.

As an example of the timing signals, have a look at the read-from-keyboard loop from the Central Data system. The keyboard is connected to port D, with the high bit strobing from high to low to indicate a key press. The read-loop is therefore:

* Read from keyboard
KBIN	redd,r3			; 1 byte   2 cycles
	tmi,r3	h'80'		; 2 bytes  3 cycles
	bctr,eq	KBIN		; 2 bytes  3 cycles
* r3 contains the key pressed

On my old 2-channel oscilloscope the clock, CYLAST, OPREQ and M/IO signals show as follows. (The picture is composed of three old-fashioned photos of the screen; apologies for their poor quality.)

Red lines mark the start of the first cycle of an instruction. Cyan lines mark cycles within an instruction. CYLAST is raised shortly after the start of a cycle, and lowered shortly after the start of the next. OPREQ is raised, if necessary, during the second and third clock period of the cycle. (There are three clock periods in a processor cycle). The M/IO signal is lowered when reading from port D.

The REDD instruction takes 2 cycles and 2 OPREQ signals: to fetch the opcode and to read from port D (with M/IO low).
The TMI instruction takes 3 cycles: to fetch the opcode, to fetch the operand, and to compute the result (which does not require memory or I/O access and therefore no OPREQ).
The BCTR instruction takes 3 cycles: to fetch the opcode, to fetch the branch offset, and to compute whether or not to take the branch. The latter does not require memory or I/O access, and causes no OPREQ signal.

Register addressing in single cycle

Register addressing happens when an instruction operates on R0 and one of the other registers R1, R2, R3. There are eight such instructions: LODZ, EORZ, ANDZ, IORZ, ADDZ, SUBZ, STRZ, COMZ. These instructions use a single byte. On the 2650A they take 2 cycles, but on the 2650B they are executed in just a single cycle.

Two of these instructions have a special meaning. ANDZ,R0 does not exist, and its opcode is used for HALT instead; STRZ,R0 does not exist either, and its opcode is used for NOP. Will these two instructions receive a speed-up as well? For HALT it is difficult to test, and perhaps not very interesting. But for NOP the question is real, because NOP is often used as padding in delay loops. If NOP were to execute in a single cycle rather than two, the timing of delay routines would be off.

To test, I filled the entire RAM in page 1 (addresses 2000 to 3fff) with NOP opcodes. Jumping to address 2000 will then continuously execute NOPs, as code can only change pages using an explicit branch instruction. Using an oscilloscope to trace the OPREQ and CYLAST signals reveals that NOP also takes a single cycle on the 2650B. (But note that NOP is still not identical to STRZ,R0, because NOP does not change the condition code bits, whereas STRZ does.)
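
To illustrate what this means for timing, here is a minimal delay-loop sketch. The labels and the loop count are made up for illustration, and BDRR is assumed to take 3 cycles on both variants, like the other two-byte relative branches shown on this page.

* Delay loop padded with NOPs (illustrative)
Delay	lodi,r1	100		; loop count
DLoop	nop			; 2 cycles on 2650A, 1 on 2650B
	nop			; 2 cycles on 2650A, 1 on 2650B
	bdrr,r1	DLoop		; 3 cycles on both; repeat until r1 = 0
	retc,un

Under these assumptions each pass through the loop takes 7 cycles on the 2650A and 5 cycles on the 2650B, so at the same clock frequency the delay comes out roughly 29% shorter on the 2650B. Any routine whose timing depends on NOP padding would have to be recalibrated.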

It is not easy to determine the speed-up from this. To get an idea I did a code analysis of SDOS 4.2. SDOS is a good example, because it is large and complex and therefore uses the full power of the instruction set. The code analysis is a simple static analysis, counting how many times each instruction appears in the code listing, not counting how many times each instruction is executed during operation. This analysis shows that only 8% of instructions use register addressing; 41% use absolute addressing, 23% use immediate addressing, 19% use relative addressing. The remaining 9% are instructions such as RETC that have no addressing mode. With only 8% of the code receiving a speed-up the benefits appear to be marginal.

User flags in the PSU

The bits with masks h'10' and h'08' in the upper Program Status Word have no assigned meaning. With the 2650A these bits always read as zeroes. The 2650B makes these bits user-settable, creating the PSU bits User Flag 1 (UF1) and User Flag 2 (UF2).

Bits UF1 and UF2 can be set, cleared and tested using the normal PPSU, CPSU and TPSU instructions. For example, the following routine counts the length of a string in a buffer, and reports whether any of the characters are non-digits.

* New PSU flags for 2650B
UF1	EQU	h'10'
UF2	EQU	h'08'

* String starts at this address
String	EQU	h'1234'

* Routine to determine the length of the string,
* and report on non-digits
	cpsu	UF1		; lower the flag
	lodi,r1	-1		; index into the string
Loop	loda,r0	String,r1,+	; get next character
	comi,r0	h'0d'
	bctr,eq	Done		; jmp if end-of-line
	comi,r0	a'0'
	bctr,lt	NonDig		; jmp if non digit
	comi,r0	a'9'
	bcfr,gt	Loop		; repeat loop if digit
NonDig	ppsu	UF1		; raise non-digit flag
	bctr,un	Loop		; repeat loop

* r1 = length of the string (r0 holds the end-of-line character)
* UF1 = non-digits were found
Done	tpsu	UF1		; was flag raised?
	bcta,eq	Error		; resolve the error

Although the PSW-instructions are relatively slow (they take 3 cycles), the readability of the code makes these two bits a welcome addition. On the 2650A you would use another register as the flag (status zero or non-zero); and registers are scarce.
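
For comparison, a sketch of how the same routine might look on the 2650A, with r2 sacrificed as the flag. String and Error are the same symbols as in the listing above; the choice of r2 and of the values 0 and 1 is an assumption made for illustration.

* 2650A variant: r2 serves as the non-digit flag
	lodi,r2	0		; lower the flag
	lodi,r1	-1		; index into the string
Loop	loda,r0	String,r1,+	; get next character
	comi,r0	h'0d'
	bctr,eq	Done		; jmp if end-of-line
	comi,r0	a'0'
	bctr,lt	NonDig		; jmp if non digit
	comi,r0	a'9'
	bcfr,gt	Loop		; repeat loop if digit
NonDig	lodi,r2	1		; raise non-digit flag
	bctr,un	Loop		; repeat loop

* r1 = length of the string
* r2 = non-zero if non-digits were found
Done	comi,r2	0		; was flag raised?
	bcfa,eq	Error		; resolve the error

The logic is the same, but r2 is no longer available for anything else for the duration of the loop.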

Conclusion

If the 2650 family had become more popular, the 2650B might have replaced the 2650A as the variant of choice. The 2650B does not overhaul the architecture of the 2650A, but it does add some improvements that make designing hardware and software with the 2650 noticeably easier. Unfortunately the 2650B never rose to prominence, forcing hardware and software engineers to design for the 2650A instead. The real-life impact of the 2650B was therefore even smaller than that of the 2650 itself.

Given the choice between a 2650A-1 (which is not too difficult to find) clocked at 2 MHz and a 2650B clocked at 1.25 MHz, I would still opt for the 2650A variant. A 2650B-1, however, would be an easy choice…