BRK opcode without using SYNC

BigEd · Post by **BigEd** » Sat Dec 12, 2009 11:42 am

I thought others might be interested in Brad Taylor's doc about using external hardware to make use of the 1-byte operand of BRK.

He explains how a BRK can be spotted by external hardware even when SYNC is not available - as on the NES - by watching for three consecutive write accesses.

The same idea could be used to relocate or modify all the vectors at top of memory (or top of bank 0) - which allows some extra freedom in the memory map.

Gideon Zweijtzer has written a document, "Safely freezing the C64 on an asynchronous event", which explores other useful ways of reacting to the pattern of reads and writes. (I found it best to read the document from Google's cache.)

DaveK · Post by **DaveK** » Tue Jan 26, 2010 4:25 am

Completely untested as yet, but would something like this work:

Code:

IRQBRKV:	cld							; A source of no end of problems
			phx
			tsx
			pha
			inx
			inx
			lda 0x0100,x				; Get stacked status register
			and #0x10					; B bit set?
			bne 1$						; If so, it's a BRK
			pla
			plx
			jmp [Z_IRQV]				; If not, jump to IRQ handler
1$:			jmp BRK_DISP

Code:

BRK_DISP::	inx
			lda 0x0100,x				; Get stacked return address
			sec
			sbc #0x01					; Decrement it
			sta Z_BRKTMP+1				; And store in zero page
			inx
			lda 0x0100,x
			sbc #0x00
			sta Z_BRKTMP+2
			ldx #0x00
			lda [Z_BRKTMP+1]			; At that address is the second byte of BRK
			clc
			rol							; Shift left
			sta Z_BRKTMP+1				; Forms low byte of vector
			rol
			and #0x01					; This was bit 7 of the second byte of BRK
			ora #>BRK_VECTBL			; High byte of BRK vector table address
										; (must be aligned on 512-byte boundary)
			sta Z_BRKTMP+2				; High byte of vector
			lda #0x6C					; Opcode for indirect JMP
			sta Z_BRKTMP
			pla							; Restore registers to entry state
			plx
			jsr Z_BRKTMP				; Call addressed subroutine
			rti

kc5tja · Post by **kc5tja** » Tue Jan 26, 2010 6:57 am

This probably should go into its own thread. I don't see any relationship with detecting BRK at the hardware level versus how to discover BRK in software.

That being said, instead of:

Code:

inx
inx
lda $0100,x

you could just write:

Code:

lda $0102,x

Otherwise, your code looks reasonable on the surface.

As another simplification, I would not push and pull registers so much though. I'd keep the interrupted process context on the stack, like so:

Code:

pha
phx
phy
tsx
lda $0104,x
and #$10
bne BRK_HANDLER
...
ply
plx
pla
rti

;
; ...
;

BRK_HANDLER:
...
ply
plx
pla
rti

This has some advantages. For starters, you can tweak the interrupted code's registers where appropriate (for example, if you're using BRK to invoke OS calls, your OS routines can use the stacked registers to provide return values). Somewhat related, you can also rewind the calling program's stacked PC to point back at the BRK instruction, with updated register values, to 'restart' an interrupted operation. This is called PCLSRing (see http://www.falvotech.com/content/public ... pclsr.html )
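Both ideas can be sketched roughly as follows, assuming a 65C02, that the PHA / PHX / PHY prologue above has already run, and that nothing else has been pushed since. The `osResult` location is hypothetical and only there for illustration.

```asm
; Stack as seen through X after PHA / PHX / PHY / TSX (65C02):
;   $0101,x  saved Y      $0104,x  saved P (B bit set for BRK)
;   $0102,x  saved X      $0105,x  return address, low byte
;   $0103,x  saved A      $0106,x  return address, high byte

; Return a value to the caller in A: overwrite the stacked copy,
; so that the final PLA hands it back.
        tsx
        lda osResult        ; hypothetical zero-page result byte
        sta $0103,x

; Restart the interrupted BRK: the stacked return address is
; BRK address + 2, so subtract 2 to make RTI re-execute the BRK.
        tsx
        sec
        lda $0105,x
        sbc #$02
        sta $0105,x
        lda $0106,x
        sbc #$00
        sta $0106,x
```

Either fragment would sit in the BRK handler before the closing PLY / PLX / PLA / RTI.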

DaveK · Post by **DaveK** » Tue Jan 26, 2010 7:54 am

Aha. I've just found a previous thread where I asked how to do something with BRK and you answered it with some code. How silly of me.

bogax · Post by **bogax** » Tue Jan 26, 2010 9:12 am

CLC followed by ROL is just ASL.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Tue Jan 26, 2010 11:48 pm

BigEd wrote:

I thought others might be interested in Brad Taylor's doc about using external hardware to make use of the 1-byte operand of BRK.

On the '816 in native mode, it's all academic, since BRK is separately vectored from IRQ. You would be able to get the BRK operand (the "signature" byte) with simple stack acrobatics and select a routine with a simple ASL A, TAX, JMP (TABLE,X) code sequence.
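As a rough sketch of what those stack acrobatics might look like: the following assumes native mode, a direct page register of $0000, and two hypothetical names, `brkPtr` (a 3-byte direct-page scratch area) and `brkTable` (a table of 16-bit routine addresses in bank 0, where the handler runs). Assembler directives for tracking register widths are omitted because they vary between assemblers.

```asm
; 65816 native-mode BRK handler (sketch, not tested on hardware).
brkHandler:
        rep #$30            ; 16-bit A, X and Y
        pha
        phx
        phy
; Offsets from S now: 1-2 Y, 3-4 X, 5-6 A, 7 P, 8-9 return PC, 10 PBR
        lda 8,s             ; return address = BRK address + 2
        dec a               ; back up to the signature byte
        sta brkPtr
        sep #$20            ; 8-bit A
        lda 10,s            ; bank the BRK was executed in
        sta brkPtr+2
        lda [brkPtr]        ; fetch the signature byte
        rep #$20            ; 16-bit A again
        and #$00ff          ; discard the stale high byte
        asl a               ; signature * 2 = table offset
        tax
        jmp (brkTable,x)    ; each routine ends with PLY / PLX / PLA / RTI
```

A real handler would probably also save and set the direct page and data bank registers rather than assume their contents.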

Note that both the 'C02 and '816 have the useful VPB (Vector Pull) output that tells the system when one of the hardware vectors is being accessed. That signal could be used to modify vectors on the fly. Also, the '816 has the COP instruction, which is a quasi-interrupt, in that it takes its own vector and has an operand like BRK. So one is not as limited with the '816 when it comes to creative uses of software interrupts. In effect, one could code an '816 equivalent to the x86 INT N instruction, e.g., INT $13.

kc5tja · Post by **kc5tja** » Tue Jan 26, 2010 11:54 pm

The only problem is performance. BRK vectoring is much slower than, say, declaring your OS entry point to be at $FFF0, at which sits a JMP (ROMTABLE,X) instruction.

On the other hand, BRK has the advantage of being completely operating mode independent (you have separate vectors for emulation vs. native modes). But, really, invoking $FFF0 for 6502 code and $FFF3 for 65816-native code doesn't seem like an overbearing requirement.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Wed Jan 27, 2010 12:24 am

kc5tja wrote:

The only problem is performance. BRK vectoring is much slower than, say, declaring your OS entry point to be at $FFF0, at which sits a JMP (ROMTABLE,X) instruction.

On the other hand, BRK has the advantage of being completely operating mode independent (you have separate vectors for emulation vs. native modes). But, really, invoking $FFF0 for 6502 code and $FFF3 for 65816-native code doesn't seem like an overbearing requirement.

Of course, any sort of interrupt-based processing is going to exact a performance penalty, 8 clocks for native mode BRK vs. 6 for JSR (JSL on the '816 uses 8 cycles). Using BRK or COP as an OS call entry mechanism does automate some tasks for you, e.g., preserving the MPU state and, if coded into the preamble of the BRK handler, the registers. Something else BRK does for you as an OS entry device that JSR/JSL can't is create a call path about which an application need not know anything. All the application needs to know is which operand to use with the BRK instruction to select the desired OS function. Without the need to maintain a jump table for the benefit of applications, the OS can be more portable, I would think (as well as slightly smaller).
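From the application's side, such a call might look like the sketch below. It assumes an assembler that emits BRK as a single byte, so the signature is supplied with a data directive; the function number and the use of A for the argument are hypothetical conventions.

```asm
OS_CHAROUT = $01            ; hypothetical function number

        lda #'A'            ; argument in A (hypothetical convention)
        brk
        .byte OS_CHAROUT    ; signature byte; RTI resumes after it
```

Because the stacked return address is the BRK address plus 2, the signature byte is skipped automatically when the handler returns with RTI.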

Regarding support of emulation mode, why bother if it's a new design? I can see reverting to emulation mode if the '816 is being used in a system that was originally 6502 powered so existing OS routines can be used. But for a new design, it wouldn't make any sense. That would be like getting a GPS receiver and then referring to a road map while on a trip.

kc5tja · Post by **kc5tja** » Wed Jan 27, 2010 12:39 am

BigDumbDinosaur wrote:

Without the need to maintain a jump table for the benefit of applications, the OS can be more portable, I would think (as well as slightly smaller).

I have some experience in this area, having written multitasking kernels for the x86 architecture. So far as I can tell, nothing is smaller than a flat jump table for dispatching purposes.

Obscuring the call path, again, isn't much of a deal breaker for me. Consider CP/M, which explicitly defined CALL 0 as a means of rebooting, and CALL 5 to enter BDOS. 20 years later, the L4 microkernel defines an explicit call path to invoke its functions too.

One thing having a centralized receiver for all OS calls permits, though, is common preprocessing for all OS invocations. For example, you can copy buffers between kernel and user space (or vice versa), log system calls for tracing purposes, etc.

Quote:

Regarding support of emulation mode, why bother if it's a new design?

The original poster wanted to explore intercepting BRK on a 65C02 system. By definition, that's legacy.

If he's intending on upgrading later on, a new entry point is warranted. It's not worth overloading a single entry point with mode detection logic.

Also, you wouldn't revert to emulation mode. The old entry point would include logic to go into, and return from, native-mode, not the other way around. That way, new apps get the full benefit, while old apps continue to run, albeit with a performance hit.
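A minimal sketch of such a legacy entry point on a 65816, assuming the old application calls it with JSR from emulation mode. `legacyEntry` and `nativeService` are hypothetical names, and `nativeService` is assumed to leave the registers 8 bits wide and return with RTS.

```asm
legacyEntry:
        clc
        xce                 ; E <- 0: native mode, registers stay 8-bit
        jsr nativeService   ; hypothetical native-mode implementation
        sec
        xce                 ; E <- 1: emulation mode for the old caller
        rts                 ; carry is clobbered, so it cannot carry a result
```

The extra mode switches on the way in and out are where the performance hit for old applications comes from.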

OwenS · Post by **OwenS** » Wed Jan 27, 2010 12:36 pm

L4/x86 uses the jump to isolate apps from the fact that on x86 you can enter the kernel via the INT, SYSENTER and SYSCALL instructions - with differing performance and support.

L4/AMD64 uses SYSCALL only and directly. It's as fast as SYSENTER and supported everywhere.

(I would go into more depth but that's far too much to type on a touchscreen)

kc5tja · Post by **kc5tja** » Wed Jan 27, 2010 5:27 pm

When you have multiple virtual address spaces, using something like BRK or SYSCALL makes more sense because the hardware can trigger the MMU to change to a known-good state (in the 65816's case, you'd use VPB to accomplish this task).

But, again, in the case of a 6502 (or even 65816), having a fixed address to dispatch into the OS with is not a sin, and should be considered along with its other merits. Having to back up in the code to examine the BRK operand byte (or COP if you use it) takes a huge amount of time above and beyond the 6502's overhead for tearing out BRKs from genuine IRQs.

Never have I advocated NOT using BRK for this purpose. I'm just saying, the 6502/65816 CPUs don't make that choice compelling, like it is with, for example, the x86 architecture.

OwenS · Post by **OwenS** » Wed Jan 27, 2010 5:57 pm

If BRK is intended for entering the OS, encoding an operand byte into the opcode seems a bit pointless, and every other processor in existence seems to agree with the idea that it's better to pass the desired operation (or whatever) in a register (or occasionally on the stack)...

-----
OK, to cover the x86 weirdness from earlier, there are 3 methods commonly used for entering the kernel (I'm not going to cover call gates, because they're slow and only OS/2 uses them, task gates, because they're slow and nobody uses them, or intentionally invoking the illegal opcode handler, because that one's just silly)

Software interrupts via INT xx (or INT3, a special 1 byte instruction often used for software breakpoints). Supported on everything since the 8086/8088. Slow. Lots of vectors available.
SYSENTER. Intel's preferred method of entering the kernel. Significantly faster than software interrupts, but also requires somewhat more complex code. Works on all modern Intel processors in any mode, and AMD processors in 32-bit mode only (They designed the 64-bit extensions and deprecated SYSENTER; Intel ignored them)
SYSCALL. AMD's preferred method of entering the kernel. Slightly more barebones than Intel's; does little more than store the return address in a register and jump to the kernel (OK, it fiddles about with the CS segment register a bit, but x86 is messy like that...); even leaves the kernel stack-less. Supported on all modern AMD processors in all modes, and Intel processors in 64-bit mode.

So, to recap: Both manufacturers now support each other's extensions in awkward ways. And people say standardization is bad...

kc5tja · Post by **kc5tja** » Wed Jan 27, 2010 6:38 pm

OwenS wrote:

or intentionally invoking the illegal opcode handler, because that one's just silly

Actually, L4/x86 uses an illegal instruction handler to discover the kernel's information page. (Thankfully, everything else is done through the KIP).

I don't know if L4/amd64 uses this technique though.

Concerning the various methods of entering kernel-space, I already knew of all of them.

Back to the 65xx architecture, though, the operand byte of BRK and COP exists explicitly for "operation selection" purposes. That's how it's documented, and that's how the O.P. was intending on using it.

This is why I said, "I don't recommend that," in not so many words. I think it'd be much, much easier to load a table offset in the X register, and then put a jump table routine at a well-known address, and dispatch from there.

And, even simpler still, is to maintain a well-known jump table at fixed offsets relative to the top of ROM space. This has the advantage that you don't have to spend time pushing registers onto the stack just to compute the jump table offset. The Commodore KERNAL used this approach, as did GEOS for the Commodore platform.
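As an illustration of that layout, here is a sketch with an arbitrary, hypothetical base address of $FF00 and hypothetical routine names (the C64 KERNAL's CHROUT at $FFD2 is the familiar real-world example of such an entry).

```asm
; ROM side: one 3-byte JMP per published entry point.
        .org $FF00
osCharOut:  jmp charOut     ; $FF00
osCharIn:   jmp charIn      ; $FF03
osErrorOut: jmp errorOut    ; $FF06

; Application side: each call is a plain JSR to a published address.
        lda #'A'
        jsr $FF00           ; osCharOut
```

The routines behind the table can move freely between ROM revisions as long as the table addresses stay put.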

Although, all things considered, the 6502's registers are so poor for use as general-purpose information-carrying tools that you might as well conduct your OS parameter passing using well-known zero-page locations anyway.

Code:

os6502ep:
  ; 6502 and 65816 emulation-mode entry-point.
  ; CPU registers are treated as caches for ZP locations ONLY.
  ; Hence, all registers are used for our purposes, regardless of their
  ; previous contents.
  ;
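  ; Assumes the following four bytes in zero-page:
  ; osCall: an 8-bit OS function ID, assumed to be pre-multiplied by 2
  ;         so that it indexes the table of 16-bit addresses directly
  ; indjmp: $4C (JMP absolute opcode)
  ; indjmp+1/+2: address for the aforementioned JMP

  ldx osCall
  lda jmpTabLegacy,x      ; low byte of the routine's address
  sta indjmp+1
  lda jmpTabLegacy+1,x    ; high byte of the routine's address
  sta indjmp+2
  jmp indjmp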

Thankfully, the 65816 has somewhat better characteristics so that you don't need these gyrations.

Code:

os65816ep:
  jmp (jmpTabNative,x)
jmpTabNative:
  .word  charOut, charIn, errorOut
  ; ...etc...
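For completeness, a caller of this entry point might look like the following sketch. The constant names and the use of A for the argument are hypothetical; the offsets are simply function number times 2, matching the .word entries.

```asm
OS_CHAROUT  = 0
OS_CHARIN   = 2
OS_ERROROUT = 4

        lda #'A'            ; argument convention is hypothetical
        ldx #OS_CHAROUT
        jsr os65816ep       ; the JMP (jmpTabNative,x) lands in charOut,
                            ; whose RTS returns here
```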

fachat · Post by **fachat** » Wed Jan 27, 2010 6:39 pm

kc5tja wrote:

The only problem is performance. BRK vectoring is much slower than, say, declaring your OS entry point to be at $FFF0, at which sits a JMP (ROMTABLE,X) instruction.

On the other hand, BRK has the advantage of being completely operating mode independent (you have separate vectors for emulation vs. native modes). But, really, invoking $FFF0 for 6502 code and $FFF3 for 65816-native code doesn't seem like an overbearing requirement.

If you use a single entry point and a register to select the operation to call, you lose a precious register for parameter transfer. In my designs I use the jump-table approach, where each operation has its own jump address (be it JMP absolute or JMP indirect).

In some systems I use AC, XR, YR and Carry for parameters - see "fwrite" in http://www.6502.org/users/andre/lib6502/lib6502.html for example. You could argue though to use different operations instead of the carry flag.

For my OS I have defined my relocatable file format, that allows late binding of operations to a program. I could, in the source define

Code:

         JMP BASE_OS+15

(where 15 would be the operation's offset which I'd actually put into a constant too). "BASE_OS" would be bound to the program at load time.

This way I can use the very same program on a system where the kernel is located at $F000 (my selfbuilt computer, CBM8x96) or even on a PET 3032 where the kernel sits at $7000.

André

fachat · Post by **fachat** » Wed Jan 27, 2010 6:40 pm

fachat wrote:

For my OS I have defined my relocatable file format, that allows late binding of operations to a program. I could, in the source define

The file format defines a CPU type parameter, so it would even be possible to bind different base jump table addresses for each CPU type.

André
