Thanks all. Looks like I'm headed in the right direction. Still, it was nice to figure out how the COP instruction worked.
BigDumbDinosaur wrote:
It appears quite a bit of the direct page usage is for storing odds and ends. At the expense of a slight reduction in performance, you might be able to relocate many of those variables to absolute storage and thus avoid having to map a special direct page just for doing floating point work.
Good idea, but care is needed. I tried this without success when I found out my own direct page wasn't big enough. I tried to move some of the temporary float registers (which are 20 bytes each) off the direct page but found that some portions of the code assume they're contiguous. To keep the port simple, I just used a separate direct page. The comments do mention that the data bank register needs to be set to Bank 0, implying that absolute addressing is used to reach some of the variables, even though they're on the direct page.
And a funny aside from a 65816 newbie: I puzzled for a while when the linker failed with a range error when I tried to combine the floating-point direct page with mine. Since the direct page is movable, I hadn't really considered that it was still actually just a single page in length. Duh! With a single byte operand, what else would it be?
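To spell out that last point, here is a minimal sketch in ca65-style syntax. The names FP_DP and fp_tmp and the address $0200 are made up for illustration and are not from the package. The D register can point anywhere in bank 0, but the operand of a direct page instruction is still one byte, so only offsets $00-$FF from D are reachable:

```asm
        .p816               ; enable 65816 opcodes (ca65 directive)
        .a16                ; tell the assembler the accumulator is 16 bits

FP_DP   = $0200             ; hypothetical base of the floating-point direct page
fp_tmp  = $10               ; hypothetical variable: an offset, must fit in $00-$FF

        rep #$20            ; 16-bit accumulator
        lda #FP_DP
        tcd                 ; D = $0200
        lda fp_tmp          ; direct page load, one operand byte: reads $00:0210-$00:0211
```

Anything that would land at offset $100 or beyond can't be encoded this way, which is presumably what the linker's range error was telling me.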
There might be other optimizations possible as well. The package is written using 8-bit registers, frequently switching to a 16-bit accumulator (and index registers in some cases) and then back again. It seems some efficiency might be gained by starting with 16-bit registers and switching to 8-bit registers only when needed. I'd probably just start from scratch, though, and streamline the whole package, as 128-bit precision is way more than I really need anyway. Still, porting is easier than writing from scratch, so I'll stick with this for a while. Thanks granati and 6502.org!
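As a rough illustration of the 16-bit-by-default idea, again in ca65-style syntax with hypothetical direct page names (word_a, word_b and flags are not from the package): the code stays in 16-bit mode and drops to 8 bits only around the byte-sized work. Each REP or SEP costs 3 cycles and 2 bytes, so whether this wins depends on how often the switches are needed.

```asm
        .p816               ; enable 65816 opcodes (ca65 directive)
        .a16                ; assembler hint: 16-bit accumulator
        .i16                ; assembler hint: 16-bit index registers

word_a  = $10               ; hypothetical 16-bit variable
word_b  = $12               ; hypothetical 16-bit variable
flags   = $14               ; hypothetical 8-bit variable

        rep #$30            ; 16-bit A and X/Y as the default state
        lda word_a
        sta word_b          ; 16-bit copy, no mode switch needed

        sep #$20            ; drop to an 8-bit accumulator for byte work
        .a8
        lda flags
        ora #$80            ; set the top bit of the byte-wide flag
        sta flags
        rep #$20            ; back to the 16-bit default
        .a16
```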
Edit (7/21/2022):
BigDumbDinosaur wrote:
The above uses the trick of pointing direct page to the stack. It does work pretty well, but I just knew there had to be a faster method. Deep pondering led to the following code ...
I suppose you're referring to the overall COP service routine rather than just obtaining the signature byte alone, because for that it seems the former is faster than the latter. Extracting the essence of each and leaving the signature byte in the accumulator, we have for the former:
Code:
; get COP signature byte
tsc ; 2 cycles 1 byte(s)
tcd ; 2 1
dec ADDR ; 8 2
lda [ADDR] ; 8 2
and #$ff ; 3 3
inc ADDR ; 8 2
; 31 11
versus for the latter:
Code:
; get COP signature byte
tsc ; 2 1
tcd ; 2 1
ldy ADDR ; 5 2
dey ; 2 1
sep #$20 ; 3 2
lda BB ; 4 2
pha ; 3 1
plb ; 4 1
rep #$20 ; 3 2
lda 0,y ; 4 2
and #$ff ; 3 3
; 35 18
And finally from mine above:
Code:
; get COP signature byte
lda ADDR,s ; 5 2
dec ; 2 1
sta F2 ; 4 2
lda BB,s ; 5 2
sta F2+2 ; 4 2
lda [F2] ; 7 2
and #$ff ; 3 3
; 30 14
which is just a bit faster than the first, but longer. Now if we just had INC and DEC for the Stack Relative address mode (I've often wished for this):
Code:
; get COP signature byte
dec ADDR,s ; 8 2
lda [ADDR] ; 8 2
and #$ff ; 3 3
inc ADDR,s ; 8 2
; 27 9
which is basically the first method without the switch of the direct page. Unfortunately, along those lines we have to settle for:
Code:
; get COP signature byte
lda ADDR,s ; 5 2
dec ; 2 1
sta ADDR,s ; 5 2
lda [ADDR] ; 8 2
and #$ff ; 3 3
tax ; 2 1
lda ADDR,s ; 5 2
inc ; 2 1
sta ADDR,s ; 5 2
txa ; 2 1
; 39 17
which is worst of all (though you could save the TAX/TXA if you could defer the INC that restores the return address until later in the routine). Still, it's not the best.