How does one handle COP as a co-processor decoded instruction?
Page one of this thread touches on some ideas, but I have no direct knowledge of anyone actually implementing these ideas in real life.
Naturally, you need some hardware goodies to get the '816 and the co-processor to play nicely with each other, and some software goodies to tell them how to play, but that's about as much as I can offer at this stage of the game.
Understood. I jumped into this thread because this is the next step in a project I’m working on, connecting a 65c816 to an FPGA. The connection is made. Now I want to connect various IPs as “coprocessors.”
If it gets there from the hardware reset line, the SEI won't be necessary as it's an automatic part of the reset sequence; however, if you jump there from software for any reason, you'll want to ignore interrupts until things are set up properly again. I've never done it that way myself.
Correct. It's a hedge against something jumping to the reset routine while IRQs are active. Otherwise, SEI would be redundant.
...
         LDA STKPC,S           ;save pc location
         DEC a                 ;less one
         STA BITLOC            ;save signature byte location
         LDA BITLOC            ;load sig byte
         STA CPRO              ;store sig byte
...
I wrapped it in my general 'c816 ISR parameters.
I think the LDA BITLOC needs to be LDA (BITLOC). You're actually loading a word instead of a byte, but that should be harmless. You also seem to be assuming that your data bank coincides with the COP-caller's program bank, and that may also be harmless. That stack offset of $0B may need to be carefully investigated, but I have no evidence at hand regarding its correctness or lack thereof. Those might not be the only issues, but the remainder is beyond the scope of my experience.
BDD wrote a nice article here. If you haven't seen it yet, you should give it a glance.
Ugh, you're right. I want the value at the address that is stored in BITLOC, not the contents of BITLOC itself. So confusing!
CO_PRO   SEI
         phb                   ;save DB
         phd                   ;save DP
         rep #%00110000        ;select 16 bit registers
         pha                   ;save .C
         phx                   ;save .X
         phy                   ;save .Y
         LDA STKPC,S           ;save pc location
         DEC a                 ;less one
         STA BITLOC            ;save signature byte location
         LDA BITLOC            ;load sig byte
         STA CPRO              ;store sig byte
         rep #%00110000        ;16 bit registers
         ply                   ;restore .Y
         plx                   ;restore .X
         pla                   ;restore .C
         pld                   ;restore DP
         plb                   ;restore DB
         CLI
         rti                   ;resume foreground task
I wrapped it in my general 'c816 ISR parameters.
As noted by Mike, LDA BITLOC needs to be LDA (BITLOC). Also, as he noted, you are assuming DB = PB, which may not be the case. Prior to fetching the signature you need to load DB with the stack copy of PB to guarantee you are fetching the signature from the bank in which the interrupted program resides.
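One way to address both points without changing DB is to build a 24-bit pointer on direct page and fetch through it. What follows is only a sketch under stated assumptions, not tested code: it assumes native mode, that BITLOC is a three-byte direct page location the assembler will address in direct page mode, that DP points at the page containing it, that CPRO is reachable through the current DB, and that STKPB is the stack offset of the pushed PB. STKPB is a hypothetical symbol equal to STKPC+2.

```asm
         rep #%00100000        ;16-bit accumulator
         lda STKPC,S           ;return address pushed by COP
         dec a                 ;signature is the byte just before it
         sta BITLOC            ;pointer bits 0-15
         sep #%00100000        ;8-bit accumulator
         lda STKPB,S           ;caller's PB (STKPB = STKPC+2, assumed)
         sta BITLOC+2          ;pointer bits 16-23
         lda [BITLOC]          ;fetch signature from the caller's bank
         sta CPRO              ;pass signature to the coprocessor
```

Because the accumulator is 8 bits wide at the final load, only the signature byte is fetched, which also takes care of the word-versus-byte point Mike raised.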
The SEI at the beginning of your ISR and the CLI at the end are unnecessary. All interrupts, be they hard or soft, automatically set the I bit in SR (status register) after SR has been pushed. When RTI is executed at the end of the ISR, SR will be restored as it was prior to the interrupt, which means the state of the I bit will be likewise restored: IRQs will be re-enabled if they were enabled at the time of the interrupt. The MPU does this for you at no extra charge. Of course, you can't afford to forget that if you are using COP as an operating system API calling mechanism, your system might deadlock if IRQs remain disabled while processing an API call.
It is useful in any function that involves stack accesses to define local symbols for stack offsets. For example, in my POC firmware, I have the following preamble for the COP handler:
Constants such as S_MPUDPX and S_WORD are externally defined, with the former defining the size in bytes of DP (direct page register) and the latter defining the size of a word (symbols that start with S_ are "size-of" constants). I, as a rule, never hard-code such constants as instruction operands and always define them in an INCLUDE file to assure all programs that refer to such constants are singing from the same hymnal, so to speak. With the above arrangement, I can, for example, pick up what was in PB at the time of the interrupt with LDA COP_PBX,S.
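Stack-offset symbols of this kind might look like the sketch below. It is not the POC firmware's actual preamble. It assumes native mode and the push order used in the ISR quoted earlier (phb, phd, then pha, phx, phy with 16-bit registers). Only s_word, s_mpudpx, cop_pcx and cop_pbx are names that appear in this thread; every other symbol is hypothetical, and the size constants are defined inline here only so the sketch stands alone.

```asm
s_byte   =1                    ;size of a byte (hypothetical)
s_word   =2                    ;size of a word
s_mpudpx =s_word               ;size of DP
s_mpudbx =s_byte               ;size of DB (hypothetical)
s_mpusrx =s_byte               ;size of SR (hypothetical)
s_mpupcx =s_word               ;size of PC (hypothetical)
;
cop_yx   =1                    ;.Y, last register pushed
cop_xx   =cop_yx+s_word        ;.X
cop_cx   =cop_xx+s_word        ;.C
cop_dpx  =cop_cx+s_word        ;DP
cop_dbx  =cop_dpx+s_mpudpx     ;DB
cop_srx  =cop_dbx+s_mpudbx     ;SR, pushed by the MPU
cop_pcx  =cop_srx+s_mpusrx     ;PC, pushed by the MPU
cop_pbx  =cop_pcx+s_mpupcx     ;PB, pushed by the MPU
```

Under these assumptions cop_pcx works out to 1+2+2+2+2+1+1 = 11, or $0B, and cop_pbx to $0D. If the push order or the register widths at the time of the pushes differ, the offsets change with them.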
Getting back to the program, so to speak, there is a method of fetching the signature that uses less code and no direct page space. It takes advantage of the fact that if a base address is $0000 it is possible to load from anywhere in the 64KB address space of any bank with a 16-bit register offset. The following procedure starts after the above preamble:
; continuation of COP ISR after jumping thru IVCOP...
;
icopa    cli                   ;resume IRQ processing
         rep #%00110000        ;16-bit everything
         lda cop_pcx,S         ;get RTI address
         dec A                 ;point at signature
         tax                   ;use as index
         sep #%00100000        ;8-bit accumulator
         lda cop_pbx,S         ;calling bank
         pha                   ;set as a...
         plb                   ;temporary data bank
         lda \2$00,X           ;fetch signature...
;
; ———————————————————————————————————————————————————————
; The \2 prefix forces absolute addressing. Absent that,
; the fetch would be from bank $00, not from the bank in
; DB.
; ———————————————————————————————————————————————————————
...program continues...
Something any ISR needs to consider is the state of DP. You cannot assume DP is pointing to your operating system kernel's direct page, since the foreground may well have pointed DP somewhere else. Ergo you might have to add some code to set DP as required. Otherwise, you may end up wearing out the reset button.
Similarly, you may need to point DB to where the kernel's data tables and other structures, e.g., buffers and queues, are located. Most often they would be in the same bank as the kernel, which means a simple PHK - PLB sequence is all that is needed. Otherwise, you will have to load the correct value for DB into an 8-bit register, push it and then execute PLB.
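Setting both registers on ISR entry could be sketched as follows. It assumes the caller's DP and DB have already been pushed and will be pulled again before RTI, and that the kernel's data lives in the kernel's own bank. The symbol kerneldp is hypothetical, and some assemblers want the PEA operand written as an immediate.

```asm
         pea kerneldp          ;kernel's direct page address (hypothetical symbol)
         pld                   ;DP = kernel direct page
         phk                   ;push the kernel's program bank...
         plb                   ;...and pull it as DB
```

PEA and PLD always move 16 bits and leave the accumulator alone, so this version does not depend on the current register widths.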
Finally, it's useful to define a common interrupt return that all ISRs can use. Here's what I have set up:
That is a lot to digest! I'm still getting used to the whole 16-bit thing. When the processor moves 16 bits of data at a time, I'm not quite sure how to get my FPGA data in/out to play nice.
The 16-bit stuff is only internal. Externally it's still 8-bit, so the only real hardware complication is dealing with the way it sneezes out bank addresses, should you choose to utilize them.
I haven’t utilized them yet, but “always_latch” in SystemVerilog makes short work of grabbing the bank addresses.
Not really. The concepts are what matter, not the code.
Quote:
I'm still getting used to the whole 16-bit thing. When the processor moves 16 bits of data at a time, I'm not quite sure how to get my FPGA data in/out to play nice.
What Mike said.
Fetches and stores occur eight bits at a time, even when the register doing the fetching and storing is set to 16 bits. The first clock cycle that accesses memory or I/O accesses the least significant byte, which is at the target address. The next clock cycle accesses the most significant byte, which is at the target address +1. Needless to say, if you are accessing a chip register you want the MPU register doing the accessing to be set to 8 bits.
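As a rough illustration, a single-byte write to a coprocessor register might be sketched like this. CPRO is the register symbol from the ISR earlier in the thread, the value written is arbitrary, and the assembler is assumed to know the accumulator is 8 bits wide when it sizes the immediate operand.

```asm
         sep #%00100000        ;8-bit accumulator
         lda #$01              ;arbitrary example value
         sta CPRO              ;one write cycle, to CPRO only
         rep #%00100000        ;back to a 16-bit accumulator
```

With the accumulator left at 16 bits, the same STA would write the low byte to CPRO and then, on the next cycle, the high byte to CPRO+1, which the FPGA would see as a second, probably unwanted, register write.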
All of this is extensively covered in the Eyes & Lichty programming manual.
All I have is Sam's sketch in the second post of this thread. Once you know the 6502 family well enough, and you're adept enough (Jeff is a good example) you can do this kind of thing.
Note that in the fourth post, Sam (kc5tja) offers the idea that handling COP as an SWI is a fallback for systems which lack a hardware copro.
(Although Sam is still a relatively active hacker, he only very rarely visits the forum.)
Ed,
When he says "presenting a no-op," does he mean sending $EA back to the 65c816? I'm confused, because Jeff then says, "in this thread the term NOP doesn't necessarily imply $EA, the official NOP." Okay... then what does it mean? How do you "feed" the 'c816 no-ops from your external hardware?
The rest of Jeff's sentence should help you. Without wishing to discourage you, it does appear that you need to read more carefully and think more deeply: this is not easy stuff, it requires study and consideration. What you're aiming to do is build up a mental model of the cycle by cycle behaviour of an MPU in its system, so you know what's happening. This isn't going to work if you just jump from one guess to another: the questions are harder than you think.
What's more, you risk running everyone out of patience: there are half a dozen people here with enough goodwill to help you, but unfortunately that's not an unlimited resource.
Here's what Jeff said. Get paper and pencil and work through cycle by cycle what happens when an MPU reads an instruction. This is going to take you at least a couple of hours.
Dr Jefyll wrote:
BTW in case anyone's wondering, in this thread the term NOP doesn't necessarily imply $EA, the official NOP. $EA isn't ideal for the job of manipulating the '816 to read (but ignore) a series of bytes at PC because it only accesses one byte every 2 cycles. A better choice is to feed the '816 WDM ($42) -- a two-byte NOP that executes in 2 cycles.