ehBasic on 65816
My new code failed. However, I think I know why. I was trying to relocate "zero page" by setting the Direct register to $0300. However, there are several jump instructions that jump into self-modifying code in zero page. These JMP instructions do not use the Direct register; they use absolute addresses.
My main conflict is with the zero page locations used by both Monitor and EhBASIC. Since I am using the first 30 bytes for I/O, this limits the free space. I was hoping to get EhBASIC to use page $0300 for its "zero page".
I'll see how many JMPs there are and try to find a workaround.
Daryl
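The failure described above comes down to which addressing modes honor the Direct register. Here is a minimal sketch in ca65-style syntax (an assumption; the thread does not name the assembler), with `ZP_CODE` as a hypothetical label for a routine kept in "zero page":

```asm
; Sketch only. ca65-style syntax is assumed; ZP_CODE is a hypothetical label.
        .p816
ZP_CODE = $10           ; offset of a routine kept in "zero page"

        rep #$20        ; 16-bit accumulator
        .a16            ; tell the assembler about the width change
        lda #$0300
        tcd             ; Direct register D = $0300
        sep #$20        ; back to an 8-bit accumulator
        .a8

        lda ZP_CODE     ; direct-page mode: reads $0310 (D + $10)
        jmp ZP_CODE     ; absolute mode: jumps to $0010 in the program bank, D is ignored
```

For the jump to follow the relocated page, its operand would have to be written as an absolute address such as `$0300+ZP_CODE`.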
Here is perhaps one idea -- if you JMP into zero-page instead of JSR into it, you might want to put direct-page immediately in front of your EhBASIC RAM or ROM image. E.g., let BASIC's direct page sit precisely 256 bytes in front of EhBASIC itself.
Then, you can use the 65816's PC-relative "BRL" instruction to invoke the routine. That way, EhBASIC can be relocated anywhere in bank 0 memory, without having to worry about where precisely it's loaded.
It is a REAL pity that the 65816 lacks greater support for late-binding in software. In fact, PC-relative branches should have been the norm (and not reserved just for conditional branches) from day one back when the 6502 was introduced. Better support for indirection would be nice too.
The 6809 has the 65xx architecture beat hands down in this area, for sure.
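One way to sketch the layout suggested above, in ca65-style syntax (an assumption). `DPAGE`, `START` and `ZP_CODE` are hypothetical names, and the PER/PLD pair is an added detail for pointing the Direct register at the page wherever the image happens to be loaded:

```asm
; Sketch only. ca65-style syntax assumed; all labels are hypothetical.
        .p816
ZP_CODE = $10           ; offset of the routine inside BASIC's direct page

DPAGE:  .res 256        ; BASIC's direct page, 256 bytes in front of the code
START:  per DPAGE       ; push the run-time address of DPAGE
        pld             ; pull it into D (PLD always pulls 16 bits)
        ; ... rest of EhBASIC ...
        brl DPAGE+ZP_CODE   ; PC-relative jump into the direct page
```

BRL is a jump, not a call, so the routine in the direct page has to branch back on its own. If the image is not loaded on a page boundary, the low byte of D is nonzero and each direct-page access costs one extra cycle.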
- GARTHWILSON
- Forum Moderator
- Posts: 8775
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
The '816 does have a branch-relative-long (BRL) and more indexed modes too, but also the PER instruction. For the 6502, figuring out an address relative to where the program counter is at the time is a complex, inefficient process; but the 65816's PER instruction adds the operand to the address of the next instruction (regardless of where you started loading the program), and pushes the result on the stack. From there, you can use it to get to data or program addresses (like with a simulated JSR-relative) that might be different each time you run the program. Stack-relative addressing further expands the possibilities.
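As an illustration of the data-access case, here is a minimal sketch in ca65-style syntax (an assumption). `TABLE` is a hypothetical label, and the code assumes the data bank register equals the program bank:

```asm
; Sketch only. ca65-style syntax assumed; TABLE is a hypothetical label.
; Assumes the data bank register equals the program bank.
        .p816
        rep #$10        ; 16-bit index registers
        .i16
        sep #$20        ; 8-bit accumulator
        .a8

        per TABLE       ; push the run-time address of TABLE
        ldy #0
        lda (1,s),y     ; stack-relative indirect indexed: A = TABLE[0]
        plx             ; discard the pushed address (16-bit pull, clobbers X)
        rts

TABLE:  .byte $12, $34
```

Nothing here depends on the address at which the code was assembled, so the same bytes work wherever they are loaded within the bank.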
Sadly, it's no substitute for a genuine PC-relative JSR or a JSR with indirect indexed addressing. PER takes a butt-load of time to run (as much as a JSR itself). Consider the overhead of this snip of code:
Code:
.macro JSP ; Jump Subroutine PC-relative absolute
    pea *+6     ; return address - 1 (the RTS below); assumes * is the address of this PEA
    per \0-1    ; target - 1, so the RTS lands on the target
    rts
.endmacro
The PEA takes a minimum of 5 cycles alone, PER another 6 cycles, and another 6 for the RTS. Ouch -- that's 17 cycles to CALL the subroutine in question. Add another 6 for the subroutine's RTS, for a minimum overhead of 23 cycles.
Similar latencies exist for indirect subroutine calls too (assuming the vector sits in bank 0; longer still if not!)
What's infuriating to me is that I know the 65816/6502 are architecturally fast enough to go much faster; we know this because absolute-indexed addressing modes are almost as fast as pure absolute modes! Grrrr....
Quote:
that's 17 cycles to CALL the subroutine
kc5tja wrote:
Sadly, it's no substitute for a genuine PC-relative JSR or a JSR with indirect indexed addressing. PER takes a butt-load of time to run (as much as a JSR itself). Consider the overhead of this snip of code:
Code:
.macro JSP ; Jump Subroutine PC-relative absolute
    pea *+6     ; return address - 1 (the RTS below); assumes * is the address of this PEA
    per \0-1    ; target - 1, so the RTS lands on the target
    rts
.endmacro
The PEA takes a minimum of 5 cycles alone, PER another 6 cycles, and another 6 for the RTS. Ouch -- that's 17 cycles to CALL the subroutine in question. Add another 6 for the subroutine's RTS, for a minimum overhead of 23 cycles.
Code:
.macro BSR ; Branch to Subroutine
    per *+5     ; return address - 1 (last byte of the BRL); assumes * is the address of this PER
    brl \0
.endmacro
I hadn't thought of that; I'm not sure why. Good call. Though, I'm still not happy with those extra four cycles. Subroutine performance on the 65816 is bad enough as it is; with a minimum overhead of 12 cycles (6 for JSR, 6 for the corresponding RTS), it's no wonder people shied away from well-factored software over the years.
The problem wasn't as simple as replacing the JMP's and JSR's.
In addition to those, there were several instructions of the form "STA zppointer,y", which have no zp,y addressing mode, so the assembler converted them to abs,y. I had to fix those along with the immediate modes that loaded the upper byte of a zp address, i.e., LDA #>zpptr became LDA #>ZeroPG where ZeroPG was equated to $0300.
After several hours of picking out the absolute references to addresses in $00xx, and fixing the immediates and a few other places that had assumed the upper address byte was 0, I was able to get most of it to work. However, I finally chose to abandon this effort. I would literally have to read every line of code to figure out where the rest of the bugs lie.
I have reworked the zero page labels so that EhBASIC and my Monitor all fit without overstepping each other. Should have done that first!
I added an EhBASIC command, SYS, to simplify jumps from EhBASIC to the SBC-3 Monitor. I can now cold and warm start EhBASIC. The load and save still need a few tweaks, but that should be easily solved.
I have learned much from this effort and the discussions on this thread that I hope to apply to future projects.
Daryl
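The two kinds of fix described in this post can be sketched as follows, in ca65-style syntax (an assumption). `ZeroPG` and `zpptr` are the names used in the post; the value `$A0` is hypothetical:

```asm
; Sketch only. ca65-style syntax assumed; the value of zpptr is hypothetical.
        .p816
ZeroPG  = $0300         ; relocated "zero page" (value from the post)
zpptr   = $A0           ; a zero-page variable

        sta zpptr,y         ; STA has no direct,Y form, so this assembles as
                            ; absolute,Y: STA $00A0,Y. D is not added.
        sta ZeroPG+zpptr,y  ; relocated form: STA $03A0,Y

        lda #>zpptr         ; high byte of the old address: $00
        lda #>ZeroPG        ; high byte after relocation: $03
```

Absolute,Y addresses are formed in the data bank and never pass through the Direct register, which is why each one has to be found and rewritten by hand.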
kc5tja wrote:
I hadn't thought of that; I'm not sure why. Good call. Though, I'm still not happy with those extra four cycles. Subroutine performance on the 65816 is bad enough as it is; with a minimum overhead of 12 cycles (6 for JSR, 6 for the corresponding RTS), it's no wonder people shied away from well-factored software over the years.
Quote:
otherwise I end up having reps and seps all over the place
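Code:
ACCUM_16: MACRO
          REP #00100000B    ; clear the M flag: 16-bit accumulator
          ENDM
;-------------
ACCUM_8:  MACRO
          SEP #00100000B    ; set the M flag: 8-bit accumulator
          ENDM
;-------------
INDEX_16: MACRO
          REP #00010000B    ; clear the X flag: 16-bit index registers
          ENDM
;-------------
INDEX_8:  MACRO
          SEP #00010000B    ; set the X flag: 8-bit index registers
          ENDM
;-------------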
I now have the load and save working. I thought all was good.
However, after a few test programs, I have discovered that the floating point is all messed up. 4/2 does not return 2 and 4^2 does not return 16.
I made the same memory mods to EhBASIC in Michal Kowalski's simulator and it all ran correctly.
There must be some 6502 instructions that run differently in the 65816's native mode that I have overlooked.
The only thing I didn't add is the fixes for the TXS commands. But I'm sure those are correct.
The first version I posted earlier for download also has the FP errors.
Lee, if you are reading, any thoughts?
Daryl
I took the problem one step further and set the EhBASIC code to run in emulation mode. Any calls to the system (input, output, load, save) first switch back to native mode. Upon completion, these routines switch emulation mode back on.
The FP issue is corrected. Now, a FOR/NEXT loop executes only the first pass and then locks up.
This is getting a little frustrating. I'm going to take a step back from this for a while and go work on the SPI interface.
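The mode switching described above can be sketched like this, in ca65-style syntax (an assumption). `BASIC_OUTPUT` and `MON_OUTPUT` are hypothetical names, and the monitor routine is only a stub:

```asm
; Sketch only. ca65-style syntax assumed; both labels are hypothetical.
        .p816
BASIC_OUTPUT:
        clc
        xce             ; E = 0: native mode (M and X stay set, so 8-bit registers)
        jsr MON_OUTPUT  ; native-mode system routine
        sec
        xce             ; E = 1: back to emulation mode for EhBASIC
        rts

MON_OUTPUT:
        rts             ; stub standing in for the real monitor routine
```

XCE exchanges the carry flag with the emulation bit, so the carry on return reflects the previous mode and not any status the system routine set. Returning to emulation mode also forces the high byte of the stack pointer to $01 and the high bytes of X and Y to zero, so a wrapper like this only works if the native-mode code can live with those constraints.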