6502.org Forum
All times are UTC

Posted: Sun Mar 06, 2022 4:01 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3354
Location: Ontario, Canada
BruceRMcF wrote:
bruce's (the other one) RTS as hardware do-next is really cool, but the inability to leave interrupts on would be a deal breaker for me.
Yes it is really cool. And the interrupt problem can be fixed with some extra circuitry -- I posted about that here.

(There's also a careful explanation of dclxvi's rather unusual technique.)

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted: Fri Mar 11, 2022 9:22 pm

Joined: Wed Aug 21, 2019 6:10 pm
Posts: 217
IamRob wrote:
Surely using both the X-reg & Y-reg here wastes a lot of cycles preserving them and reloading them with pointers or data stack values when needed in the routines being called?


Note that if roughly 1/4 of all primitive executions are ENTER or EXIT and almost all execute NEXT, the clock cycles saved in nesting cover a lot of register preservation in the operations that need it.

However, if it is close to a wash, I would lean toward leaving the return stack index register free and preserving the return stack register value in the zero/direct page, because it is just easier to write the primitives with a free register available. That is straightforward with "INX : INX : JMP ($0000,X)" using Y for the rack.

Code:
 
EXIT:   LDY RNDX
        LDX RS,Y
        INY
        INY
        STY RNDX
NEXT:   INX
        INX
        JMP ($0000,X)

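ENTER:  ; JSR ENTER : !word FirstWord, SecondWord, … , EXIT
        LDY RNDX
        DEY
        DEY
        STY RNDX
        STX RS,Y        ; save caller's IP (STX has no abs,Y mode, so RS must be in the direct page)
        PLX             ; JSR pushed the address of its own last byte
        INX             ; step to the first cell of the word list
        JMP ($0000,X)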

Costs 8 cycles in each, so adds on average 2 clocks to the NEXT / ENTER / EXIT overhead per primitive executed. Similar overhead applies to R@ and crew.
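
A rough tally of where those figures come from, assuming 16-bit index registers and a page-aligned direct page (neither is stated above). The 1-in-4 ratio is the estimate from the start of this post:

```asm
; Extra return-stack index traffic in ENTER or EXIT when the index
; lives in the direct page instead of staying in a register.
        LDY RNDX        ; 4 cycles: direct page load, 16-bit index register
        STY RNDX        ; 4 cycles: direct page store, 16-bit index register
; Total: 8 extra cycles per ENTER or EXIT.
; If about 1 in 4 primitive executions is an ENTER or EXIT:
;   8 cycles * 1/4 = 2 cycles added per primitive executed, on average.
```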


Posted: Sat Mar 12, 2022 12:28 am

Joined: Sun Apr 26, 2020 3:08 am
Posts: 357
BruceRMcF wrote:
IamRob wrote:
Surely using both the X-reg & Y-reg here wastes a lot of cycles preserving them and reloading them with pointers or data stack values when needed in the routines being called?


Note that if roughly 1/4 of all primitive executions are ENTER or EXIT and almost all execute NEXT, the clock cycles saved in nesting cover a lot of register preservation in the operations that need it.


This is only half the picture though. Almost every word also uses the data stack. The Y-reg being the free register here would be the logical choice to use with the data stack, but it requires 3-byte instructions, and a data stack pointer must also be kept for the Y-reg. Using the X-reg for the data stack is even worse: X must first be saved and loaded with the data stack pointer, and afterwards the data stack pointer must be saved and X restored. There are a lot of words that require PUSH, PUT, POP, POP2 & POP3.

We won't be saving anything if we are constantly saving and restoring a data stack register, and it costs a bit more code as well. I don't have the full picture yet, and haven't got around to counting cycles. But this is definitely worth coming back to.

I am also having trouble following ENTER. The PLX is pulling the return address off the stack from whatever is calling with a JSR ENTER. This doesn't make sense to me if every word definition starts with JSR ENTER.


Posted: Sat Mar 12, 2022 10:20 am

Joined: Wed Aug 21, 2019 6:10 pm
Posts: 217
IamRob wrote:
This is only half the picture though. Almost every word also uses the data stack. The Y-reg being the free register here would be the logical choice to use with the data stack ...


But if the stack is not being used for the return stack or for the IP, it can be used for the data stack.

Code:
AND:    PLA
        AND 1,S
        STA 1,S
        INX
        INX
        JMP ($0000,X)

SWAP:   PLA
        PLY
        PHA
        PHY
        INX
        INX
        JMP ($0000,X)

PLUS:   PLA
        CLC
        ADC 1,S
        STA 1,S
        INX
        INX
        JMP ($0000,X)

FETCH:  PLY
        LDA $0000,Y
        PHA
        INX
        INX
        JMP ($0000,X)

STORE:  PLY
        PLA
        STA $0000,Y
        INX
        INX
        JMP ($0000,X)


In the 65C02 version, the "JMP ($0000,X)" has to be self-modifying code in the zero page, so saving the three clocks of "JMP NEXT" is not an option ... in the 65C02 version, the IP IS the spot in the zero page occupied by the $0000 ... and so using the hardware stack as the data stack while allowing most operations to avoid "STX TX : TSX : ... : LDX TX" benefits from a top-of-stack location in the zero page.

With the stack relative addressing in the 65816, the hardware stack makes a fine data stack on its own.

Of course, there is no such thing as a free lunch ... saving the three clocks of the "JMP NEXT" or "BRA NEXT" also means each primitive ends with a five byte NEXT rather than a two byte branch or three byte jump to NEXT.
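
To make the trade-off concrete, here is a minimal sketch of the two ways to end a primitive in the 65816 version. DROP is used only as an illustration, and the labels DROP_FAST and DROP_SMALL are hypothetical. NEXT is assumed to be a shared copy of the INX : INX : JMP ($0000,X) sequence within branch range.

```asm
DROP_FAST:              ; inline NEXT
        PLA             ; discard the top data stack cell
        INX
        INX
        JMP ($0000,X)   ; 5 bytes of NEXT, no extra jump

DROP_SMALL:             ; shared NEXT
        PLA
        BRA NEXT        ; 2 bytes, at a cost of 3 clocks for the taken branch
```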


Posted: Sat Mar 12, 2022 2:52 pm

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1929
Location: Sacramento, CA, USA
Dr. Bruce, if you keep TOS in A your examples improve thusly:
Code:
AND:    AND 1,S
        PLY        ; nip
        INX
        INX
        JMP ($0000,X)

SWAP:   PLY
        PHA
        TYA
        INX
        INX
        JMP ($0000,X)

PLUS:   CLC
        ADC 1,S
        PLY        ; nip
        INX
        INX
        JMP ($0000,X)

FETCH:  TAY
        LDA $0000,Y
        INX
        INX
        JMP ($0000,X)

STORE:  TAY
        PLA
        STA $0000,Y
        PLA
        INX
        INX
        JMP ($0000,X)
Store is slightly bigger, and all the others are slightly smaller, for a net win. :)
And if JMP (abs,X) wraps in the program bank, you could even change to JMP ($FFFE,X) to keep your IP pointed in the traditional manner, I think.

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Posted: Sat Mar 12, 2022 5:51 pm

Joined: Sun Apr 26, 2020 3:08 am
Posts: 357
BruceRMcF wrote:
IamRob wrote:
This is only half the picture though. Almost every word also uses the data stack. The Y-reg being the free register here would be the logical choice to use with the data stack ...


But if the stack is not being used for the return stack or for the IP, it can be used for the data stack.


Very Nice!

Not only is it built for speed, there is still the option to shrink some of the less frequently called words by ending them with a BRA to a heavily used word that still ends in INX INX JMP ($0000,X), saving 3 bytes each time. And since such endings are everywhere, we need not even worry about staying within the 128 byte limit of a branch.

This might be a really good trade-off between size and speed, and even versatility.


Posted: Sun Mar 13, 2022 1:11 am

Joined: Wed Aug 21, 2019 6:10 pm
Posts: 217
barrym95838 wrote:
Dr. Bruce, if you keep TOS in A your examples improve thusly:
Code:
AND:    AND 1,S
        PLY        ; nip
        INX
        INX
        JMP ($0000,X)


Yes, but optimizing the dyadic and monadic stack operators will come at a cost somewhere else{+}, so given a 65816 system like the Feonix256 or a 65816 board in the nascent Commander X16, it is something I might like to implement both ways and profile to see which one shakes out the best.

OTOH, it may be that the 65816 equivalent of the TOS in the zero page in the 65C02 is indeed keeping TOS in A -- the fact that the JMP ($0000,X) related ENTER and EXIT can be implemented without using the A register does encourage that thought.

Since my focus is on the 65C02 at present, these 65816 concepts are just sketches, but if I get enough progress on the 65C02 version of xForth this coming summer, I might flesh out the 65816 version.

{+ Note: So far I only see some small clock penalties in ROT BRANCH and the double-cell words between data stack and the Rack, and bigger clock gains for TOS in A with FETCH and DOVAR/DOCON.}


Posted: Mon Mar 14, 2022 7:17 pm

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1929
Location: Sacramento, CA, USA
It just seems appealing to me when I see the common words like DUP, DROP and NIP turn into a single machine instruction plus NEXT (PHA, PLA, PLY). In my 65m32 DTC FORTH, literally dozens of primitives reduce to a single machine instruction, making optimized STC the proper performance-oriented solution. I need to stop trying to prematurely optimize everything about the architecture and get something working first, but my obsessive compulsivity combined with a fair amount of laziness refuse to allow it.
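
For illustration, a sketch of those three words with TOS kept in A on the 65816, in the same style as the earlier examples (16-bit registers assumed):

```asm
DUP:    PHA             ; push a copy of TOS; A still holds TOS
        INX
        INX
        JMP ($0000,X)

DROP:   PLA             ; old NOS becomes the new TOS
        INX
        INX
        JMP ($0000,X)

NIP:    PLY             ; discard NOS into Y; TOS in A is untouched
        INX
        INX
        JMP ($0000,X)
```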

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Posted: Tue Mar 15, 2022 10:53 pm

Joined: Wed Aug 21, 2019 6:10 pm
Posts: 217
Note that for the two operand words, the 65816 hardware data stack version is shorter, but the X indexed stack for the SRT is faster, and the SRT has a one byte RTS instead of the five byte INX : INX : JMP ($0000,X).
Code:
ANDD:   AND DS,X
        INX
        INX
        RTS
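
A rough byte and cycle comparison of the two forms, assuming 16-bit registers, DS in a page-aligned direct page, and TOS in A in both cases. AND_HW and AND_X are hypothetical labels, and the threading overhead of NEXT or RTS is not counted:

```asm
AND_HW: AND 1,S         ; 2 bytes, 5 cycles: AND with NOS on the hardware stack
        PLY             ; 1 byte,  5 cycles: nip
                        ; body: 3 bytes, 10 cycles

AND_X:  AND DS,X        ; 2 bytes, 5 cycles: AND with NOS on the X indexed stack
        INX             ; 1 byte,  2 cycles
        INX             ; 1 byte,  2 cycles
                        ; body: 4 bytes, 9 cycles
```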


But before I try my hand at a subroutine threaded implementation, I want to be sure I can implement one closer to the eForth v1.0 and CamelForth for the Z80 models I am using for reference.

