All times are UTC

Highly optimized 65816 ITC Forth

GARTHWILSON

Post subject: Re: Highly optimized 65816 ITC Forth

Posted: Sun Feb 27, 2022 5:22 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8433
Location: Southern California

BruceRMcF wrote:

There's also token/bit-threaded. You don't have to reserve all 256 bytes for tokens ... you can reserve 128 or 127 for tokens and have one-byte primitives and two-byte calls.

I'm in a hurry this morning and will have to come back later to look at your post in more detail; but I'll comment on this part now. I have not seriously considered token threading so far; but the way I have envisioned it is to have 255 (not 256, nor 128, etc.) one-byte values assigned to the most-used words, whether primitive or secondary, and a 00 byte (which is tested for automatically when you load it and it affects the Z flag) would mean take the next pair of bytes as an address of a word that doesn't fit in the basic table, again regardless of whether it's a primitive or not. Another possibility is to cut the 255 down to 254 or 253 to leave one or two other options. In the spirit of the topic title, it would be highly optimized for code memory usage; but the reason I have not seriously considered it is the additional performance hit that comes from NEXT having to decide what to do with each byte and often needing to look up an address from a table. IamRob, if this is too far off of what you wanted your topic here to be for, let us know your preference.
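
The decision NEXT has to make for each byte can be pictured with a small Python model. This is only a sketch of the scheme as described, not 65816 code: the table contents, the addresses, the function name, and the little-endian order of the inline address are all assumptions made for the illustration.

```python
# Hypothetical model of a token-threaded NEXT with a 00 escape byte.
# TOKEN_TABLE maps token values 1..255 to word addresses (made-up values).
TOKEN_TABLE = {1: 0x8000, 2: 0x8010, 3: 0x8020}

def decode_thread(thread):
    """Return the list of word addresses that a byte thread refers to."""
    addresses = []
    ip = 0
    while ip < len(thread):
        token = thread[ip]
        ip += 1
        if token == 0:
            # Escape: the next two bytes are an address (little-endian assumed).
            addresses.append(thread[ip] | (thread[ip + 1] << 8))
            ip += 2
        else:
            # Common case: one byte of code space, at the cost of a table lookup.
            addresses.append(TOKEN_TABLE[token])
    return addresses

thread = bytes([1, 0, 0x34, 0x92, 3])
decoded = decode_thread(thread)
```

Here the five-byte thread names three words: two through the table and one through the escape byte. The branch on zero and the table lookup are the extra work per word that the paragraph above is worried about.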

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


IamRob

Post subject: Re: Highly optimized 65816 ITC Forth

Posted: Mon Feb 28, 2022 12:40 am

Joined: Sun Apr 26, 2020 3:08 am
Posts: 357

GARTHWILSON wrote:

BruceRMcF wrote:

There's also token/bit-threaded. You don't have to reserve all 256 bytes for tokens ... you can reserve 128 or 127 for tokens and have one-byte primitives and two-byte calls.

Nope. Not too far off at all. I am open to other ideas of optimization as well. Using one-byte tokens is basically what most BASICs already use. I can see that having one-byte tokens in a definition might be advantageous for an 8-bit machine, but I am leery that with 16-bit accesses there might be too much switching of the registers back and forth between 8 and 16 bits. With 16-bit, there is going to be one extra level of indirection. Right now I can just do a LDY and instantly have the address. With 1-byte tokens it will be SEP #$20 LDY #TOKEN REP #$20, then LDX TABLE,Y to get the address.

Or will the X-Y-registers stay in 8-bit mode all the time? This might be advantageous if the low and high bytes of an address are in separate pages in memory. Hmmmm! I might have to play with this. It could come down to that the X-Y-registers never need to be in 16-bit mode.
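
A rough Python picture of the split-page idea, assuming one 256-byte page of low bytes and one of high bytes indexed by the raw token. The page names, helper names, and addresses are invented for the example and are not from any existing Forth.

```python
# Hypothetical split address table: low bytes in one page, high bytes in another.
LO_PAGE = bytearray(256)
HI_PAGE = bytearray(256)

def set_word_address(token, address):
    """Store a 16-bit word address under an 8-bit token."""
    LO_PAGE[token] = address & 0xFF
    HI_PAGE[token] = (address >> 8) & 0xFF

def lookup(token):
    # An 8-bit index reaches all 256 entries without doubling the token first.
    return LO_PAGE[token] | (HI_PAGE[token] << 8)

set_word_address(0x05, 0x9ABC)
set_word_address(0xFF, 0x8123)
```

With a single table of two-byte entries, the token would have to be doubled and could not use the full 0..255 range with an 8-bit index; splitting the table avoids both, at the cost of two byte-wide fetches.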


BruceRMcF

Post subject: Re: Highly optimized 65816 ITC Forth

Posted: Mon Feb 28, 2022 1:26 am

Joined: Wed Aug 21, 2019 6:10 pm
Posts: 217

Yes, while the token/bit-threaded sketch was in 65C02 rather than 65816 because by the time I got to it, the Commander X16 had clearly switched to the 65C02 and I didn't hammer out a 65816 version ...

... but it is also true that a 65816 version would be constantly swapping back and forth between eight bit and sixteen bit data mode (though the indexes could stay in 16bit mode).

Versus ITC, a primitive swaps a two byte Code Field for two bytes in a jump vector, but costs a byte to store its token, while for compiled code the two byte Code Field is just dropped, and each compiled primitive saves a byte ... including the EXIT compiled into every routine. So the space savings should be substantial.

But versus "INX : INX : JMP ($0000,X)" for 65816 DTC, or the self-modifying code short-cutting one level of indirection in your ITC model, it IS a lot of extra overhead in the inner interpreter.
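
To put rough numbers on the space side of that tradeoff, here is a small Python size model of one compiled definition body. It follows the byte counts given above; the sample word (5 primitives and 2 calls) is invented, and headers (name and link fields) are ignored on the assumption that they are the same in both models.

```python
# Rough size model for one colon definition body, headers ignored.
def itc_size(primitives, calls):
    """2-byte code field, 2 bytes per compiled word, plus a 2-byte EXIT."""
    return 2 + 2 * (primitives + calls) + 2

def bit_threaded_size(primitives, calls):
    """No code field; 1 byte per primitive token, 2 per call, 1-byte EXIT."""
    return primitives + 2 * calls + 1

sample = dict(primitives=5, calls=2)
itc = itc_size(**sample)
tokens = bit_threaded_size(**sample)
```

For this made-up word the body comes to 18 bytes under ITC and 10 bytes bit-threaded. The more primitive-heavy the definition, the closer the ratio gets to two to one.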


BruceRMcF

Post subject: Re: Highly optimized 65816 ITC Forth

Posted: Thu Mar 03, 2022 10:46 pm

Joined: Wed Aug 21, 2019 6:10 pm
Posts: 217

BruceRMcF wrote:

... but it is also true that a 65816 version would be constantly swapping back and forth between eight bit and sixteen bit data mode (though the indexes could stay in 16bit mode). ...

For a version located in RAM/ROM above $8000 and with a maximum of 64 primitives (so no need to shift), I think it might be something like:

Code:

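EXIT:   PLY             ; recover the saved IP from the hardware stack
NEXT:   INY             ; step IP to the next byte of the thread
NEXT0:  SEP #$20        ; 8-bit accumulator for the byte fetch
        LDA $0000,Y
        BMI ENTER       ; high bit set: first byte of a two-byte call
        STA OPCODE      ; high bit clear: one-byte primitive token
        REP #$20        ; back to 16-bit accumulator
        JMP (OPCODE)
ENTER:  XBA             ; keep the first byte as the high byte of the target
        INY
        LDA $0000,Y     ; second byte becomes the low byte
        PHY             ; save the IP for EXIT
        REP #$20
        TAY             ; IP := address of the called definition
        BRA NEXT0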

With no code field and all primitives one byte, it would tend to be very compact. And it has the advantage of all bit-threaded code that you do ENTER directly, INSTEAD of the indirect call (in ITC) or direct call (in DTC) ... where in most models you do the call and THAT does the ENTER operation.

But it loses part of the advantage of the 65816 in doing its opcode fetching a byte at a time, and it adds 6 clock cycles of overhead (3 each for SEP and REP) in setting then resetting the 8bit/16bit mode flag.

It does rely on an X-indexed data stack, so it gets to use the "LDA (ST,X) : STA ST,X" for @ that the 65816 gives you with a direct page, X-indexed data stack. And you can put ST at $F8 for direct access to the top four cells of the stack relative to X ... ST,X; ST+2,X; ST+4,X and ST+6,X ... and with Forth's efficient use of the return stack, you can leave the initial direct page at page 0 and the stack page at page 1, for convenience, while reserving 248 bytes of the direct page for other uses.
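
For anyone who wants to trace the control flow of the NEXT/ENTER/EXIT sketch above without an assembler, here is a rough Python model of it. It is an illustration only: the token values, the addresses, the sample primitives (LIT5, DOUBLE) and the HALT token are all invented, and the jump through OPCODE is replaced by a plain if/elif chain.

```python
# Hypothetical model of the bit-threaded inner interpreter; not 65816 code.
# Token values, addresses, and the sample primitives are invented.
EXIT, LIT5, DOUBLE, HALT = 0x00, 0x02, 0x04, 0x06   # all below $80

def run(memory, start):
    """Interpret a thread starting at `start`; return the data stack at HALT."""
    data, rstack = [], []
    ip = start
    while True:
        byte = memory[ip]
        if byte & 0x80:
            # ENTER: this byte is the high byte of a definition's address.
            target = (byte << 8) | memory[ip + 1]
            rstack.append(ip + 1)     # like PHY: IP is on the call's 2nd byte
            ip = target               # like TAY : BRA NEXT0
        elif byte == EXIT:
            ip = rstack.pop() + 1     # like PLY : INY
        elif byte == HALT:
            return data
        else:
            if byte == LIT5:
                data.append(5)
            elif byte == DOUBLE:
                data.append(data.pop() * 2)
            ip += 1                   # like NEXT: INY

memory = bytearray(0x10000)
memory[0x9000:0x9003] = bytes([LIT5, DOUBLE, EXIT])        # a colon definition
memory[0x8000:0x8004] = bytes([0x90, 0x00, DOUBLE, HALT])  # call it, then double
result = run(memory, 0x8000)
```

The thread at $8000 calls the definition at $9000 with the two bytes $90 $00 (high byte first, which is why definitions have to live above $8000), then runs one more primitive. No code field is involved at any point: the high bit of the fetched byte alone decides between ENTER and a primitive.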


IamRob

Post subject: Re: Highly optimized 65816 ITC Forth

Posted: Sat Mar 05, 2022 7:05 pm

Joined: Sun Apr 26, 2020 3:08 am
Posts: 357

I don't know if anyone is going through the code or using any bits and pieces, but I just noticed something.

If all the BRA NEXT and JMP NEXT are converted to RTS's, and NEXT removed, the whole code can be used to create a DTC Forth.

That would mean, that both an ITC Forth and a DTC Forth can be created from the same basic Forth system. The only difference being the compiler to create word definitions.


barrym95838

Post subject: Re: Highly optimized 65816 ITC Forth

Posted: Sat Mar 05, 2022 7:27 pm

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1929
Location: Sacramento, CA, USA

I don't think DTC can use RTS that way, unless you use the hardware stack pointer for your IP. dclxvi (Bruce) played around with that idea, but it comes with some significant issues that may require hardware assistance, like any stack activity having the potential to nuke your threaded code. STC, on the other hand ...

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


BruceRMcF

Post subject: Re: Highly optimized 65816 ITC Forth

Posted: Sat Mar 05, 2022 9:26 pm

Joined: Wed Aug 21, 2019 6:10 pm
Posts: 217

I think the tradeoff for using a subroutine call / RTS for DTC is the extra branch in NEXT, so three clocks per word versus one or two bytes saved per primitive.

Code:

EXIT:   LDX RS,Y
        INY
        INY
NEXT:   JSR ($0001,X)
        INX
        INX
        BRA NEXT

ENTER:  ; JSR ENTER : !byte FirstWord,NextWord,...,EXIT
        DEY
        DEY
        STX RS,Y
        PLX
        JSR ($0001,X)
        INX
        INX
        BRA NEXT


IamRob

Post subject: Re: Highly optimized 65816 ITC Forth

Posted: Sun Mar 06, 2022 12:48 am

Joined: Sun Apr 26, 2020 3:08 am
Posts: 357

barrym95838 wrote:

There really is no difference compared to what ITC does. And the stack getting nuked can just as easily happen under ITC.

With some minor changes to a few words, DOCOL would not be needed and instead of changing all the BRA/JMP NEXT to RTS, we can just put one simple RTS right at NEXT for now and follow the code through.

Under ITC, the return address is pushed at the beginning of each word, when the pointer in the CFA points to DOCOL and ;S pulls that address off the stack. DTC pushes the return address from the JSR and RTS pulls it off.

A DO..LOOP works exactly the same pushing the start and end values of the loop on the stack. Both ITC and DTC need to make sure the stack is returned to the point before the values were pushed.

That pretty much goes for any return stack usage. Anything that gets pushed on must be accounted for and removed before returning to a previous calling routine no matter which Threaded Forth one is using. Believe me, I have had some pretty fantastic crashes under ITC Forth as well.

The levels deep some words enter should be exactly the same under both ITC and DTC as both push exactly 2 bytes when calling other words.

At the moment, I don't see any difference in the return stack usage between ITC and DTC. They both should use exactly the same amount of stack space.


IamRob

Post subject: Re: Highly optimized 65816 ITC Forth

Posted: Sun Mar 06, 2022 12:54 am

Joined: Sun Apr 26, 2020 3:08 am
Posts: 357

Quote:

I think the tradeoff for using a subroutine call / RTS for DTC is the extra branch in NEXT, so three clocks per word versus one or two bytes saved per primitive.

Code:

EXIT:   LDX RS,Y
        INY
        INY
NEXT:   JSR ($0001,X)
        INX
        INX
        BRA NEXT

ENTER:  ; JSR ENTER : !byte FirstWord,NextWord,...,EXIT
        DEY
        DEY
        STX RS,Y
        PLX
        JSR ($0001,X)
        INX
        INX
        BRA NEXT

Surely using both the X-reg & Y-reg here wastes a lot of cycles preserving them and reloading them with pointers or data stack values when they are needed in the routines being called?

I am also having trouble seeing where the last word of a definition, which is to EXIT (correct?), pulls the address off of the JSR at NEXT?

Shouldn't this be:

Code:

EXIT:   PLX             ; pulls the 2-byte call to EXIT from the JSR at NEXT
        LDX RS,Y
        INY
        INY

Last edited by IamRob on Sun Mar 06, 2022 1:56 am, edited 2 times in total.


Dr Jefyll

Post subject: Re: Highly optimized 65816 ITC Forth

Posted: Sun Mar 06, 2022 1:39 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3354
Location: Ontario, Canada

IamRob wrote:

the stack getting nuked can just as easily happen under ITC.

It's true that an interrupt will always cause writes to memory at and below the address pointed to by the Stack Pointer. And in most situations that's harmless, because S is being used to implement a LIFO, and it'll point into a RAM area reserved for that purpose. As long as the interrupt leaves S as it found it, and never writes above S, you're good.

But with Bruce's (dclxvi's) rather radical proposal :shock:, S is not used to implement a LIFO. Instead, it points into the list of tokens being executed. Writes to memory at and below S are not OK because they'll corrupt the list.
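
A small Python model may make the hazard concrete. It is only a sketch under stated assumptions, not dclxvi's actual code: the list location, the word addresses, and the status value are made up, and the interrupt is modeled as a 65816 native-mode entry that pushes four bytes (emulation mode pushes three).

```python
# Hypothetical model: S doubles as Forth's IP, so RTS acts as NEXT.
memory = bytearray(0x10000)
THREAD = 0x2000                       # made-up location of the compiled list
words = [0x8000, 0x8010, 0x8020]      # made-up code addresses

# RTS adds 1 to the pulled address, so the list stores address - 1.
for i, addr in enumerate(words):
    memory[THREAD + 2 * i] = (addr - 1) & 0xFF
    memory[THREAD + 2 * i + 1] = (addr - 1) >> 8

def rts(s):
    """Pull a return address the way RTS does; return (new_s, new_pc)."""
    lo, hi = memory[s + 1], memory[s + 2]
    return s + 2, ((hi << 8) | lo) + 1

def interrupt(s, pc, status, pushed_bank=0):
    """Native-mode interrupt entry: pushes bank, PCH, PCL, P at and below S."""
    for value in (pushed_bank, pc >> 8, pc & 0xFF, status):
        memory[s] = value
        s -= 1
    return s + 4                      # RTI pulls all four, restoring S

s = THREAD - 1                        # S points just below the first entry
s, pc1 = rts(s)
s, pc2 = rts(s)
before = bytes(memory[THREAD:THREAD + 4])
s = interrupt(s, pc2, status=0x30)
after = bytes(memory[THREAD:THREAD + 4])
```

After two words have run, the interrupt's pushes land on the two list entries that were just executed, so `before` and `after` differ even though S comes back unchanged. The entry above S is untouched, but any loop or later re-execution that walks back over the overwritten bytes will jump to garbage.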

dclxvi wrote:

In this DTC implementation, the S register is Forth's IP register, and RTS is NEXT (the inner interpreter).

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


IamRob

Post subject: Re: Highly optimized 65816 ITC Forth

Posted: Sun Mar 06, 2022 1:52 am

Joined: Sun Apr 26, 2020 3:08 am
Posts: 357

Dr Jefyll wrote:

IamRob wrote:

the stack getting nuked can just as easily happen under ITC.

But with Bruce's (dclxvi's) rather radical proposal, S is not used to implement a LIFO. Instead, it points into the list of tokens being executed. Writes to memory at and below S are not OK because they'll corrupt the list.

If I understand Bruce's code, it also doesn't have RTS at the end of any of his definitions. His method uses 2 bytes of a direct address (instead of the 3 bytes an inline JSR would take) rather than an indirect address.

dclxvi wrote:

In this DTC implementation, the S register is Forth's IP register, and RTS is NEXT (the inner interpreter).

-- Jeff

Yep, pretty clever. But you brought Bruce's rather unorthodox, radical, shocking proposal into the equation. That just, well, complicates things.


Dr Jefyll

Post subject: Re: Highly optimized 65816 ITC Forth

Posted: Sun Mar 06, 2022 2:53 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3354
Location: Ontario, Canada

IamRob wrote:

Yep, pretty clever. But you brought Bruce's rather unorthodox, radical, shocking proposal into the equation. That just, well, complicates things.

LOL! Yeah, it is a pretty unorthodox, radical and shocking proposal.

But it was brought into the equation by Mike Barry (just a few posts ago, here), not by me. You saw fit to respond, so I figured it was OK to respond in turn, suspecting you might've misunderstood Mike's point.

Quote:

If I understand Bruce's code, it also doesn't have RTS at the end of any of his definitions.

If you're interested at all then you owe it to yourself to have a closer look. "RTS is NEXT (the inner interpreter)." But if all of this is OT then we can drop it and move on. It perhaps wasn't helpful of me to try to clarify.

-- Jeff



IamRob

Post subject: Re: Highly optimized 65816 ITC Forth

Posted: Sun Mar 06, 2022 4:47 am

Joined: Sun Apr 26, 2020 3:08 am
Posts: 357

Dr Jefyll wrote:

IamRob wrote:

Yep, pretty clever. But you brought Bruce's rather unorthodox, radical, shocking proposal into the equation. That just, well, complicates things.

LOL! Yeah, it is a pretty unorthodox, radical and shocking proposal.

Quote:

If I understand Bruce's code, it also doesn't have RTS at the end of any of his definitions.

The current topic is perfectly fine. It makes me think outside the box. It might take a day or two to fully understand the new direction.

With this new equation in the mix, now more than ever, I yearn for a CPU that has 2 Accumulators, 2 X-regs & 2 Y-regs. Is that too much to ask for?


barrym95838

Post subject: Re: Highly optimized 65816 ITC Forth

Posted: Sun Mar 06, 2022 5:42 am

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1929
Location: Sacramento, CA, USA

I was just free-associating on Rob's "idea" of using RTS for a DTC NEXT ... at least that's what I thought he was describing ... maybe I didn't accurately read between the lines.

Quote:

With this new equation in the mix, now more than ever, I yearn for a CPU that has 2 Accumulators, 2 X-regs & 2 Y-regs. Is that too much to ask for?

The 6309 pretty much has that covered, but it's big-endian, so :evil:



BruceRMcF

Post subject: Re: Highly optimized 65816 ITC Forth

Posted: Sun Mar 06, 2022 12:21 pm

Joined: Wed Aug 21, 2019 6:10 pm
Posts: 217

IamRob wrote:

Surely using both the X-reg & Y-reg here wastes a lot of cycles preserving them and reloading them with pointers or data stack values when they are needed in the routines being called?

Certainly, but there are also a number of primitives that can get by with just the indexed stack operations. That's something I would look at by implementing it either way and checking out which is faster in use rather than by cycle counting.

Altogether, if space is THAT critical that I am counting the difference between JMP NEXT and RTS in primitives, I would look at biting the bullet and seeing if a token/bit-threaded inner interpreter is "fast enough", and just go with that if it is.

Quote:

I am also having trouble seeing where the last word of a definition, which is to EXIT (correct?), pulls the address off of the JSR at NEXT?

Shouldn't this be:

Code:

EXIT:   PLX             ; pulls the 2-byte call to EXIT from the JSR at NEXT
        LDX RS,Y
        INY
        INY

It should indeed, I had that wrong in the sketch above.

Bruce's (the other one) RTS as hardware do-next is really cool, but the inability to leave interrupts on would be a deal breaker for me. I always want to leave the option of having a watchdog timer to interrupt an endless loop or stall.
