
All times are UTC




 Post subject: 24-bit CFA ?
PostPosted: Fri Nov 28, 2014 12:06 am 

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 449
Location: Canada
I was just wondering if Forth had been implemented with 24 bit code addresses (24 bit words?) such as are available on the '816.

_________________
http://www.finitron.ca


 Post subject: Re: 24-bit CFA ?
PostPosted: Fri Nov 28, 2014 3:36 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8422
Location: Southern California
I learned Forth on my HP-71 with 20-bit addresses and cells. I'm sure just about everything has been done.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


 Post subject: Re: 24-bit CFA ?
PostPosted: Fri Nov 28, 2014 5:21 am 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8138
Location: Midwestern USA
Rob Finch wrote:
I was just wondering if Forth had been implemented with 24 bit code addresses (24 bit words?) such as are available on the '816.

32 bit code is quite a bit more efficient on the 65C816 than 24 bit when working with integers. If integers are processed as words rather than bytes the accumulator can be left in "wide" mode almost all the time, which actually makes the code execute faster. Otherwise, the accumulator has to be changed to "narrow" mode to process bits 16-23 of a 24 bit number. The constant REPs and SEPs in a number processing loop can start gobbling up clock cycles at a rapid rate.

It should be noted that all implied register operations, e.g., INC A or DEX, consume the same number of clock cycles regardless of register width. This is because the 65C816's ALU always processes words, regardless of the actual register width. Also, there's only a one cycle penalty for using instructions that involve 16 bit memory accesses, the extra cycle being expended on the MSB fetch and/or store step in the instruction. The performance improvement is especially dramatic on R-M-W instructions: INC SOMEWHERE when the accumulator is set to 16 bits executes about 250 percent faster than INC SOMEWHERE -- BNE NEXT -- INC SOMEWHERE+1 with the accumulator set to 8 bits, plus the code is more succinct.
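
As a rough illustration of the R-M-W comparison, here is a small Python model (not '816 code) of the two ways to bump a 16-bit counter in memory. The function names and the little-endian `mem` layout are assumptions made for this sketch. It only shows that the single wide INC and the narrow INC / BNE / INC sequence produce the same result; it makes no claim about cycle counts.

```python
def inc16_wide(mem, addr):
    """Model of one INC with a 16-bit accumulator: a single R-M-W on a word."""
    value = mem[addr] | (mem[addr + 1] << 8)     # little-endian word read
    value = (value + 1) & 0xFFFF                 # 16-bit wraparound
    mem[addr] = value & 0xFF
    mem[addr + 1] = value >> 8


def inc16_narrow(mem, addr):
    """Model of INC lo / BNE skip / INC hi with an 8-bit accumulator."""
    mem[addr] = (mem[addr] + 1) & 0xFF           # INC SOMEWHERE
    if mem[addr] == 0:                           # BNE NEXT falls through only on wrap
        mem[addr + 1] = (mem[addr + 1] + 1) & 0xFF   # INC SOMEWHERE+1
```

The narrow version needs the branch precisely because the carry out of the low byte has to be propagated by hand.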

When I rewrote the firmware in POC V1 to use 16 bit operations I saw a noticeable improvement in performance, and the code size actually shrunk about 15 percent overall. The reclaimed ROM space didn't go to waste: it got used for the SCSI drivers. :lol:

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


 Post subject: Re: 24-bit CFA ?
PostPosted: Fri Nov 28, 2014 5:45 am 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1922
Location: Sacramento, CA, USA
I am not a proficient '816 assembly programmer like BDD, but I almost immediately thought of the same issue. The space saved by using 24-bit addresses vs. 32-bit addresses with the most-significant 8 bits as "don't cares" (or some possibly useful metadata :idea: ) will be offset by the annoying mode-juggling. In fact, I personally find the whole mode thing to be rather annoying in general, and have even seriously considered dropping the "D" flag from my 65m32 implementation ... it's a lot of baggage to carry, especially if done in a conceptually complete manner (re: flag effects), and I would rather not do something at all than do it "half-fast". :roll: On my m-824 (not publicly introduced), it would be much more natural to use 24 bits, because 8 and 24 are its two native width choices.

Mike


 Post subject: Re: 24-bit CFA ?
PostPosted: Fri Nov 28, 2014 7:02 am 

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 449
Location: Canada
So I guess that 32 bit addressing is the way to go for a larger code space. I was thinking of the zero page addressing mode [zp],y which uses 24 bit addresses, but I guess a fourth byte could be added that is always zero. It wastes a byte of zero page to implement the 32 bit addressing.
32 bit CFA addressing on the '816 in Forth would then work like 16 bit addressing.

 Post subject: Re: 24-bit CFA ?
PostPosted: Fri Nov 28, 2014 7:45 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8422
Location: Southern California
I don't think direct-page addressing space on the '816 for Forth would be a problem, since the heaviest application I could come up with on the '02 took less than 20% of ZP for the data stack, and if you really wanted multitasking, the '816 can easily have a different direct page for each task. It may seem a waste to zero the 4th byte of each 32-bit cell, but many cells will be data anyway and you may want anywhere from 8 to 32 bits. In 16-bit Forth, we still waste half a cell every time we put a character on the stack. The 65Org32 will handle it much more efficiently-- not in terms of saving memory, but there will be no direct-page limitations, or limitations of hardware stack space, and 32-bit quantities get fetched in one cycle, stored in one cycle, and operated on in one cycle, all at once instead of 8 or even 16 bits at a time.

 Post subject: Re: 24-bit CFA ?
PostPosted: Fri Nov 28, 2014 3:39 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3343
Location: Ontario, Canada
Rob Finch wrote:
So I guess that 32 bit addressing is the way to go for a larger code space.
32 is a lot. If I were doing it I'd use only 16 bits actually stored in memory, but with an implied scale. After the 16 bits get fetched, shift 'em left and stick some zeros on the right. If we shift 4 places, say, then we've multiplied the code space by 16 (now 1 MB), at the cost of requiring every entry point to be aligned to a 16-byte boundary. Minor downside: that results in about 8 bytes of wasted code space at the end of every routine.

I'm no fan of wasting RAM but the loss is largely reversed because tokens are 16-bit, not 32. And 16-bit tokens are faster to fetch, so the performance implications are significant. :!:

It was arbitrary to suggest four as the number of shifts -- and it makes my scheme reminiscent of x86 real-mode addressing. But I'm not advocating segmentation (and the necessary use of two registers); instead there'd be one, 32-bit IP register inside the core. The wrinkle is that, when it loads from memory, the ms 12 bits and the ls 4 bits of the IP register load as zeros.

Four was an arbitrary choice, and other shift values bear consideration. Some scenarios would benefit best from a one-bit shift, or perhaps three. So that's a matter for debate. Or...

The necessity to define the shift (to have it "cast in stone") can be avoided if the shift is determined by a configuration register. (Yes a creeping feature but it needn't be over the top. Maybe allow only two options, say 0- or 4-bit shifting, if the core size is constrained. A fancier core would have a fancier configuration register, and offer, say, 0-, 2-, 4- or 6-bit shifting... ) Anyway I'd hesitate to blow the token size up to 32 bits.
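
The arithmetic of the scaled-token idea can be sketched in Python. This is only a model of the address math, not of any actual core: the function names are made up for the example, and `shift` stands in for the value a configuration register might select.

```python
def token_to_address(token, shift=4):
    """Expand a 16-bit stored token into a code address by shifting left."""
    if not 0 <= token <= 0xFFFF:
        raise ValueError("token must fit in 16 bits")
    return token << shift


def address_to_token(address, shift=4):
    """What a compiler would store; only aligned entry points are legal."""
    if address & ((1 << shift) - 1):
        raise ValueError("entry point is not aligned to 2**shift bytes")
    if (address >> shift) > 0xFFFF:
        raise ValueError("address is outside the scaled code space")
    return address >> shift


def align_up(address, shift=4):
    """Next legal entry point at or after address."""
    step = 1 << shift
    return (address + step - 1) & ~(step - 1)


def code_space_bytes(shift=4):
    """Reachable code space: 64K tokens times the alignment."""
    return 0x10000 << shift
```

With `shift=4` the reachable space is 1 MB and a routine ending at $1231 is followed by padding up to $1240; with `shift=0` the scheme degenerates to ordinary 16-bit tokens.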

Obviously the Forth compiler -- assuming the user will want one -- needs alterations to deal with this scheme. That's doable, as I know from making similar alterations. At issue was the compiler for another hybrid Forth with 32-bit cells & operations but 16-bit ITC tokens. The tokens aren't scaled, but you do have to add a fixed offset to find the start of the 64K dictionary within the flat 32-bit address space. (Actually it's a flat 20-bit address space, improbably accomplished with real-mode x86 under DOS. :shock: )

-- Jeff
Edit: raise the compiler issue, last paragraph

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Last edited by Dr Jefyll on Sat Nov 29, 2014 12:11 pm, edited 1 time in total.

 Post subject: Re: 24-bit CFA ?
PostPosted: Sat Nov 29, 2014 2:21 am 

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 449
Location: Canada
Quote:
If I were doing it I'd use only 16 bits actually stored in memory, but with an implied scale. After the 16 bits get fetched, shift 'em left and stick some zeros on the right. If we shift 4 places

I like it. That's a pretty good option for extending the code range without going to 32 bit addresses.

I'm toying with the idea of a Forth accelerator peripheral, rather than modifying the cpu. The first thing to do would be to implement the NEXT routine with hardware.
The peripheral would sit in zero page memory and automatically update a 3 byte IP pointer. It would also automagically fetch the code address vector for W. It could then leave this vector scaled (as suggested) in another zero page location. Then all that needs to be done is a jump to the W vector.

Code for a next routine would look something like this then:
Code:
STZ ForthAccelTrigger ; triggers a DMA operation
JMP ForthAccelW


Not quite as slick a solution as a couple of the other 6502 modifications for Forth, but I think it could be made to work.
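
What the peripheral would do on each trigger can be modelled in a few lines of Python. Everything here is an assumption for illustration: the class name, the little-endian token layout, the 4-bit shift, and the simplification that the scaled token is used directly as the jump vector. It is a behavioural sketch, not a hardware design.

```python
class ForthAccel:
    """Toy model of a memory-mapped NEXT accelerator (all details assumed)."""

    def __init__(self, mem, ip, shift=4):
        self.mem = mem               # byte-addressable memory, e.g. a bytearray
        self.ip = ip & 0xFFFFFF      # 3-byte instruction pointer
        self.shift = shift           # implied scale applied to each fetched token
        self.w = 0                   # vector the CPU would then jump through

    def trigger(self):
        """Model of the write to ForthAccelTrigger: fetch, scale, advance IP."""
        token = self.mem[self.ip] | (self.mem[self.ip + 1] << 8)  # little-endian
        self.w = token << self.shift
        self.ip = (self.ip + 2) & 0xFFFFFF
        return self.w
```

For example, with the bytes $34 $12 stored at the current IP, one trigger leaves $012340 in `w` and moves IP forward by two.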

 Post subject: Re: 24-bit CFA ?
PostPosted: Tue Oct 06, 2015 4:25 pm 

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 449
Location: Canada
Quote:
32 is a lot. If I were doing it I'd use only 16 bits actually stored in memory, but with an implied scale. After the 16 bits get fetched, shift 'em left and stick some zeros on the right. If we shift 4 places, say, then we've multiplied the code space by 16 (now 1 MB), at the cost of requiring every entry point to be aligned to a 16-byte boundary. Minor downside: that results in about 8 bytes of wasted code space at the end of every routine.
Memory is cheap. Shifting the CFA by eight places would place code every 256 bytes, and waste 16MB of memory. But in a system with 128MB maybe that isn't a big issue. Using an eight bit shift eliminates shifting from the NEXT routine. The Forth code could be stored in a compressed file format.
Quote:
but you do have to add a fixed offset to find the start of the 64K dictionary within the flat 32-bit address space.

This would have to be added to the CFA. It might be possible to code this just as a constant addition if the address of the dictionary was fixed. Also using 256 byte areas the least significant byte could always be zero and left out of the addition if the dictionary is aligned at a 64k bank.

Next routine would look something like:
Code:
    LDD    FAR (IP),Y          ; Y is used as the low order 16 bits of IP
    LEAY   2,Y                 ; step past the 16 bit token
    BNE    .0001               ; no wrap of Y, skip the carry
    INC    IP+1                ; IP+2,IP+3=00 64k bank aligned
.0001:
    ADDD   #DictionaryBase     ; dictionary must be in low 16MB
    STD    W+2                 ; set bits [23:8]
W   JMP    FAR $00000000       ; operand is patched by the STD above
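
The effect of storing D into the middle of the jump operand can be checked with a short Python model. It assumes a one-byte JMP FAR opcode at W followed by a big-endian four-byte operand, so that W+2 and W+3 are operand bytes 1 and 2; the function name and the default base are made up for the sketch.

```python
def patch_jump_operand(token, dictionary_base=0):
    """Model of ADDD #DictionaryBase / STD W+2 on a big-endian machine.

    The 16-bit sum lands in the middle two bytes of the 4-byte operand,
    so it ends up in address bits [23:8] with no shift instruction.
    """
    d = (token + dictionary_base) & 0xFFFF    # ADDD wraps at 16 bits
    operand = bytearray(4)                    # operand of JMP FAR $00000000
    operand[1] = d >> 8                       # STD stores the high byte first
    operand[2] = d & 0xFF
    return int.from_bytes(operand, "big")
```

For a token of $1234 and a base of zero the patched target is $00123400, which is the token shifted left by eight places.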

 Post subject: Re: 24-bit CFA ?
PostPosted: Wed Oct 07, 2015 2:46 am 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3343
Location: Ontario, Canada
Hi Rob. OK, this wasn't making sense to me at first but now it does. Earlier in the thread I proposed left-shifting four places, and you've increased that to eight because an 8-bit shift in a byte-organized machine can happen for free just by juggling which bytes get accessed. Eight is a bigger number than we might otherwise choose, but it's not that big a deal. We end up with a 16MB dictionary, with every CFA aligned on a 256-byte boundary. (As for compressed file format, that'd be unnecessary if it's source code that's being stored.)

Rob Finch wrote:
Quote:
but you do have to add a fixed offset to find the start of the 64K dictionary within the flat 32-bit address space.

This would have to be added to the CFA. It might be possible to code this just as a constant addition if the address of the dictionary was fixed. Also using 256 byte areas the least significant byte could always be zero and left out of the addition if the dictionary is aligned at a 64k bank.
Good ideas, but probably unnecessary. My remark about adding a fixed offset pertains to the DOS environment where my dictionary is only 64K and I'm not allowed to map it to the bottom of the 1-MB Real-Mode space because DOS already has stuff down there. RFT6809 Forth has far more freedom, mostly because of the far larger dictionary. It could reside at the bottom of the 4GB space -- sacrificing a portion at the bottom if necessary, which is easily affordable since 16MB is oodles. So, no constant addition required.

Code:
    LDD    FAR (IP),Y          ; Y is used as the low order 16 bits of IP
And IP is a four-byte indirect pointer residing in memory, right? I like how your...
Code:
W  JMP FAR $00000000
takes a four-byte operand, and it's the middle two bytes that get written to by the preceding STD instruction. This is your free 8-bit shift. :)

-- Jeff

 Post subject: Re: 24-bit CFA ?
PostPosted: Wed Oct 07, 2015 1:18 pm 

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 449
Location: Canada
Quote:
Code:
    LDD    FAR (IP),Y          ; Y is used as the low order 16 bits of IP

And IP is a four-byte indirect pointer residing in memory, right? I like how your...

Yes, IP is a four byte indirect pointer. The two LSB's are zero and Y contains the low order two bytes of the IP.

Quote:
Code:
W  JMP FAR $00000000

takes a four-byte operand, and it's the middle two bytes that get written to by the preceding STD instruction. This is your free 8-bit shift. :)
That is correct. JMP FAR is a single byte opcode extension to the 6809.
It might be better to use the second 16MB bank in order to avoid zero page memory and define the W vector like:
Code:
W    JMP FAR $01000000


It occurs to me this is really a 6809 topic, not a 6502 one. I'll move it over to anycpu.org

 Post subject: Re: 24-bit CFA ?
PostPosted: Mon Jan 18, 2016 9:31 pm 

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
About Forth on the 65816 - the problem, it seems to me, is that Forth expects an address as well as any non-double number to fit in a "cell" (see http://www.forth200x.org/documents/html/port.html) so that instructions such as ! (store) and @ (fetch) work. Stuff to do with the Dictionary and XTs would not be a problem because you could just use a token-based system with a 16-bit offset as the XT as described above. But you need to be able to do fetch and store from the command line as well, which would mean 24-bit numbers for addresses, which don't fit in a 16-bit accumulator. Argh.

So it would seem that with the 65816 we can either limit our address space to 64k in the current bank (which would be a shame) or do what we did with the 6502, make the cell size twice the accumulator width (16-bit for the 6502, and 32-bit for the 65816), which would be a waste and take us back to the horrible two-step addition process I had so dearly hoped to get away from. Or am I missing something?


 Post subject: Re: 24-bit CFA ?
PostPosted: Mon Jan 18, 2016 11:16 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3343
Location: Ontario, Canada
Quote:
the horrible two-step addition process I had so dearly hoped to get away from

I agree it would be a shame to limit our address space to 64k in the current bank. Back in the 20th century I extended my 16-bit Forth to augment it with "far" versions of @ ! C@ C! CMOVE and so on, and these words accepted each address as a double -- ie, two cells on stack. It worked but I quickly discovered it was a breeding ground for bugs, due to the mixture of single- and double-precision values on stack (eg: SWAP becomes ROT or possibly DSWAP) and the need for single- and double-precision operators (eg: + vs D+). It was hard to write and just as bad to read!! In fact, coding this way was so onerous I found myself reluctant to begin a project unless the data fit within 64K or could easily be hacked into pieces. It clamped a lid on productivity & creativity -- and that is what I dearly hope to get away from. :!:
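
The kind of juggling described above can be shown with a toy Python stack. This is a sketch under stated assumptions, not the actual Forth from the post: the function names are invented, a plain list stands in for the data stack, and a "far" address is taken to occupy two 16-bit cells with the high cell on top.

```python
def far_fetch_16bit_cells(stack, mem):
    """'Far' fetch with 16-bit cells: ( addr-lo addr-hi -- value )."""
    hi = stack.pop()
    lo = stack.pop()
    stack.append(mem[(hi << 16) | lo])


def swap(stack):
    """Ordinary SWAP: exchanges the top two cells, whatever they mean."""
    stack[-1], stack[-2] = stack[-2], stack[-1]


def rot(stack):
    """ROT ( a b c -- b c a ): what it takes to lift a single over a double."""
    stack.append(stack.pop(-3))
```

With a single `n` sitting under a far address, `[n, lo, hi]`, bringing `n` to the top needs ROT. Reaching for SWAP out of habit instead exchanges the two halves of the address and silently produces a wrong pointer.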

If I ever write another 65xx Forth, every on-stack cell will have 32 bits of storage available (possibly implemented under the hood as separate hi-word & lo-word stacks), and double precision operations will be the default (with strictly optional implementation of single-precision substitutes which just ignore the hi-words of the cells). IOW I will willingly sacrifice execution speed for clean use of the 24-bit data space.

And, percentage-wise, the drop in execution speed won't be as great as one might suppose. That's because the code space (the dictionary space and pointers within it) will remain 16-bit. The considerable number of clock cycles spent on NEXT 0BRANCH NEST UNNEST etc won't increase. The time spent processing data may double, but that doesn't mean the Forth VM as a whole will execute at half speed.
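
The speed argument is simple proportional arithmetic, sketched here in Python. The function and parameter names are made up, and any fraction plugged in is a hypothetical figure, not a measurement from a real Forth.

```python
def relative_run_time(data_fraction, data_slowdown=2.0):
    """Run time relative to the all-16-bit case.

    data_fraction: share of the original run time spent in data operations
    data_slowdown: how much slower those operations become (2.0 = doubled)
    """
    if not 0.0 <= data_fraction <= 1.0:
        raise ValueError("data_fraction must be between 0 and 1")
    # Inner-interpreter time is unchanged; only the data share is scaled.
    return (1.0 - data_fraction) + data_fraction * data_slowdown
```

If, say, 40 percent of the time went to data operations and those doubled in cost, the whole program would take 1.4 times as long rather than twice as long.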

Needless to say, priorities vary, and my solution may not suit everyone. But I'll happily accept less than bleeding edge performance if the upside is de-PITA-ing access to the 816's large, flat data space. The double-precision operations are low-level stuff, written and debugged once. I'm much more concerned about high-level coding, where the write/debug cycle repeats with each new project.

I expressed some similar thoughts, perhaps less succinctly, in a thread I started here.

Cheers,
Jeff

 Post subject: Re: 24-bit CFA ?
PostPosted: Wed Jan 20, 2016 6:35 pm 

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
Thanks for the link to the other thread, I hadn't seen that one. Obviously, it would seem that I'm going to have to write two Forths for the 65816 - one "fast and small" (16-bit cells, limited to 64k) and one "big and slow" (32-bit cells, but whole memory). The horror :D !


 Post subject: Re: 24-bit CFA ?
PostPosted: Wed Jan 20, 2016 8:15 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3343
Location: Ontario, Canada
scotws wrote:
it would seem that I'm going to have to write two Forths for the 65816

Well, there's a compromise solution that comes pretty close to letting you have your cake and eat it too. The key point is to make sure your small & fast Forth reserves space for the high-word stack -- the "ghost stack," as Bruce calls it. (The R stack should have space to get ghosted, too.)

Then you can write a complete set of fast, 16-bit words that'll get used by default, and selectively add 32-bit words as needed (the reverse of what I proposed). Either way, you avoid the #1 thing you really, really DON'T wanna do, and that is be forced to use two cells to hold an extended-precision value. Juggling double-cell items on stack, especially when mingled with single items, is what I learned to hate. :cry:
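
A minimal Python model of the idea, assuming the high words live in a second stack that moves in lockstep with the ordinary one. The class and method names are invented for the sketch, and the choice to leave a stale high word behind after a 16-bit operation is one possible reading of "just ignore the hi-words".

```python
class GhostStack:
    """Data stack kept as two parallel stacks: low words and ghost high words."""

    def __init__(self):
        self.lo = []   # what the fast 16-bit words operate on
        self.hi = []   # the ghost high words, one slot per cell

    def push(self, value):
        self.lo.append(value & 0xFFFF)
        self.hi.append((value >> 16) & 0xFFFF)

    def pop32(self):
        return (self.hi.pop() << 16) | self.lo.pop()

    def add16(self):
        """Fast + : low words only; the surviving high word is left untouched."""
        b = self.lo.pop()
        self.hi.pop()                  # keep both stacks the same depth
        self.lo[-1] = (self.lo[-1] + b) & 0xFFFF

    def add32(self):
        """Extended + : uses both halves, but each operand is still one cell."""
        b = self.pop32()
        a = self.pop32()
        self.push((a + b) & 0xFFFFFFFF)
```

Either way each value stays in one stack slot, so SWAP, DUP and friends keep their usual meaning whether the code uses the 16-bit or the 32-bit words.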


BTW & FWIW, in 1994 I rewrote a Forth that runs on DOS to use 32-bit operations and 32-bit cells. But the dictionary and compiler remain 16-bit, so it's a hybrid just like the '816 Forth I proposed.

The main thing is, @ ! and CMOVE etc "just work" in the large, flat address space (1 MB in this case -- Real Mode x86 -- or 16 MB with my KK or an '816), so you're freed from requiring a 64K-centric context in your thinking. I found it to be a real breath of fresh air. :mrgreen:

-- Jeff
