whartung wrote:
Casually glancing through the threads, it seems that the '816 Forth are basically 16-Bit Forths, with 64K memory, using the enhanced '816 instruction set.
Rather than something different that can leverage the segmented nature of the CPU.
In truth, I haven't quite grokked the nature of the '816. More so than the Intel's segmented architecture, the '816 does seem more like "a bunch of little 6502s" rather than a single CPU with a large memory space.
Maybe I'm misunderstanding, but this:
Code:
01:FFFE NOP
01:FFFF NOP
02:0000 NOP
02:0001 NOP
Were I to start the program running at 01:FFFE, in the end, the PC WILL NOT cross the 64K boundary, but will rather wrap around on the page register. Granted, a single piece of code can access data throughout the memory space, and can jump to arbitrary segments of code. But, in the end, you do, really, have a bunch of 64K instances that are all little islands on the CPU.
I'm neither a Forth user nor a Forth expert, but I do know one or two things about writing software.
So while I can't comment on using Forth, I can definitely comment on developing a Forth kernel.
Firstly, my pedantic tendencies prompt me to point out that the 65C816 is not "segmented." It is "banked," which is completely different from segmentation a la the Intel 8086.
As you noted, bank boundaries matter in a running program, as wrapping PC will not increment PB. Hence the maximum possible size of a program, assuming it is loaded into any bank other than $00, is 65,536 bytes, which prompts a question. When was the last time you encountered a 6502 program anywhere near that size? Even in the days when professional grade word processors, database engines and spreadsheets were written for the 6502, few exceeded 15KB in size. The bank boundary imposition on a program is, in my opinion, a non-problem.
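To make the wrap concrete, here is a minimal Python sketch of how PC advances. It is only a model of the behavior described above, not an emulator, and the function name next_pc is made up for illustration.

```python
def next_pc(pb, pc, length=1):
    """Advance the 16 bit program counter by one instruction.

    PB (program bank) is passed through untouched: a carry out of
    PC is simply discarded, so execution wraps inside the bank.
    """
    return pb, (pc + length) & 0xFFFF


# Step through one-byte NOPs starting at 01:FFFE, as in the listing above.
pb, pc = 0x01, 0xFFFE
trace = []
for _ in range(3):
    trace.append(f"{pb:02X}:{pc:04X}")
    pb, pc = next_pc(pb, pc)

print(trace)  # ['01:FFFE', '01:FFFF', '01:0000']
```

The NOPs at 02:0000 and 02:0001 are never reached unless the program gets there with a long jump or long subroutine call.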
Turning to load/store operations, they follow an entirely different set of rules, depending on whether the operation ultimately touches direct page, the hardware stack or elsewhere. Summed up:
- If a load/store access is to direct page or the stack, the bank is automatically $00.
- If a load/store access is to an absolute location and is expressed as a 16 bit address ("base" address), the effective address is DB×$010000+BASE+INDEX. For example, if DB contains $01 and .X is loaded with $02, the instruction LDA $8000,X generates an effective address of $018002. If the sum of the base address and .X exceeds $FFFF, the carry propagates into the bank byte and the access occurs in bank DB+1, at (DB+1)×$010000+((BASE+INDEX) AND $FFFF). Hence it is possible to cross bank boundaries with ordinary 16 bit indexed addressing.
- If a load/store access is to an absolute location and is expressed as a 24 bit address ("base" address), the effective address is BASE+INDEX. For example, if .X is loaded with $02, the instruction LDA $038000,X generates an effective address of $038002. The instruction LDA $02FFFF,X when .X is loaded with $FFFF generates an effective address of $03FFFE.
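The two absolute indexed rules above come down to plain 24 bit addition. The following Python sketch reproduces the worked examples; the function names are hypothetical, and the final mask assumes the address simply wraps at the top of the 16MB space.

```python
def ea_absolute_indexed(db, base16, index):
    """16 bit base address: DB supplies the bank, and a carry out of
    BASE+INDEX is allowed to spill into the next bank."""
    return ((db << 16) + base16 + index) & 0xFFFFFF


def ea_long_indexed(base24, index):
    """24 bit base address: DB plays no part at all."""
    return (base24 + index) & 0xFFFFFF


# LDA $8000,X with DB=$01, .X=$02
print(f"${ea_absolute_indexed(0x01, 0x8000, 0x0002):06X}")  # $018002
# LDA $FFFF,X with DB=$01, .X=$02 crosses into bank $02
print(f"${ea_absolute_indexed(0x01, 0xFFFF, 0x0002):06X}")  # $020001
# LDA $038000,X with .X=$02
print(f"${ea_long_indexed(0x038000, 0x0002):06X}")          # $038002
# LDA $02FFFF,X with .X=$FFFF
print(f"${ea_long_indexed(0x02FFFF, 0xFFFF):06X}")          # $03FFFE
```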
Additional addressing flexibility is possible via use of the indirect long modes. For example, LDA [PTR],Y treats PTR, PTR+1 and PTR+2 as a 24 bit direct page pointer from which the target address is gotten. With this addressing mode DB is ignored, as the data bank is specified in PTR+2.
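As a rough illustration of LDA [PTR],Y, the Python sketch below models memory as a dictionary and builds the target address from the three pointer bytes. The function name and the sample pointer location ($10 on direct page, with DP at $0000) are assumptions chosen for the example.

```python
def ea_indirect_long_indexed(memory, dp, ptr, y):
    """Model of LDA [PTR],Y.

    The three pointer bytes are read from direct page (always bank $00),
    low byte first; the third byte is the bank. DB is never consulted.
    """
    lo = memory[(dp + ptr) & 0xFFFF]
    hi = memory[(dp + ptr + 1) & 0xFFFF]
    bank = memory[(dp + ptr + 2) & 0xFFFF]
    pointer = (bank << 16) | (hi << 8) | lo
    return (pointer + y) & 0xFFFFFF


# A pointer to $038000 stored at direct page locations $10-$12.
memory = {0x0010: 0x00, 0x0011: 0x80, 0x0012: 0x03}
print(f"${ea_indirect_long_indexed(memory, 0x0000, 0x10, 0x0002):06X}")  # $038002

# A pointer to $03FFFF with .Y=$02 carries into bank $04.
memory = {0x0010: 0xFF, 0x0011: 0xFF, 0x0012: 0x03}
print(f"${ea_indirect_long_indexed(memory, 0x0000, 0x10, 0x0002):06X}")  # $040001
```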
Essentially, the 65C816 has a flat addressing model for load/store operations, other than direct page and stack accesses. So, no, it is not "a bunch of little 6502s."
Quote:
Anyway, given that, what kind of thoughts have there been about a segmented Forth runtime. You could separate the headers and dictionary code, maybe. You could easily create a heap that lives on its own segment, unaffected by the vocabulary. 64K of BLOCK buffers…whee… I'm sure there was some thinking about this for the Intel systems.
Just curious what folks have thought about here.
In any program running on the '816, not just Forth, there is no requirement that the data be anywhere near the program itself. With the 65C02, everything has to fit within the 64KB address space of that MPU. With the '816, that address space is now (potentially) 16,384KB, i.e., 16MB. Hence there is a fundamental difference between writing 65C02 programs and writing 65C816 equivalents. Addressing with the '816 is much more expansive and flexible. Naturally, the enhanced instruction set and availability of 16 bit registers open the door to much different and more efficient algorithms.
Considering Forth, the main dynamic data structures are the dictionary and the data stack. With the '816, the dictionary can be any size you want up to the limits of memory and can start anywhere in the address space that is convenient, whether in the same bank in which the program is running or a different one. If the possibility exists of the dictionary growing to the point where it crosses a bank boundary, you would use 24 bit addressing to access it. The basic methods of indexing are the same as with the 65C02, except an index can be 16 bits in size instead of 8, which greatly simplifies the processing of data structures that cross page boundaries. It's a matter of rethinking your methods, instead of using the traditional 8 bit approach.
Forth on the 6502 has customarily maintained the data stack on page zero. You could do the same with the '816, except you can also relocate its direct page (page zero) to anywhere within bank $00 by changing the 16 bit value in DP (direct page register). For example, if you load $C200 into DP, an instruction such as LDA ($00,X) will actually be directed to $C200 and it will be as though you had written LDA ($C200,X). The advantage of this feature is you could maintain multiple data stacks, or you could separate the data stack entirely from the physical page zero in the machine and reserve the physical page zero for such things as pointers, indices and the like.
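The relocation amounts to adding DP to every direct page offset, with the result always landing in bank $00. Here is a small Python sketch under that assumption; the function name and the second stack base of $C300 are invented for the example.

```python
def ea_direct_page(dp, offset8, x=0):
    """Location a direct page operand refers to, always in bank $00.

    DP is the 16 bit direct page register, offset8 the one-byte operand
    and x an optional index. The sum wraps at 64KB and never leaves bank $00.
    """
    return (dp + offset8 + x) & 0xFFFF


# With DP=$C200, the operand $00 refers to $C200 instead of $0000.
print(f"${ea_direct_page(0xC200, 0x00):04X}")        # $C200
print(f"${ea_direct_page(0xC200, 0x00, x=4):04X}")   # $C204

# Two independent data stacks: same code, same offsets, different DP.
for stack_base in (0xC200, 0xC300):  # $C300 is a hypothetical second stack
    print(f"${ea_direct_page(stack_base, 0x00, x=4):04X}")  # $C204, then $C304
```

Switching between stacks is then just a matter of loading a different value into DP; the code that works on the stack does not change.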
Aside from the above, I don't have anything else specific to offer on Forth on the 65C816. However, Forth is no different than any other program when it comes to system usage. The '816 offers a lot more in that regard and as Garth has mentioned in the past, Forth written natively for the '816 will run substantially faster than it will on a 65C02, even at the same clock speed.