Re: 65VM02
Posted: Thu May 25, 2017 8:29 pm
by GaBuZoMeu
Using JVM with A holding an odd value could become a GTF (aka GoToForest). But JVM could do an implicit A AND $FE to avoid this.
By combining Page*$100 + A and then shifting this one bit left, you still work with 8-bit quantities.
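A minimal sketch of the two suggestions above, modeled in Python rather than 65VM02 assembly. The function names are hypothetical, and the addressing is an assumption taken from this post: the pointer is fetched from Page*$100 + A, and an odd A would land in the middle of a table entry.

```python
def jvm_pointer_masked(page: int, a: int) -> int:
    """Pointer address with an implicit A AND $FE (an odd A is rounded down)."""
    return ((page & 0xFF) << 8) | (a & 0xFE)


def jvm_pointer_shifted(page: int, a: int) -> int:
    """Pointer address from (Page*$100 + A) shifted left by one bit.

    The result is always even, so every value of A selects a valid entry.
    """
    return (((page & 0xFF) << 8) | (a & 0xFF)) << 1


if __name__ == "__main__":
    print(hex(jvm_pointer_masked(0x12, 0x35)))   # odd A forced even
    print(hex(jvm_pointer_shifted(0x12, 0x35)))  # 256 two-byte entries
```

The masked form leaves 128 usable entries per page; the shifted form gives 256 entries, but for Page values above $7F the result needs a 17th address bit.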
Re: 65VM02
Posted: Fri May 26, 2017 12:05 am
by barrym95838
My friend used to say that I must have executed an "ITW" instruction, for "Into The Weeds".
Mike B.
Re: 65VM02
Posted: Fri May 26, 2017 1:28 am
by GaBuZoMeu
GTF, ITW, and RPC (Randomize PC) are common hidden features of all sorts of programming gear, methinks.
edit(1): there is a relative variant too: BH (Branch and Hang)
Re: 65VM02
Posted: Fri May 26, 2017 4:41 am
by Hugh Aguilar
The D-register could span from A9..A16 instead of A8..A15. Increment D then means D := D+512. This would extend the direct pages even into the alternate bank and double the number of possible separate tasks - I don't speak Forth, so no idea whether this could be of any use.
The D register tells us where the direct-page and return-stack are.
Each task has its own direct-page and return-stack and its own value for D. I don't envision any application having more than maybe four tasks. So, D is plenty big already.
Thanks for the idea though! I don't think it really works for the 65VM02, but still, I want to hear everybody's ideas.
The 65c816 has a DP register that is similar to my D registers except that it is 16-bit. I have heard (on this forum) about Forth systems that use DP as a data-stack. This is another interesting idea. I'm somewhat dubious of this because normally you want to have pointers and I/O ports in zero-page, and with this scheme you would have to set DP to 0 to access them, which would be awkward.
Re: 65VM02
Posted: Fri May 26, 2017 5:10 am
by BigDumbDinosaur
...normally you want to have pointers and I/O ports in zero-page, and with this scheme you would have to set DP to 0 to access them, which would be awkward.
There's little incentive to make I/O hardware appear in zero page, especially if running with a high clock rate. On the face of it, it would seem that I/O access would benefit from direct page addressing. However, in most device drivers, reads and writes on the hardware are relatively few in number compared to other operations that are going on, such as manipulating the data structures that are handling data from I/O devices. A direct page load or store, on average, takes one less clock cycle than an absolute load or store. Hence the real performance gain in the context of direct page hardware access tends to be quite small, and can all but vanish if wait-states are required.
Using valuable direct page addresses on I/O hardware means fewer addresses are available for data structures that can really benefit from the quicker direct page addressing modes, such as buffer pointers, flags and math accumulators. I've worked with 65xx hardware for over 40 years and cannot recall encountering anything in which the I/O hardware was mapped in at direct page.
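The wait-state point can be put into numbers with a small Python sketch. The 4-cycle and 3-cycle figures are the usual counts for an absolute versus a zero-page/direct-page load or store; the function name is hypothetical, and treating each wait-state as one extra cycle on either form is an assumption.

```python
def access_gain(abs_cycles: int = 4, dp_cycles: int = 3, wait_states: int = 0) -> float:
    """Fraction of time saved per I/O access by using direct page addressing."""
    slow = abs_cycles + wait_states  # absolute access, stretched by wait-states
    fast = dp_cycles + wait_states   # direct page access, stretched the same way
    return (slow - fast) / slow


if __name__ == "__main__":
    for ws in (0, 1, 2, 4):
        print(ws, round(access_gain(wait_states=ws), 3))
```

The one saved cycle stays constant while the access itself gets longer, so the relative gain shrinks as wait-states are added.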
Re: 65VM02
Posted: Fri May 26, 2017 10:57 am
by GaBuZoMeu
The D register tells us where the direct-page and return-stack are.
Each task has its own direct-page and return-stack and its own value for D. I don't envision any application having more than maybe four tasks. So, D is plenty big already.
D is still one byte. It's like the JVM where you need A to be even (LSB = 0). D is only the high byte of the base address for the direct-page and return-stack addresses. By shifting D one bit left during usage your CPU won't lose a single clock.
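As a rough illustration of the proposal, here is the base address D would select in each scheme. This is a Python model, not 65VM02 code; the function name is hypothetical, and the A8..A15 versus A9..A16 mapping is taken from the suggestion quoted earlier in the thread.

```python
def d_base(d: int, shifted: bool = False) -> int:
    """Base address selected by the 8-bit D register.

    shifted=False: D drives A8..A15 (256-byte steps).
    shifted=True:  D drives A9..A16 (512-byte steps, reaching a second 64 KB bank).
    """
    d &= 0xFF
    return d << 9 if shifted else d << 8


if __name__ == "__main__":
    print(hex(d_base(0xFF)))                # highest base, unshifted
    print(hex(d_base(0xFF, shifted=True)))  # highest base, shifted
```

D stays an 8-bit register in both cases; only the wiring of its bits to the address bus differs.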
4 tasks:
Not long ago a customer asked us to build a sort of multi-channel interface, so we built this IF around a small ARM device. It receives enveloped data from the host via USB and dispatches it according to the envelopes to 4 serial ports (2x RS232, 1x RS422, 1x RS485), one CAN bus, 3x I²C, and 3x SPI. Each of these ports will eventually respond to the transmission. These responses have to be enveloped and enqueued into an upstream back to the host.
We decided to use a small preemptive multitasking system (cooperative). There is a separate interrupt service routine for each port and a corresponding task with its own IO buffer and mailbox. All sorts of protocols, restrictions, and timing demands are handled by each task - this is very pleasant to write and easier to verify. A separate monitor task (high priority, with a serial IO of its own) was used to sniff here and there and to inject additional packets, some of them deliberately wrong so we could check error management during load...
This took 12 tasks (11 IO plus USB) for the job plus two additional (monitor and some aux) for maintenance, and around 25 KB for buffers.
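A toy model of that structure, in Python with generators standing in for cooperative tasks. Everything here is hypothetical: the port names, the (port, payload) envelope shape, and the echo that stands in for a real port transaction are illustration only, not the actual firmware.

```python
from collections import deque

# Eleven IO ports as listed above: 4 serial, 1 CAN, 3 I2C, 3 SPI.
PORTS = ["rs232_0", "rs232_1", "rs422", "rs485", "can",
         "i2c_0", "i2c_1", "i2c_2", "spi_0", "spi_1", "spi_2"]


def port_task(name, mailbox, upstream):
    """Cooperative task: handle at most one message per turn, then yield."""
    while True:
        if mailbox:
            payload = mailbox.popleft()
            # Stand-in for the real transaction: echo the payload back,
            # wrapped in an envelope naming the port it came from.
            upstream.append((name, payload))
        yield


def run(envelopes, rounds=1):
    """Dispatch envelopes to per-port mailboxes, then run the tasks round-robin."""
    mailboxes = {name: deque() for name in PORTS}
    upstream = deque()
    tasks = [port_task(name, mailboxes[name], upstream) for name in PORTS]
    for port, payload in envelopes:   # the "USB task": route by envelope
        mailboxes[port].append(payload)
    for _ in range(rounds):           # simple round-robin scheduler
        for task in tasks:
            next(task)
    return list(upstream)


if __name__ == "__main__":
    print(run([("can", b"\x01"), ("spi_2", b"\x02")]))
```

Keeping one mailbox and one task per port is what lets each protocol's quirks stay local to its own task.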
The 65c816 has a DP register that is similar to my D registers except that it is 16-bit. I have heard (on this forum) about Forth systems that use DP as a data-stack. This is another interesting idea. I'm somewhat dubious of this because normally you want to have pointers and I/O ports in zero-page, and with this scheme you would have to set DP to 0 to access them, which would be awkward.
As far as I understand the way Forth works, neither the return stack nor the data stack needs to be huge. So they should fit into 512 bytes, leaving space for pointers as well.
Placing IO into page zero (usually found in microcontrollers) can be a little beneficial: you get small savings in cycles and larger savings in code space. These advantages would vanish if you have to change D (or DP on the 816) each time you need to access IO (and wish to use direct addressing). This is something I wouldn't bother with.
Re: 65VM02
Posted: Fri May 26, 2017 3:51 pm
by Dr Jefyll
Using valuable direct page addresses on I/O hardware means fewer addresses are available for data structures that can really benefit from the quicker direct page addressing modes, such as buffer pointers, flags and math accumulators.
It's true I/O in zero/direct page may potentially present a conflict, but a savvy designer keeps an open mind and never says never.
If the application is compute-bound and if we are forced to be miserly when allocating zero-page/direct-page locations then I agree, BDD, that I/O doesn't belong there. However, some applications are I/O bound... in which case it's no trifling advantage if every 4-cycle I/O access can be reduced to 3 cycles.

Moreover, we're not always forced to be miserly when allocating zero-page/direct-page locations. The crowding there is avoidable unless a pre-existing bios or o/s has already squandered the space. (It's not speculation -- I have successfully done this.)
These advantages would vanish if you have to change D (or DP on the 816) each time you need to access IO (and wish to use direct addressing). This is something I wouldn't bother with.
Yes and no -- it's still necessary to consider the intended application. Certainly having to change D or DP is another factor to weigh, and in a general-purpose computer I wouldn't bother with it, either. OTOH I can envision a real-time application where D/DP can remain unchanged within the time-critical inner loop. So again we're looking at a 33% speedup in the I/O operations themselves, and that could be pivotal.
-- Jeff
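Both sides of this exchange fit one formula: how much the whole loop gains depends on the share of its cycles spent on I/O accesses. A minimal Python sketch, assuming the 4-cycle versus 3-cycle figures above and a hypothetical function name:

```python
def loop_speedup(io_fraction: float, abs_cycles: int = 4, dp_cycles: int = 3) -> float:
    """Whole-loop speedup when only the I/O accesses get faster.

    io_fraction is the share of the original loop time spent in I/O accesses.
    """
    new_time = (1.0 - io_fraction) + io_fraction * dp_cycles / abs_cycles
    return 1.0 / new_time


if __name__ == "__main__":
    for f in (0.05, 0.25, 0.75, 1.0):
        print(f, round(loop_speedup(f), 3))
```

With io_fraction at 1.0 the result is 4/3, the 33% figure; in a compute-bound loop where I/O is a few percent of the time, the gain is close to nothing.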
Re: 65VM02
Posted: Fri May 26, 2017 4:05 pm
by BigEd
Yes, I can imagine bit-banging interfaces which could need or gain from lower latency I/O accesses - perhaps even disk service routines in the style of Acorn could benefit, for relatively high density floppies serviced by relatively low speed 6502 systems.
Re: 65VM02
Posted: Sat May 27, 2017 1:16 am
by Hugh Aguilar
Placing IO into page zero (usually found in microcontrollers) can be a little beneficial: you get small savings in cycles and larger savings in code space. These advantages would vanish if you have to change D (or DP on the 816) each time you need to access IO (and wish to use direct addressing). This is something I wouldn't bother with.
Well, the MIRQ interrupts automatically get you to D=0 so this is pretty fast. For the NMI and IRQ interrupts, you have the ENTR and EXIT instructions that do this, so it is fast.
When you have your I/O in the direct-page, you get to use BBR BBS RMB SMB instructions --- these are faster than using the A register and logic instructions.
Re: 65VM02
Posted: Sat May 27, 2017 2:12 am
by Hugh Aguilar
Using JVM with A holding an odd value could become a GTF (aka GoToForest). But JVM could do an implicit A AND $FE to avoid this.
By combining Page*$100 + A and then shifting this one bit left, you still work with 8-bit quantities.
Well, I have an upgrade on the 65VM02 document (attached). The JVM is back to what I had previously, that allowed for a 256 element jump-table. I have various other instructions added to boost the speed. I upgraded FLDA and FSTA to access 16MB now, so we can store large files in far memory.
Here is a snippet of the document:
These are the new instructions (none affect the flags unless explicitly described as doing so):
JVM   page          jump through the pointer located at (page*$100 OR 2*A) --- the page value has to be even
OPA                 load A with (IP) in the first far-bank, then increment IP
OPY                 load Y with (IP) in the first far-bank, then increment IP
EXIP                exchange IP with YA
YIP                 add the signed value in Y to IP
FLDA  (direct),Y    load A through a 3-byte pointer with a value in far-memory, setting the N Z flags
FSTA  (direct),Y    store A through a 3-byte pointer to far-memory
LLY                 load Y with the offset to the bottom value of the return-stack from the page boundary
AAS                 add A to S
EXAD                exchange A and D
EXA   (direct),Y    exchange A with value at (direct),Y and set the N Z flags according to the new value in A
EXA   direct,X      exchange A with value at direct,X and set the N Z flags according to the new value in A
MUL                 multiply A by Y unsigned, leaving the product in YA
SGN                 sign-extend A into YA (set A to -1 or 0), setting the N and Z flags for the 16-bit result
TST                 test YA, setting the N and Z flags (appropriate for the whole 16-bit value)
ADY   #value        add the value to Y, setting the N Z V and C flags (in the same way as ADC does)
SBY   #value        subtract the value from Y, setting the N Z V and C flags (in the same way as SBC does)
CMPH  direct,X      like CMP, but uses the old C-flag (doesn't assume it is 1), and AND's the old Z-flag with the new Z-flag
BLT   offset        branch if less than                 branch if N <> V
BGE   offset        branch if greater than or equal     branch if N = V
MRTI                used to terminate MIRQ ISRs (similar to how RTI is used to terminate IRQ and NMI ISRs)
SEM                 sets the M-flag (this masks MIRQ interrupts, similar to SEI for IRQ)
CLM                 clears the M-flag (this allows MIRQ interrupts to occur, similar to CLI for IRQ)
ENTR                push A X Y to the return-stack, then move D to X, then set A Y and D to zero
EXIT                move X to D, then pull Y X A from the return-stack
Some of these instructions aren't strictly necessary. For example, OPY can be done with OPA TAY which is only slightly slower.
If chip resource usage is a problem, some of these instructions can be discarded and the code won't be much slower.
If chip resource usage is not a problem, some more instructions can be added (the INCH DECH LDYA STYA macros can be instructions).
It is possible to have two versions of the 65VM02. The big version is fully 65c02 compatible for legacy program support.
The small version would discard some of the crufty instructions in the 65c02 that are unneeded in Forth:
1.) The (direct,X) instructions can be discarded. It is unlikely that any legacy programs use this, so nobody will care.
2.) The JMP (address,X) instruction can be discarded. The JVM is more useful.
3.) The address,X instructions can be discarded (the direct,X is needed though).
The (direct,X) instructions were pretty useless --- nobody will care if (direct,X) is discarded.
The JMP (address,X) instruction was provided for byte-code VM systems, but these should be redesigned to use OPA and JVM instead.
Both of these addressing modes can be discarded with little or no pain.
The address,X instructions are pretty commonly used, so discarding them will break a lot of legacy programs.
Also, a C or Pascal compiler written for the 65VM02 may need them, so it is best to keep them even though Forth doesn't need them.
The "look and feel" of the 65c02 will be retained. There is no radical departure done, such as making the registers 16-bit.
It should be easy to port legacy 65c02 programs to the 65VM02 --- the 65c02 programmer should feel at home.
Re: 65VM02
Posted: Sat May 27, 2017 3:45 am
by BigDumbDinosaur
When you have your I/O in the direct-page, you get to use BBR BBS RMB SMB instructions --- these are faster than using the A register and logic instructions.
Not on the 65C816 you don't.
Re: 65VM02
Posted: Sat May 27, 2017 1:55 pm
by Hugh Aguilar
When you have your I/O in the direct-page, you get to use BBR BBS RMB SMB instructions --- these are faster than using the A register and logic instructions.
Not on the 65C816 you don't.
I'm not very familiar with the 65c816. Does it not have any instructions for accessing 1-bit data?
I supposed (when the W65c02 came out) that the BBR BBS RMB SMB instructions were put in the W65c02 partially for accessing I/O ports, and partially for supporting PLCs that use a lot of 1-bit variables and typically have very little memory (IIRC Western had a version of the W65c02 with 512 bytes) so you don't want to dedicate an entire byte to each 1-bit variable.
I've never been much interested in the 65c816. It seems to have been designed for use in desktop computers (it was used in the Apple-IIGS), but when it came out the era of the 8-bit desktop-computer was rapidly fading away (Apple had their MC68000 Mac at the time, and the Apple-IIGS was seen as a poor-man's Mac, so it didn't have much of a future). A variant of the 65c816 was used in the Super Nintendo and this was Western's primary business for a long time. I actually bought a Super Nintendo in the hopes of writing games for it, as I expected that the 65c816 would be easy for me to program given my 65c02 experience, but then I found out that Nintendo didn't allow third-party games at all. I kept the machine for a while, and I played the Mario game that came with it, but after a while I got bored with the game so I gave the machine to a girl. I'm not much interested in video games --- I played Ms. PacMan when I was younger --- there are really better ways to waste time though...
BTW: I read somewhere that Ms. PacMan used a 6502 internally. Most of those video games used the Z80 though, and I think some of the more advanced ones used the 6809.
I remember talking to one Color Computer enthusiast and telling him about the 65c816, which is like the 65c02 except with the registers upgraded to 16 bits. He said: "The 6809 has already been invented." Ouch!

Re: 65VM02
Posted: Sat May 27, 2017 2:02 pm
by Hugh Aguilar
Here is a snippet of the document:
BLT   offset        branch if less than                 branch if N <> V
BGE   offset        branch if greater than or equal     branch if N = V
That was a mistake. I have an upgraded version with this instead:
BLT   offset        branch if less than                 branch if N <> V
BGT   offset        branch if greater than              branch if N = V and ~Z
This LTE_BRANCH primitive is now one instruction shorter:
LT_BRANCH:                  ; this is compiled by: >= IF
    OPY                     ; the operand is the offset to branch
    LDA soslo,X
    CMP toslo,X
    LDA soshi,X
    CMPH toshi,X            ; we could use SBC here because we only need N and V to be correct
    BLT DO_BRANCH
DO_NOT_BRANCH:
    INX
    INX
    NEXT

LTE_BRANCH:                 ; this is compiled by: > IF
    OPY                     ; the operand is the offset to branch
    LDA soslo,X
    CMP toslo,X
    LDA soshi,X
    CMPH toshi,X            ; we could not use SBC here because the Z-flag would only reflect the high-byte
    BGT DO_NOT_BRANCH
DO_BRANCH:
    YIP
    INX
    INX
    NEXT
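To check the flag logic of the CMP/CMPH pair, here is a small Python model of the 16-bit signed compare. It follows the description of CMPH in the instruction list (uses the old C-flag, ANDs the old Z-flag with the new one); that CMPH also sets N and V the way SBC does is an assumption, implied by the comment in the code but not stated in the list. All function names are hypothetical.

```python
def cmp8(a: int, m: int):
    """CMP on the low bytes: returns (C, Z) for an unsigned 8-bit compare."""
    return a >= m, a == m


def cmph8(a: int, m: int, c_in: bool, z_in: bool):
    """CMPH on the high bytes: subtract with the incoming borrow.

    Returns (N, V, Z, C). V is computed as SBC would (assumption).
    """
    diff = a - m - (0 if c_in else 1)
    r = diff & 0xFF
    n = bool(r & 0x80)
    v = bool((a ^ m) & (a ^ r) & 0x80)  # signed overflow of the subtraction
    z = z_in and r == 0                 # zero only if both bytes matched
    c = diff >= 0
    return n, v, z, c


def flags16(sos: int, tos: int):
    """Flags after CMP soslo/toslo followed by CMPH soshi/toshi."""
    c, z = cmp8(sos & 0xFF, tos & 0xFF)
    return cmph8((sos >> 8) & 0xFF, (tos >> 8) & 0xFF, c, z)


def blt(sos: int, tos: int) -> bool:
    n, v, _, _ = flags16(sos, tos)
    return n != v                       # BLT: branch if N <> V


def bgt(sos: int, tos: int) -> bool:
    n, v, z, _ = flags16(sos, tos)
    return n == v and not z             # BGT: branch if N = V and ~Z


def signed16(x: int) -> int:
    return x - 0x10000 if x & 0x8000 else x


if __name__ == "__main__":
    samples = [0x0000, 0x0001, 0x00FF, 0x0100, 0x1234, 0x7FFF, 0x8000, 0xFFFF]
    for s in samples:
        for t in samples:
            assert blt(s, t) == (signed16(s) < signed16(t))
            assert bgt(s, t) == (signed16(s) > signed16(t))
    print("flag model agrees with signed 16-bit comparison")
```

The case 0x0100 versus 0x0001 shows why Z has to be carried across both bytes: the high-byte subtraction alone yields zero there, yet the values differ.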
Re: 65VM02
Posted: Sat May 27, 2017 6:43 pm
by BigDumbDinosaur
When you have your I/O in the direct-page, you get to use BBR BBS RMB SMB instructions --- these are faster than using the A register and logic instructions.
Not on the 65C816 you don't.
I'm not very familiar with the 65c816. Does it not have any instructions for accessing 1-bit data?
The $xF opcodes used by BBR and BBS were reassigned to the 24 bit absolute addressing modes, which in the 65C816 are generally more useful instructions. The $x7 opcodes used by RMB and SMB were reassigned to the 24 bit indirect indexed long addressing modes, e.g., LDA [<dp>],Y, which are also very useful. The TRB and TSB "combo" instructions essentially accomplish what RMB and SMB do, as well as the functionality of BBR and BBS, without hogging a lot of space in the opcode table. Also, TRB and TSB work on absolute addresses, as well as direct page, making them more general in nature.
Back when I used to do a lot of 65C02 development I never found any use for BBR, BBS, RMB and SMB. I sure got plenty of use from TRB and TSB, though.
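For readers who haven't used them, a small Python model of what TSB and TRB do to a memory byte, next to the single-bit SMB and RMB. The function names and the tuple return shape are hypothetical; the behavior modeled is the documented one: Z reflects A AND memory, then the bits in A are set (TSB) or cleared (TRB) in memory.

```python
def tsb(mem: int, a: int):
    """TSB: Z is set if (A AND mem) is zero, then mem gets the bits of A set."""
    z = (mem & a) == 0
    return (mem | a) & 0xFF, z


def trb(mem: int, a: int):
    """TRB: Z is set if (A AND mem) is zero, then mem gets the bits of A cleared."""
    z = (mem & a) == 0
    return mem & ~a & 0xFF, z


def smb(mem: int, bit: int) -> int:
    """SMBn: set one bit of a zero-page byte."""
    return (mem | (1 << bit)) & 0xFF


def rmb(mem: int, bit: int) -> int:
    """RMBn: clear one bit of a zero-page byte."""
    return mem & ~(1 << bit) & 0xFF


if __name__ == "__main__":
    print(tsb(0x05, 0x02))  # sets bit 1, Z reports it was clear before
    print(trb(0x07, 0x02))  # clears bit 1, Z reports it was set before
```

With a one-bit mask in A, TSB and TRB reproduce SMB and RMB, and the Z result can feed a BEQ or BNE, at the cost of loading the mask into A first.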
Re: 65VM02
Posted: Sun May 28, 2017 2:26 am
by Hugh Aguilar
Back when I used to do a lot of 65C02 development I never found any use for BBR, BBS, RMB and SMB. I sure got plenty of use from TRB and TSB, though.
Right now my mind is focused on this 65VM02 idea. I always liked the 65c02 --- I also thought that some of the design decisions were very dubious --- I wanted an upgraded version, but I didn't want to go to 16-bit registers which seems rather radical.
I should learn how to program in 65c816 assembly-language though. I don't really know anything about that processor. Is there an experimenter board available for it?