 Post subject: Re: 65VM02
PostPosted: Thu May 25, 2017 8:29 pm 
Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
Using JVM with an odd value in A could turn into a GTF (aka GoToForest). JVM could do an implicit A AND $FE to avoid this.
By forming Page*$100 + A and then shifting the result one bit left, you still work with 8-bit quantities.
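
A minimal Python sketch of the two options, to make the address arithmetic concrete. The function names and the exact bit layout are assumptions for illustration; they are not taken from the 65VM02 document.

```python
def jvm_vector_masked(page: int, a: int) -> int:
    """Option 1: force A even (A AND $FE) so the vector stays word-aligned."""
    return ((page & 0xFF) << 8) | (a & 0xFE)


def jvm_vector_shifted(page: int, a: int) -> int:
    """Option 2: treat page:A as a word index and shift it one bit left.

    The result is always even and every value of A selects its own vector,
    but the address can need 17 bits (assumed acceptable here).
    """
    return (((page & 0xFF) << 8) | (a & 0xFF)) << 1


print(hex(jvm_vector_masked(0x12, 0x35)))   # odd A is rounded down to $34
print(hex(jvm_vector_shifted(0x12, 0x35)))  # $1235 doubled
```

With masking, an odd A silently aliases the even entry below it; with the shift, the inputs stay 8-bit and only the final address gets wider.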


 Post subject: Re: 65VM02
PostPosted: Fri May 26, 2017 12:05 am 
Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
My friend used to say that I must have executed an "ITW" instruction, for "Into The Weeds".

Mike B.


 Post subject: Re: 65VM02
PostPosted: Fri May 26, 2017 1:28 am 
Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
GTF, ITW, and RPC (Randomize PC) are common hidden features of all sorts of programming gear, methinks.


edit(1): there is a relative variant too: BH (Branch and Hang)


Last edited by GaBuZoMeu on Fri May 26, 2017 9:45 am, edited 1 time in total.

 Post subject: Re: 65VM02
PostPosted: Fri May 26, 2017 4:41 am 
Joined: Fri Jun 03, 2016 3:42 am
Posts: 158
GaBuZoMeu wrote:
The D-register could span from A9..A16 instead of A8..A15. Incrementing D then means D := D+512. This would extend the direct pages even into the alternate bank and double the number of possible separate tasks - I don't speak Forth, so I have no idea whether this could be of any use.

The D register tells us where the direct-page and return-stack are.

Each task has its own direct-page and return-stack and its own value for D. I don't envision any application having more than maybe four tasks. So, D is plenty big already.

Thanks for the idea though! I don't think it really works for the 65VM02, but still, I want to hear everybody's ideas. :)

The 65c816 has a DP register that is similar to my D register except that it is 16-bit. I have heard (on this forum) about Forth systems that use DP as a data-stack. This is another interesting idea. I'm somewhat dubious of this because normally you want to have pointers and I/O ports in zero-page, and with this scheme you would have to set DP to 0 to access them, which would be awkward.


 Post subject: Re: 65VM02
PostPosted: Fri May 26, 2017 5:10 am 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8479
Location: Midwestern USA
Hugh Aguilar wrote:
...normally you want to have pointers and I/O ports in zero-page, and with this scheme you would have to set DP to 0 to access them, which would be awkward.

There's little incentive to make I/O hardware appear in zero page, especially if running with a high clock rate. On the face of it, it would seem that I/O access would benefit from direct page addressing. However, in most device drivers, reads and writes on the hardware are relatively few in number compared to other operations that are going on, such as manipulating the data structures that are handling data from I/O devices. A direct page load or store, on average, takes one less clock cycle than an absolute load or store. Hence the real performance gain in the context of direct page hardware access tends to be quite small, and can all but vanish if wait-states are required.
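
As a rough back-of-the-envelope check of that argument, here is a small Python sketch. The 4-cycle and 3-cycle figures are the usual 65C02 counts for a plain absolute versus direct page load or store; the function name, its parameters, and the example numbers are made up for illustration.

```python
def io_time_saved(io_accesses: int, total_cycles: int,
                  abs_cycles: int = 4, dp_cycles: int = 3,
                  wait_states: int = 0) -> float:
    """Fraction of a routine's run time saved by moving I/O to direct page.

    total_cycles is the routine's cycle count with absolute addressing and
    no wait-states (a hypothetical measurement, not a real driver).
    """
    saved = io_accesses * (abs_cycles - dp_cycles)
    # Wait-states are paid on every I/O access either way, so they make the
    # routine longer without making the saving any bigger.
    total = total_cycles + io_accesses * wait_states
    return saved / total


# A 400-cycle driver routine that touches the hardware 10 times:
print(io_time_saved(10, 400))                  # no wait-states
print(io_time_saved(10, 400, wait_states=2))   # two wait-states per access
```

The saving only becomes significant when I/O accesses make up a large share of the cycles in the loop being measured.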

Using valuable direct page addresses on I/O hardware means fewer addresses are available for data structures that can really benefit from the quicker direct page addressing modes, such as buffer pointers, flags and math accumulators. I've worked with 65xx hardware for over 40 years and cannot recall encountering anything in which the I/O hardware was mapped in at direct page.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


 Post subject: Re: 65VM02
PostPosted: Fri May 26, 2017 10:57 am 
Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
Hugh Aguilar wrote:
The D register tells us where the direct-page and return-stack are.

Each task has its own direct-page and return-stack and its own value for D. I don't envision any application having more than maybe four tasks. So, D is plenty big already.

D is still one byte. It's like the JVM where you need A to be even (LSB = 0). D holds only the high byte of the base address for the direct page and the return stack. By shifting D one bit left when it is used, your CPU won't lose a single clock.
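
A minimal Python sketch of the two interpretations of D, assuming D is an 8-bit register that supplies the upper address bits and nothing else. The function name and the flag are hypothetical.

```python
def base_address(d: int, shifted: bool = False) -> int:
    """Base address of the direct page / return stack selected by D.

    shifted=False: D drives A8..A15, so pages are 256 bytes apart.
    shifted=True:  D drives A9..A16, so pages are 512 bytes apart and the
                   highest ones land in the second 64 KB bank.
    """
    d &= 0xFF  # D stays an 8-bit quantity in both cases
    return d << 9 if shifted else d << 8


print(hex(base_address(0x01)))                # $0100
print(hex(base_address(0x01, shifted=True)))  # $0200
print(hex(base_address(0xFF, shifted=True)))  # $1FE00, beyond the first 64 KB
```

In hardware the shift is just wiring, which is why it should not cost a clock cycle.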

4 tasks:

Not long ago a customer asked us to build a sort of multi-channel interface, so we built this interface (IF) around a small ARM device. It receives enveloped data from the host via USB and dispatches it, according to the envelopes, to 4 serial ports (2x RS232, 1x RS422, 1x RS485), one CAN bus, 3x I²C, and 3x SPI. Each of these ports will eventually respond to the transmission. These responses have to be enveloped and enqueued into an upstream back to the host.

We decided to use a small preemptive multitasking system (cooperative). There is a separate interrupt service routine for each port and a corresponding task with its own IO buffer and mailbox. All sorts of protocols, restrictions, and timing demands are handled by each task - this is very pleasant to write and easier to verify. A separate monitor task (high priority, with another serial IO of its own) was used to sniff here and there and to inject additional packets, some of them deliberately wrong so we could check error management under load...

This took 12 tasks (11 IO plus USB) for the job plus two additional (monitor and some aux) for maintenance. And around 25 KB for buffers ;)

Hugh Aguilar wrote:
The 65c816 has a DP register that is similar to my D register except that it is 16-bit. I have heard (on this forum) about Forth systems that use DP as a data-stack. This is another interesting idea. I'm somewhat dubious of this because normally you want to have pointers and I/O ports in zero-page, and with this scheme you would have to set DP to 0 to access them, which would be awkward.

As far as I understand the way Forth works, neither the return stack nor the data stack needs to be huge. So they should fit into 512 bytes, leaving space for pointers as well.
Placing IO into page zero (as is usual in microcontrollers) can be a little beneficial: you get small savings in cycles and more savings in code space. These advantages would vanish if you have to change D (or DP on the 816) each time you need to access IO (and wish to use direct addressing). This is something I wouldn't bother with.


 Post subject: Re: 65VM02
PostPosted: Fri May 26, 2017 3:51 pm 
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
BigDumbDinosaur wrote:
Using valuable direct page addresses on I/O hardware means fewer addresses are available for data structures that can really benefit from the quicker direct page addressing modes, such as buffer pointers, flags and math accumulators.
It's true I/O in zero/direct page may potentially present a conflict, but a savvy designer keeps an open mind and never says never.

If the application is compute-bound and if we are forced to be miserly when allocating zero-page/direct-page locations then I agree, BDD, that I/O doesn't belong there. However, some applications are I/O bound... in which case it's no trifling advantage if every 4-cycle I/O access can be reduced to 3 cycles. Moreover, we're not always forced to be miserly when allocating zero-page/direct-page locations. The crowding there is avoidable unless a pre-existing BIOS or OS has already squandered the space. (It's not speculation -- I have successfully done this.)

GaBuZoMeu wrote:
These advantages would vanish if you have to change D (or DP on the 816) each time you need to access IO (and wish to use direct addressing). This is something I wouldn't bother with.
Yes and no -- it's still necessary to consider the intended application. Certainly having to change D or DP is another factor to weigh, and in a general-purpose computer I wouldn't bother with it, either. OTOH I can envision a real-time application where D/DP can remain unchanged within the time-critical inner loop. So again we're looking at a 33% speedup in the I/O operations themselves, and that could be pivotal.

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


 Post subject: Re: 65VM02
PostPosted: Fri May 26, 2017 4:05 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10976
Location: England
Yes, I can imagine bit-banging interfaces which could need or gain from lower latency I/O accesses - perhaps even disk service routines in the style of Acorn could benefit, for relatively high density floppies serviced by relatively low speed 6502 systems.


 Post subject: Re: 65VM02
PostPosted: Sat May 27, 2017 1:16 am 
Joined: Fri Jun 03, 2016 3:42 am
Posts: 158
GaBuZoMeu wrote:
Placing IO into page zero (as is usual in microcontrollers) can be a little beneficial: you get small savings in cycles and more savings in code space. These advantages would vanish if you have to change D (or DP on the 816) each time you need to access IO (and wish to use direct addressing). This is something I wouldn't bother with.

Well, the MIRQ interrupts automatically get you to D=0 so this is pretty fast. For the NMI and IRQ interrupts, you have the ENTR and EXIT instructions that do this, so it is fast.

When you have your I/O in the direct-page, you get to use BBR BBS RMB SMB instructions --- these are faster than using the A register and logic instructions.
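
For readers who have not used these instructions, here is a minimal Python model of what they do, next to the equivalent sequence that goes through A. The helper names and the port address are hypothetical, and the model covers behaviour only, not timing.

```python
def smb(mem: bytearray, addr: int, bit: int) -> None:
    """SMBn zp: set one bit of a direct-page byte; A is not touched."""
    mem[addr] |= 1 << bit


def rmb(mem: bytearray, addr: int, bit: int) -> None:
    """RMBn zp: clear one bit of a direct-page byte; A is not touched."""
    mem[addr] &= ~(1 << bit) & 0xFF


def bbs(mem: bytearray, addr: int, bit: int) -> bool:
    """BBSn zp,rel: True means the branch is taken (the bit is set)."""
    return bool(mem[addr] & (1 << bit))


def bbr(mem: bytearray, addr: int, bit: int) -> bool:
    """BBRn zp,rel: True means the branch is taken (the bit is clear)."""
    return not bbs(mem, addr, bit)


def set_bit_via_a(mem: bytearray, addr: int, bit: int) -> int:
    """LDA zp / ORA #mask / STA zp: same memory result, but A is overwritten."""
    a = mem[addr]
    a |= 1 << bit
    mem[addr] = a
    return a  # whatever was in A before is lost


page = bytearray(256)
PORT = 0x10          # hypothetical I/O port mapped into the direct page
smb(page, PORT, 3)
print(bbs(page, PORT, 3), bbr(page, PORT, 0))
```

The single-instruction forms replace three instructions each and leave A free for other work.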


 Post subject: Re: 65VM02
PostPosted: Sat May 27, 2017 2:12 am 
Joined: Fri Jun 03, 2016 3:42 am
Posts: 158
GaBuZoMeu wrote:
Using JVM with an odd value in A could turn into a GTF (aka GoToForest). JVM could do an implicit A AND $FE to avoid this.
By forming Page*$100 + A and then shifting the result one bit left, you still work with 8-bit quantities.

Well, I have an upgrade on the 65VM02 document (attached). The JVM is back to what I had previously, that allowed for a 256 element jump-table. I have various other instructions added to boost the speed. I upgraded FLDA and FSTA to access 16MB now, so we can store large files in far memory.

Here is a snippet of the document:
Code:
These are the new instructions (none affect the flags unless explicitly described as doing so):
JVM page            jump through the pointer located at (page*$100 OR 2*A) --- the page value has to be even
OPA                 load A with (IP) in the first far-bank, then increment IP
OPY                 load Y with (IP) in the first far-bank, then increment IP
EXIP                exchange IP with YA
YIP                 add the signed value in Y to IP
FLDA (direct),Y     load A through a 3-byte pointer with a value in far-memory, setting the N Z flags
FSTA (direct),Y     store A through a 3-byte pointer to far-memory
LLY                 load Y with the offset to the bottom value of the return-stack from the page boundary
AAS                 add A to S
EXAD                exchange A and D
EXA (direct),Y      exchange A with value at (direct),Y and set the N Z flags according to the new value in A
EXA direct,X        exchange A with value at direct,X and set the N Z flags according to the new value in A
MUL                 multiply A by Y unsigned, leaving the product in YA
SGN                 sign-extend A into YA (set A to -1 or 0), setting the N and Z flags for the 16-bit result
TST                 test YA, setting the N and Z flags (appropriate for the whole 16-bit value)
ADY #value          add the value to Y, setting the N Z V and C flags (in the same way as ADC does)
SBY #value          subtract the value from Y, setting the N Z V and C flags (in the same way as SBC does)
CMPH direct,X       like CMP, but uses the old C-flag (doesn't assume it is 1), and AND's the old Z-flag with the new Z-flag
BLT offset          branch if less than                 branch if  N <> V
BGE offset          branch if greater than or equal     branch if  N = V
MRTI                used to terminate MIRQ ISRs (similar to how RTI is used to terminate IRQ and NMI ISRs)
SEM                 sets the M-flag (this masks MIRQ interrupts, similar to SEI for IRQ)
CLM                 clears the M-flag (this allows MIRQ interrupts to occur, similar to CLI for IRQ)
ENTR                push A X Y to the return-stack, then move D to X, then set A Y and D to zero
EXIT                move X to D, then pull Y X A from the return-stack

Some of these instructions aren't strictly necessary. For example, OPY can be done with OPA TAY which is only slightly slower.
If chip resource usage is a problem, some of these instructions can be discarded and the code won't be much slower.
If chip resource usage is not a problem, some more instructions can be added (the INCH DECH LDYA STYA macros can be instructions).

It is possible to have two versions of the 65VM02. The big version is fully 65c02 compatible for legacy program support.
The small version would discard some of the crufty instructions in the 65c02 that are unneeded in Forth:
1.) The (direct,X) instructions can be discarded. It is unlikely that any legacy programs use this, so nobody will care.
2.) The JMP (address,X) instruction can be discarded. The JVM is more useful.
3.) The address,X instructions can be discarded (the direct,X is needed though).

The (direct,X) instructions were pretty useless --- nobody will care if (direct,X) is discarded.
The JMP (address,X) instruction was provided for byte-code VM systems, but these should be redesigned to use OPA and JVM instead.
Both of these addressing modes can be discarded with little or no pain.

The address,X instructions are pretty commonly used, so discarding them will break a lot of legacy programs.
Also, a C or Pascal compiler written for the 65VM02 may need them, so it is best to keep them even though Forth doesn't need them.

The "look and feel" of the 65c02 will be retained. There is no radical departure done, such as making the registers 16-bit.
It should be easy to port legacy 65c02 programs to the 65VM02 --- the 65c02 programmer should feel at home.
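
A minimal Python sketch of how OPA and JVM could work together in a byte-code inner loop, following the one-line descriptions in the snippet above. The little-endian pointer layout, the 16-bit IP wrap, and all names are assumptions for illustration only.

```python
def opa(far_bank: bytes, ip: int):
    """OPA: load A with (IP) in the first far-bank, then increment IP."""
    return far_bank[ip], (ip + 1) & 0xFFFF  # 16-bit IP is an assumption


def jvm_pointer_address(page: int, a: int) -> int:
    """JVM page: address of the pointer, (page*$100 OR 2*A).

    2*A needs 9 bits, so a 256-entry table spans two pages. The page has to
    be even or its low bit would collide with the top bit of 2*A in the OR.
    """
    if page & 1:
        raise ValueError("JVM page has to be even")
    return ((page & 0xFF) << 8) | ((a & 0xFF) << 1)


def jvm_target(mem: bytearray, page: int, a: int) -> int:
    """Fetch the 16-bit jump target (low byte first, as on the 6502)."""
    p = jvm_pointer_address(page, a)
    return mem[p] | (mem[p + 1] << 8)


mem = bytearray(0x10000)
mem[0x41FE:0x4200] = bytes([0x34, 0x12])     # table entry for opcode $FF
a, ip = opa(bytes([0xFF]), 0)                # fetch one byte-code
print(hex(jvm_target(mem, 0x40, a)), ip)     # primitive address, next IP
```

Each byte-code therefore costs one OPA and one JVM to dispatch, with no explicit doubling of A in software.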


Attachments:
File comment: JVM upgraded to 256 primitives, various instructions added to boost the speed, and 16MB for FLDA and FSTA.
65VM02.txt [40.51 KiB]
 Post subject: Re: 65VM02
PostPosted: Sat May 27, 2017 3:45 am 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8479
Location: Midwestern USA
Hugh Aguilar wrote:
When you have your I/O in the direct-page, you get to use BBR BBS RMB SMB instructions --- these are faster than using the A register and logic instructions.

Not on the 65C816 you don't.

 Post subject: Re: 65VM02
PostPosted: Sat May 27, 2017 1:55 pm 
Offline

Joined: Fri Jun 03, 2016 3:42 am
Posts: 158
BigDumbDinosaur wrote:
Hugh Aguilar wrote:
When you have your I/O in the direct-page, you get to use BBR BBS RMB SMB instructions --- these are faster than using the A register and logic instructions.

Not on the 65C816 you don't.

I'm not very familiar with the 65c816. Does it not have any instructions for accessing 1-bit data?

I supposed (when the W65c02 came out) that the BBR BBS RMB SMB instructions were put in the W65c02 partially for accessing I/O ports, and partially for supporting PLCs that use a lot of 1-bit variables and typically have very little memory (IIRC Western had a version of the W65c02 with 512 bytes) so you don't want to dedicate an entire byte to each 1-bit variable.

I've never been much interested in the 65c816. It seems to have been designed for use in desktop computers (it was used in the Apple-IIGS), but when it came out the era of the 8-bit desktop-computer was rapidly fading away (Apple had their MC68000 Mac at the time, and the Apple-IIGS was seen as a poor-man's Mac, so it didn't have much of a future). A variant of the 65c816 was used in the Super Nintendo and this was Western's primary business for a long time. I actually bought a Super Nintendo in the hopes of writing games for it, as I expected that the 65c816 would be easy for me to program given my 65c02 experience, but then I found out that Nintendo didn't allow third-party games at all. I kept the machine for a while, and I played the Mario game that came with it, but after a while I got bored with the game so I gave the machine to a girl. I'm not much interested in video games --- I played Ms. PacMan when I was younger --- there are really better ways to waste time though...

BTW: I read somewhere that Ms. PacMan used a 6502 internally. Most of those video games used the Z80 though, and I think some of the more advanced ones used the 6809.

I remember talking to one Color Computer enthusiast and telling him about the 65c816, which is like the 65c02 except with the registers upgraded to 16 bits. He said: "The 6809 has already been invented." Ouch!


 Post subject: Re: 65VM02
PostPosted: Sat May 27, 2017 2:02 pm 
Offline

Joined: Fri Jun 03, 2016 3:42 am
Posts: 158
Hugh Aguilar wrote:
Here is a snippet of the document:
Code:
BLT offset          branch if less than                 branch if  N <> V
BGE offset          branch if greater than or equal     branch if  N = V

That was a mistake. I have an upgraded version with this instead:
Code:
BLT offset          branch if less than                 branch if  N <> V
BGT offset          branch if greater than              branch if  N = V and ~Z

This LTE_BRANCH primitive is now one instruction shorter:
Code:
LT_BRANCH:          ; this is compiled by:  >= IF
    OPY             ; the operand is the offset to branch
    LDA soslo,X
    CMP toslo,X
    LDA soshi,X
    CMPH toshi,X            ; we could use SBC here because we only need N and V to be correct
    BLT DO_BRANCH
DO_NOT_BRANCH:
    INX
    INX
    NEXT

LTE_BRANCH:         ; this is compiled by:  > IF
    OPY             ; the operand is the offset to branch
    LDA soslo,X
    CMP toslo,X
    LDA soshi,X
    CMPH toshi,X            ; we could not use SBC here because the Z-flag would only reflect the high-byte
    BGT DO_NOT_BRANCH
DO_BRANCH:
    YIP
    INX
    INX
    NEXT
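
To check the flag logic behind these two primitives, here is a small Python model of the CMP / CMPH pair on a 16-bit signed comparison. It assumes CMPH sets N, V and C the way SBC would (the comment in LT_BRANCH suggests this, but the snippet does not spell it out); all function names are hypothetical.

```python
def cmp8(a: int, m: int):
    """Plain CMP on the low bytes: a - m with no borrow in. Returns (C, Z)."""
    diff = a - m
    return diff >= 0, (diff & 0xFF) == 0


def cmph(a: int, m: int, c_in: bool, z_in: bool):
    """Modelled CMPH on the high bytes. Returns (N, V, Z, C).

    Uses the old C as the borrow and ANDs the old Z into the new Z, so the
    flags describe the whole 16-bit comparison. V is assumed to behave as
    it does for SBC.
    """
    diff = a - m - (0 if c_in else 1)
    res = diff & 0xFF
    n = bool(res & 0x80)
    v = bool((a ^ m) & (a ^ res) & 0x80)
    z = z_in and res == 0
    return n, v, z, diff >= 0


def compare16(sos: int, tos: int):
    """CMP low bytes, then CMPH high bytes, as in LT_BRANCH / LTE_BRANCH."""
    c, z = cmp8(sos & 0xFF, tos & 0xFF)
    return cmph((sos >> 8) & 0xFF, (tos >> 8) & 0xFF, c, z)


def blt_taken(n, v, z, c) -> bool:
    return n != v                 # BLT: branch if N <> V


def bgt_taken(n, v, z, c) -> bool:
    return n == v and not z       # BGT: branch if N = V and ~Z


flags = compare16(0x8000, 0x0001)        # -32768 compared with 1
print(blt_taken(*flags), bgt_taken(*flags))
```

The Z-flag chaining is what makes BGT possible: with a plain SBC on the high byte, Z would only say that the high bytes matched.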


Attachments:
File comment: I got rid of BGE and replaced it with BGT.
65VM02.txt [40.36 KiB]
Reply with quote  
 Post subject: Re: 65VM02
PostPosted: Sat May 27, 2017 6:43 pm 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8479
Location: Midwestern USA
Hugh Aguilar wrote:
BigDumbDinosaur wrote:
Hugh Aguilar wrote:
When you have your I/O in the direct-page, you get to use BBR BBS RMB SMB instructions --- these are faster than using the A register and logic instructions.

Not on the 65C816 you don't.

I'm not very familiar with the 65c816. Does it not have any instructions for accessing 1-bit data?

The $xF opcodes used by BBR and BBS were reassigned to the 24 bit absolute addressing modes, which in the 65C816 are generally more useful instructions. The $x7 opcodes used by RMB and SMB were reassigned to the 24 bit indirect indexed long addressing modes, e.g., LDA [<dp>],Y, which are also very useful. The TRB and TSB "combo" instructions essentially accomplish what RMB and SMB do, as well as the functionality of BBR and BBS, without hogging a lot of space in the opcode table. Also, TRB and TSB work on absolute addresses, as well as direct page, making them more general in nature.
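
A minimal Python model of TRB and TSB, to show how one instruction both tests and modifies the bits selected by the mask in A. The helper names and the device address are hypothetical.

```python
def tsb(mem: bytearray, addr: int, a: int) -> bool:
    """TSB: Z is set if (A AND M) is zero, then M becomes M OR A. Returns Z."""
    z = (mem[addr] & a) == 0
    mem[addr] |= a & 0xFF
    return z


def trb(mem: bytearray, addr: int, a: int) -> bool:
    """TRB: Z is set if (A AND M) is zero, then M becomes M AND (NOT A). Returns Z."""
    z = (mem[addr] & a) == 0
    mem[addr] &= ~a & 0xFF
    return z


mem = bytearray(0x10000)
DEVICE = 0xC000            # hypothetical register at an absolute address
mem[DEVICE] = 0x01
z = tsb(mem, DEVICE, 0x08)           # set bit 3; Z says it was clear before
print(z, hex(mem[DEVICE]))
z = trb(mem, DEVICE, 0x01)           # clear bit 0; Z says it was set before
print(z, hex(mem[DEVICE]))
```

Because the mask lives in A, several bits can be changed at once, and a following BEQ or BNE can act on the state the bits had before the change.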

Back when I used to do a lot of 65C02 development I never found any use for BBR, BBS, RMB and SMB. I sure got plenty of use from TRB and TSB, though.

 Post subject: Re: 65VM02
PostPosted: Sun May 28, 2017 2:26 am 
Offline

Joined: Fri Jun 03, 2016 3:42 am
Posts: 158
BigDumbDinosaur wrote:
Back when I used to do a lot of 65C02 development I never found any use for BBR, BBS, RMB and SMB. I sure got plenty of use from TRB and TSB, though.

Right now my mind is focused on this 65VM02 idea. I always liked the 65c02 --- I also thought that some of the design decisions were very dubious --- I wanted an upgraded version, but I didn't want to go to 16-bit registers which seems rather radical.

I should learn how to program in 65c816 assembly-language though. I don't really know anything about that processor. Is there an experimenter board available for it?

