Posted: Sun Jun 05, 2016 2:44 am
Joined: Fri Jun 03, 2016 3:42 am
Posts: 158
Dr Jefyll wrote:
Hugh Aguilar wrote:
Are any of you familiar with ISYS Forth for the Apple-IIc? About 25 years ago I was programming in it. It is an STC Forth.
I'm with Garth; IOW, intrigued with STC, but haven't gotten around to playing with it yet. No doubt there's some serious speed potential there! 8) Is ISYS your own creation, Hugh? You mentioned the cross-compiler is.

No. ISYS Forth was a commercial product sold in a shrink-wrapped package. I don't recall the name of the guy who wrote it. This was in the old days when software was sold like this --- now software is given away for free on the internet.

I just wrote the cross-compiler based on ISYS Forth and I lifted a lot of code out of ISYS Forth (all the floating-point and arithmetic stuff, for example). My cross-compiler didn't do any optimization that ISYS wasn't already doing. The advantages of the cross-compiler were that I could put a much larger program on the Apple-IIc, and I could do single-step debugging --- the disadvantage was that I no longer had interactive development with an outer-interpreter in the traditional way --- these were the same advantages and disadvantages of MFX that I later wrote.

Dr Jefyll wrote:
I'm lukewarm on the split stack thing (if the cells are only 16-bit, I mean). I think somewhere around here we have a thread on split stacks. Anyway, I'm happy to agree that the split stack idea does reduce the overhead for INX's and DEX's of the p-stack pointer.

As for the thing about the peephole optimizer doing liposuction, :mrgreen: it's clever -- I like it. But it seems noteworthy that the INX-DEX problem is getting attacked from two different angles. (The peephole and the split stack both reduce INX-DEX overhead.) I'm sold on the peephole deal, but I expect there'd be diminishing returns if the split stack were added as well. Is the split stack worth the cost paid by fetch and store? Keeping those guys fast is pretty important, too.

The combined stack only helps with 8-bit data-access because you can use (zp,x) for that. The (zp,x) addressing-mode doesn't work well with 16-bit data-access because you have to increment the 16-bit address at zp,x to point it to the high byte. By comparison, if you move the pointer to a zp-pair then you can use (zp),y and you only have to increment Y which is much faster than incrementing a zp-pair.

All in all, the 65c02 desperately needed (zp,x),y --- in this case the combined stack could be used and you wouldn't have to increment a memory pair but could just increment Y, and you wouldn't have to move the pointer somewhere else to use it.

I think every 65c02 programmer wanted (zp,x),y in the next processor --- the fact that the 65c816 lacked this much-desired feature indicated to me that Bill Mensch was out of touch --- I really turned my back on the 65' world because of that (I bought a Radio Shack Color Computer and went with 6809).

Dr Jefyll wrote:
Quote:
What the 65c02 desperately needed was a (zp,x),y addressing-mode
You mean indexed before and after the indirection? Absolutely! And at least one 65c02 was extended to support this! Great minds think alike! (or something like that... :) )

I never heard of that.

Anyway, I don't really have any interest in the 65c816.

As I said, I do think the 65c02 has some use still --- mostly in a multi-core system similar to the Propeller --- the 65c02 can't compete directly against the PIC24, but a multi-core 65c02 system could be both more powerful and easier to program.

The 65c02 does need a (zp,x),y addressing mode. I would even get rid of (zp,x) in favor of (zp,x),y if it can only be one or the other.

The 65c02 does need instructions to manipulate bits. I mentioned the ones that were limited to zp and several of you guys said that it is a mistake to limit I/O to zp. I don't think that is a big problem. Also, these instructions aren't just for I/O. I think the instructions were primarily introduced to support PLCs with their ladder-diagrams in which there are a lot of 1-bit variables used. The 8051 had instructions for accessing 1-bit variables in a small section of zp (32 1-bit variables total) --- this is why the 8051 was used in almost all PLCs for a while.


Posted: Sun Jun 05, 2016 3:53 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8440
Location: Southern California
Hugh Aguilar wrote:
The 65c02 does need a (zp,x),y addressing mode. I would even get rid of (zp,x) in favor of (zp,x),y if it can only be one or the other.

So @ could be simplified from this:

Code:
       LDA  (0,X)
       PHA
       INC  0,X
       BNE  fet1
       INC  1,X
fet1:  LDA  (0,X)
       JMP  PUT
; and elsewhere, PUT which is used in so many places is:
PUT:   STA  1,X
       PLA
       STA  0,X


to this (assuming Y started out known to contain 0):

Code:
       LDA  (0,X)
       PHA
       INY             ; (assuming Y started out known to be 0)
       LDA  (0,X),Y
       JMP  PUT
; and elsewhere, PUT which is used in so many places is:
PUT:   STA  1,X
       PLA
       STA  0,X

which is still four times as many instructions as the '816 requires:

Code:
       LDA  (0,X)
       STA  0,X         ; For the '816, PUT is only one 2-byte instruction anyway, so there's no sense in jumping to it.


Splitting the stack on the '02 makes it even longer:

Code:
        LDA  Stack_Low,X
        STA  temp
        LDA  Stack_High,X
        STA  temp+1
        LDA  (temp)
        PHA
        INY             ; (assuming Y started with 0)
        LDA  (temp),Y
        JMP  PUT
; and elsewhere, PUT which is used in so many places is:
PUT:    STA  1,X
        PLA
        STA  0,X

(Someone please point it out if I have any mistakes.)

Keeping TOS in temp would shorten it for @, but then other common ones like DROP get longer. Any way you look at it, the '816 comes out way ahead.

Quote:
The 65c02 does need instructions to manipulate bits. I mentioned the ones that were limited to zp and several of you guys said that it is a mistake to limit I/O to zp. I don't think that is a big problem. Also, these instructions aren't just for I/O. I think the instructions were primarily introduced to support PLCs with their ladder-diagrams in which there are a lot of 1-bit variables used. The 8051 had instructions for accessing 1-bit variables in a small section of zp (32 1-bit variables total) --- this is why the 8051 was used in almost all PLCs for a while.

Memory in the early 1980's was very expensive compared to now. I remember when an 8Kx8 SRAM IC, probably 250ns, was $40 at Jameco. When the Japanese started dumping in about 1985, that same SRAM came down to about $8 and I splurged and bought two. (That was when $8 was worth as much as 20 of today's dollars.) Putting 32 one-bit variables in four bytes was being pretty thrifty, and probably quite appropriate, especially for a microcontroller that may have only had 64 or 128 bytes of RAM. The 2-byte instructions to write to or examine them and branch would save a little program memory too. I normally do flags in Forth as one- or two-byte variables and waste the rest of the bits, and they're also not usually in ZP. There aren't enough of them to matter. In assembly language, clearing the flag is done with STZ. Setting the flag is done with DEC, as long as there's no danger of DEC'ing it so many times that you accidentally clear bit 7. Examining is with BIT (regardless of the registers' contents), usually followed by BPL or BMI. These can be done in ZP or non-ZP, either one.
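
A rough sketch of those flag idioms on the 65c02 follows. The names FLAG, flag_set and flag_clr are made up for illustration, and FLAG is assumed to be a one-byte variable anywhere in RAM:

Code:
        STZ  FLAG       ; Clear the flag (STZ is 65c02-only).
        DEC  FLAG       ; Set the flag: 00 becomes FF, so bit 7 is now set.
        BIT  FLAG       ; Copy bit 7 of FLAG into N.  A, X and Y are left alone.
        BMI  flag_set   ; Branch if the flag is set,
        BPL  flag_clr   ; or use BPL to branch if it is clear.

Only the N test is independent of the accumulator here; the Z result of BIT still depends on what is in A.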

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Sun Jun 05, 2016 4:16 am
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3354
Location: Ontario, Canada
Hugh Aguilar wrote:
several of you guys said that it is a mistake to limit I/O to zp. I don't think that is a big problem.
Regarding this point, I'm with you. I built all three of my 65xx systems to use I/O in zp, and a few years back I started a thread to raise awareness of how I/O in zp is widely overlooked, and may sometimes be the best choice (particularly for the 'c02).

I'm still not sold on the split stack, though. Garth showed how it doesn't benefit the '816, and IMO it doesn't benefit the 'c02 either. I do agree when you say, "the (zp,x) addressing-mode doesn't work well with 16-bit data-access because you have to increment the 16-bit address at zp,x to point it to the high byte." Incrementing the 16-bit address at zp,x typically takes about 9 cycles. I need to reread what you said about copying from a split stack to a z-pg pair, but it sounds like a 14-cycle operation (not 9). Other factors may mitigate that slightly, but, even so, I believe the split stack is a net loss compared to the (admittedly disappointing) increment of the 16-bit address at zp,x.
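
For reference, this is roughly how those two figures come about, using the standard 65c02 cycle counts. It is only a sketch: the label skip is a placeholder, the branch is assumed not to cross a page, and the indexed loads are assumed not to incur a page-crossing penalty.

Code:
; Increment the 16-bit address at zp,x (combined stack):
        INC  0,X           ; 6 cycles
        BNE  skip          ; 3 cycles when taken, which is the usual case
        INC  1,X           ; 6 cycles, but only executed 1 time in 256
skip:                      ; typical total: 6 + 3 = 9

; Copy the pointer from a split stack to a zp pair:
        LDA  Stack_Low,X   ; 4 cycles
        STA  temp          ; 3 cycles
        LDA  Stack_High,X  ; 4 cycles
        STA  temp+1        ; 3 cycles
                           ; total: 4 + 3 + 4 + 3 = 14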

-- Jeff

[edit: PS to Garth:]
GARTHWILSON wrote:
Splitting the stack on the '02 makes it even longer:

Code:
        LDA  Stack_Low,X
        STA  temp
        LDA  Stack_High,X
        STA  temp+1
        LDA  (temp)
        PHA ; <--------------
        INY             ; (assuming Y started with 0)
        LDA  (temp),Y
        JMP  PUT
; and elsewhere, PUT which is used in so many places is:
PUT:    STA  1,X
        PLA ; <--------------
        STA  0,X

(Someone please point it out if I have any mistakes.)

Alright. :) The PHA and the PLA aren't required -- they apply only to the non-split-stack version of fetch. With a split stack you can omit them, as follows:

Code:
        LDA  Stack_Low,X
        STA  temp
        LDA  Stack_High,X
        STA  temp+1
        LDA  (temp)
        STA Stack_Low,X ;<--------
        INY             ; (assuming Y started with 0)
        LDA  (temp),Y
        STA Stack_High,X
        JMP  NEXT

But it doesn't change the outcome. The non-split approach is still faster, although it's not as big a difference as you supposed.

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted: Sun Jun 05, 2016 6:42 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8440
Location: Southern California
Ah yes, thanks. It still shows how much more efficient the '816 is. There are a lot of Forth primitives in my '816 Forth that are only two or three machine-language instructions.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Sun Jun 05, 2016 11:29 pm
Joined: Fri Jun 03, 2016 3:42 am
Posts: 158
GARTHWILSON wrote:
Ah yes, thanks. It still shows how much more efficient the '816 is. There are a lot of Forth primitives in my '816 Forth that are only two or three machine-language instructions.

Well, of course 16-bit registers are going to improve the efficiency of 16-bit operations! This seems like a straw-man argument to imply that I believe a processor with 8-bit registers is going to be more efficient at Forth than a processor with 16-bit registers.

My point was that the 65c816 was oriented toward C rather than Forth --- the improvement was greater in C than in Forth because the (offset,s),y addressing-mode supported the C local-frame that isn't used much in Forth --- this is because Forth would typically use the X register as its parameter stack and do most or all of its work on the parameter-stack with minimal or no use of locals. The 65c816 is *relatively* worse at Forth (compared to C) than the 65c02 is.

I'm assuming that a multi-core system built into an FPGA would need the processors to use minimal resources (8-bit registers) which implies something like the 65c02 but with some extra features such as (zp,x),y addressing and bit-addressing --- I would rather have a dual-core system with two such processors than a single-core 65c816 system --- either way, I think the size (and expense) of the FPGA chip would be about the same.


Posted: Mon Jun 06, 2016 12:57 am
Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1930
Location: Sacramento, CA, USA
IMO, Charlie is doing some very nice work with only NMOS instructions in his PETTIL project. Here's a sample from here:
Code:

;--------------------------------------------------------------
#if 0
name=@
stack=( addr -- 16b )
tags=forth-79,nucleus,memory,fig,forth-83
Leave the 16 bit contents of address.

!!! pronounced:"fetch"
  16b is the value at addr.
#endif
fetch
    sec
    .byt $29        ; AND #
;--------------------------------------------------------------
#if 0
name=C@
stack=( addr -- 8b )
tags=forth-79,nucleus,memory


!!! pronounced: "c-fetch"
 8b is the contents of the byte at addr.

#endif
cfetch
    clc
cfetch01
    ldy #0
    lda (tos),y
    bcc put
    pha
    iny
    lda (tos),y
    tay
    pla
    bcs put

;--------------------------------------------------------------

I believe he has spent quality time making the split-dictionary and split-stack with separate TOS work ... he's a very good NMOS coder, IMO. If you have carnal knowledge of your system, you can even do things to make it more efficient, like doing a NIP DROP instead of a DROP DROP (Charlie's NIP is INX).

My 65m32 DTC Forth has a single machine instruction NEXT, and a whole bunch of primitives ( enter branch + 0< 1+ 1- 2* 2/ >R @ AND DROP DUP EXIT INVERT NEGATE OR R> SWAP UNLOOP XOR [ ] NIP RDROP 2RDROP ... I'm sure that I'm forgetting a few others) that are one machine instruction plus NEXT. That kind of screams out for a change to STC with in-lining, but I am floundering in an endless sea of incomplete projects, so this one will have to get pushed back at least until my house is ready for winter. I estimate that I have a couple of hundred man-hours of work to do on the house, and I only have Sundays to do it (my six-day-a-week job exhausts me to the point that I can't do much at night, and I can only afford to hire amateur help). I also estimate that I have a couple of hundred hours of work to do on the 65m32 simulator, assembler, and minimal operating system (probably Forth) before I can even consider trying to implement it on an FPGA board. It most certainly doesn't matter how slick my system software is with no processor on which to run it. Ay, Chihuahua ...

Mike B.


Posted: Mon Jun 06, 2016 1:17 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8440
Location: Southern California
Oh, Hugh, I just remembered something I think you'll like, to do practically the equivalent of (ZP,X),Y. The 816 makes it possible directly in software, but it's do-able on the '02 as well with a little extra address-decode logic. The following is the last 10% or so of chapter 5 of my 6502 stacks treatise. The first couple of paragraphs are most relevant:

      A side note about using ZP addressing in the hardware stack area in the 6502 and '816: Since the '816 lets you move the direct page (DP, like the 6502's ZP) around, and since the 256 bytes of DP do not need to start on a page boundary but can start anywhere in the 64K of bank 0, and since it can overlap the hardware stack area, you can use it to get DP addressing into hardware-stack frames. There is more on this here on the 6502.org wiki. Also, BDD has source code here on this [my] site that uses this technique for generating disc-resident bitmaps that define filesystem structure for his mkfs 816NIX filesystem generator program. Bit-twiddling of stack elements becomes more convenient too, since you can use TRB, TSB, etc.

      There is a way to get this on a 6502 too, (but with a big caveat!). If RAM ignores A8 in a teensy system with only 256 bytes of RAM, 100-1FF becomes a duplicate of 00-FF, making the hardware stack addressable in page 0 also, opening up ZP addressing modes for it. There could be more than 256 bytes of RAM, but using A8 in the rest of RAM but not in the first two pages will take some extra logic. If that is not applied, you can't have any contiguous section of code or data that crosses the boundary from an even-numbered page to an odd one.

      Going back to addressing modes that are both indirect and indexed: There's no JMP((addr),Y) (where addr points to a table, and you read the Y and Y+1 bytes into the table to find out where to jump to), but you can do the following, taking the address from the ZP data stack:

      Code:
                           ; With table address and the offset on the data stack,
              JSR  PLUS    ; add them together.  (This allows offsets of >255.)
              LDA  (1,X)   ; Read the resulting calculated addr and
              PHA          ; push it onto the stack, high byte first,
              LDA  (0,X)   ; then low byte.
              PHA
              INX          ; Drop the data-stack cell.
              INX
              RTS          ; Use RTS to make the actual jump.  Remember that the
                           ; address derived will need to be the destination-1.

      or, without even using the hardware stack (but having a caveat):

      Code:
                           ; With table address and the offset on the data stack,
              JSR  PLUS    ; add them together.  (This allows offsets of >255.)
              JMP  (0,X)   ; (Absolute, not ZP, so operand low byte will be 00.)

      The caveat is that the address on the ZP data stack will still be there. Wherever you jump to will need to start by removing it. (Another caveat is that the NMOS 6502 does not have JMP(addr,X) but that really need apply only to something like the Commodore 64 which used the 6510 which was never available in CMOS. For other 6502 computers, just plug a 65c02 in the socket! It has lots of advantages.)

      In the next section are examples of using the hardware stack for passing parameters to a subroutine.



Jeff Laughton (forum name Dr Jefyll) wrote the following in the topic "16-bit 6502 vs. ARM or MIPS?":

Quote:
sark02 wrote:
if you've chosen the '816 over ARM or MIPS, I'm curious as to why.
sark02 wrote:
It's the 16-bit version that I'm curious about, and I do wonder what the attraction is.
Hm, did the question shift? Hope I'm not OT if I make a remark contrasting the 6502 with the 65816.

For me there was a big light bulb that went on when I realized that a TSC instruction followed by a TCD puts your Direct Page on stack. Suddenly Direct Page ceases to be a crowded place! And you get some amazing new address-mode flexibility -- for example being able to use a three-byte indirect pointer that's on-stack.

My admiration for the '816 designers went up several notches. The 256 super-flexible processor registers that Garth mentioned can be a stack frame! Despite certain downsides it's an idea which IMO gives '816 programming a whole new level of sophistication as compared to 6502/65C02.

And, speaking of 3-byte pointers: the "16-bit version" generates 24-bit addresses, so that's a huge step up from what the '02 and 'C02 can do!
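
A minimal sketch of that TSC / TCD idea follows. It assumes a 3-byte pointer was the last thing pushed before the PHD (no intervening JSR); the offsets are illustrative and not taken from any particular program:

Code:
        PHD             ; Save the caller's direct-page register (2 bytes).
        TSC             ; Copy the stack pointer into the 16-bit accumulator
        TCD             ; and from there into D, so DP now overlays the stack.
                        ; DP offsets 1-2 hold the saved D; whatever was pushed
                        ; before the PHD starts at offset 3.
        LDA  [3],Y      ; Use the 3-byte pointer at offsets 3-5, indexed by Y.
        PLD             ; Restore the caller's direct page.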

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Mon Jun 06, 2016 1:53 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8190
Location: Midwestern USA
Hugh Aguilar wrote:
GARTHWILSON wrote:
Ah yes, thanks. It still shows how much more efficient the '816 is...

Well, of course 16-bit registers are going to improve the efficiency of 16-bit operations! This seems like a straw-man argument...My point was that the 65c816 was oriented toward C rather than Forth...

That's the beauty of opinions: everybody has one. :D On Forth matters, I tend to respect Garth's, as he uses that environment for work purposes and thus has a good feel for its strengths and weaknesses. If his opinion is that an '816 Forth implementation is superior to a 'C02 version, I won't question it.

Although Forth has long had its followers and has found a home in various places, its significance in the programming universe is very minor compared to that of C. As another member here once crassly put it, Forth got voted off the programming island long ago. That statement may be a bit cruel (and misguided) but I do understand his point. Most of the world's professional software developers are not building applications in Forth, and most computer users don't even know that Forth exists, let alone what it does.

I don't subscribe to your theory that Bill Mensch had a target language in mind when he designed the 65C816.* However, let's suppose he had said to himself while perched at the drafting table, "I think I will rig up my new processor to facilitate the design of a C compiler." I would agree with his thinking, as the goal is to sell MPUs and make money. Ergo it would make good business sense to design the 65C816's ISA to work well with a widely used programming language rather than an obscure environment whose name many can't even properly spell.

Now, I am not a C acolyte—I tend to reach for an assembler when I want to write some code. From an assembly language standpoint, the '816 has some huge advantages over the 65C02, advantages that go well beyond the presence of 16 bit registers. It's not for nothing that I call the 65C816 a "65C02 on steroids."

In the code that I have developed for my POC unit, as well as a machine controller I designed around the 65C816, I make heavy use of the '816-specific addressing modes, as well as the 16 bit capabilities. When I converted POC's firmware from 65C02 emulation mode code to pure '816 native mode I realized a substantial code shrink and better performance. In integrating SCSI with the unit, the availability of 16 bit registers, stack pointer relative addressing and the ability to relocate direct page made it possible to fully support the SCSI protocol (except for disconnect/reconnect) in the ROM space that was freed up when I switched from emulation mode to native mode. I could not have done that with the 'C02 in the available ROM space, nor would I have been able to achieve the 700KB/sec disk-to-core transfer rate I get with the '816.

A well-written Forth implementation on the '816 should be able to easily run circles around the same executing on a 'C02 running at the identical clock rate. Use of 16 bit loads and stores will be a large part of it. However, some '816-specific instructions could open new doors for performance gains and code shrink. In any implementation, you use the machine instructions that work best, and if some of the '816-specific ones (e.g., those that utilize stack pointer relative addressing) don't improve function over the traditional 'C02 instructions, you don't use them. For me, some of the 'C02 instructions, specifically the RMB/SMB and BBx ones, are of very limited usefulness, and back when I did professional development on a 'C02-based terminal server, I couldn't find any good use for any of them. How well they would mesh with a Forth implementation I cannot say, as I know little about Forth.

Quote:
The 65c816 is *relatively* worse at Forth (compared to C) than the 65c02 is.

Forth was conceived back when computers were quite primitive (I was already working on them around the time Chuck Moore invented Forth) and C didn't even exist. Forth and eight bit MPUs grew up together but had already plateaued by the time the 65C816 went into production. If the 65C816 "...is *relatively* worse..." at running Forth when compared to the 65C02 (Just how would one make such a comparison? Which metric defines "worse" in this case?) it's likely because no one has been sufficiently motivated to optimize a Forth for the '816 ISA and has instead reused 65C02 code as a matter of expediency.

———————————————————————————
*Indeed, had the 65C816 assembly language been specifically designed to favor C, some addressing modes would have been scrapped, such as (<dp>,X), and multiply and divide instructions would have been included. The fact is about 85 percent of the 65C816 assembly language is identical to that of the 65C02, and most of the '816-specific features are in new addressing modes, not new instructions.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Mon Jun 06, 2016 4:31 pm
Joined: Fri Jun 03, 2016 3:42 am
Posts: 158
GARTHWILSON wrote:
Since the '816 lets you move the direct page (DP, like the 6502's ZP) around, and since the 256 bytes of DP do not need to start on a page boundary but can start anywhere in the 64K of bank 0, and since it can overlap the hardware stack area, you can use it to get DP addressing into hardware-stack frames.

This is a pretty interesting technique. :-)

There are three possible techniques that could be used for functions that need to use pointers on the data-stack.
1.) Your technique:
PHD
TXA
TXD
... ; code that uses DP as the base of the data-stack
PLD
2.) Move the pointers to local variables and then use (offset,S),Y
3.) Move the pointers to zp-pair pseudo-registers and then use (zp),Y

For primitive functions that have pointers on the data-stack (such as CMOVE), #1 is the best choice (#2 could also be used but would be slightly slower). The advantage is that ISRs can be written in Forth and only the registers (A, Y and DP) need to be saved and restored (X and S can be used in the ISR so long as the ISR restores them to their original values, which it should naturally). #3 is the worst choice because the pseudo-registers need to be saved and restored as well, which will significantly slow down the ISRs --- #3 is faster than #2 though, so it might be an option in a system in which all ISRs are written in assembly-language (at least, the more speed-critical ISRs).

For user-written code that has pointers on the data-stack (typically these would be pointers to structs), #2 is the best choice. This leaves DP unchanged which may be needed if some of that user-written code accesses zp global variables.

So, I'm warming to the 65c816 somewhat. :-) Would you agree that pretty much every Forth compiler-writer makes the pointer registers 8-bit and only the accumulator 16-bit? If this is true, then an FPGA version of the 65c816 could be built that limits the user to only this configuration --- keeping the registers small helps a lot in allowing small inexpensive FPGA chips to be used --- of course, the 65c02 (upgraded with the (zp,X),Y addressing mode and some other minor features to help Forth) is still going to be less of a resource hog, which makes a multi-core system realistic.

Has anybody ever built an FPGA version of the 65c816? Normally FPGA systems are small enough that they use only internal memory and don't have any external memory, so an FPGA version of the 65c802 would be more likely. If you are going to have a lot of external memory, you are really better off to just use an ARM like everybody else in the world --- just out of curiosity though, has anybody ever written a TCP/IP stack for the 65c816? --- for the most part, TCP/IP is the one thing that a micro-controller needs a lot of memory for.


Posted: Mon Jun 06, 2016 8:08 pm
Joined: Thu May 28, 2009 9:46 pm
Posts: 8190
Location: Midwestern USA
Hugh Aguilar wrote:
There are three possible techniques that could be used for functions that need to use pointers on the data-stack.
1.) Your technique:
PHD
TXA
TXD...

I think you meant TCD. TXD is not a 65C816 instruction.
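
With TCD in place of TXD, the sequence would look something like the following. It is only a sketch, assuming native mode with a 16-bit accumulator and 8-bit index registers (so that TXA leaves the accumulator's high byte zero) and a data stack in page zero indexed by X:

Code:
        PHD             ; Save the current direct-page register.
        TXA             ; A = data-stack pointer, high byte zero.
        TCD             ; D = A, so DP offset 0 is now the top-of-stack cell.
        LDA  (0),Y      ; Fetch through the pointer in TOS, indexed by Y,
                        ; which is the effect wanted from (zp,X),Y.
        PLD             ; Restore the direct page.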

Quote:
...just out of curiosity though, has anybody ever written a TCP/IP stack for the 65c816?

Dunno if one exists for the 65C816, but there is a 65C02 TCP/IP stack that is available.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Mon Jun 06, 2016 9:08 pm
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8440
Location: Southern California
Hugh Aguilar wrote:
For primitive functions that have pointers on the data-stack (such as CMOVE), #1 is the best choice (#2 could also be used but would be slightly slower). The advantage is that ISRs can be written in Forth and only the registers (A, Y and DP) need to be saved and restored (X and S can be used in the ISR so long as the ISR restores them to their original values, which it should naturally). #3 is the worst choice because the pseudo-registers need to be saved and restored as well, which will significantly slow down the ISRs --- #3 is faster than #2 though, so it might be an option in a system in which all ISRs are written in assembly-language (at least, the more speed-critical ISRs).

I have interrupt service in both assembly language and high-level Forth. In ITC Forth, there's a way to do it that has no overhead. In fact, the part of NEXT that runs to launch you into an ISR is actually shorter than the normal part of NEXT, meaning that when the interrupt is received, the next Forth word to run, which is the ISR, starts sooner than the next word in line would have started had there been no interrupt. Since it's like inserting a Forth word whose stack effect is ( -- ), you don't have to set up new stacks or save any registers. My article on it is at http://wilsonminesco.com/0-overhead_Forth_interrupts/ . I think it's the first article I ever wrote, about 1993.

Quote:
So, I'm warming to the 65c816 somewhat. :-)

:-) :-) :-) :-)

Quote:
Would you agree that pretty much every Forth compiler-writer makes the pointer registers 8-bit and only the accumulator 16-bit?

I don't know. I did it that way, but although it stays that way most of the time, there are a few places in the code where the register widths are changed. Ones that come to mind are C@ C! C_OFF CMOVE CMOVE> . (The 816's MVN and MVP memory-moving instructions use 16-bit X and Y for the source and destination addresses, and these get advanced as the memory-move process continues. These are also interruptible BTW, so a move of thousands of bytes doesn't hold up the interrupts and cause problems.) In my '02 Forth, I also have QCMOVE and QCMOVE>, the "Q" being for "quick," because these are shorter, faster versions that can be used when you're not moving any more than 255 bytes.
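
For anyone who hasn't used them, a block move with MVN looks roughly like this. The labels SRC, DST and COUNT and the bank numbers are placeholders, and the order in which the two banks are written differs between assemblers, so check yours:

Code:
        REP  #$30       ; 16-bit accumulator and index registers.
        LDX  #SRC       ; X = source address within its bank.
        LDY  #DST       ; Y = destination address within its bank.
        LDA  #COUNT-1   ; A = number of bytes to move, minus one.
        MVN  $00,$00    ; Move within bank 0; X and Y count up as it goes.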

Quote:
Has anybody ever built an FPGA version of the 65c816?

I'm not aware of any (and I checked my links page in case there was one I had forgotten about).

Quote:
just out of curiosity though, has anybody ever written a TCP/IP stack for the 65c816?

Again I'm not aware of any. For 6502, we have the topic "Marina IP - a TCP/IP stack for Apple II" and there's another one whose link has gone dead.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Tue Jun 07, 2016 1:41 am
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3354
Location: Ontario, Canada
Hugh Aguilar wrote:
Has anybody ever built an FPGA version of the 65c816?
Forum member Rob Finch has a web site that has a whole page devoted to CPU cores. This includes the bc65816, and a 6809 upgrade that may interest you, the RTF6809.

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted: Tue Jun 07, 2016 6:00 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8190
Location: Midwestern USA
GARTHWILSON wrote:
Quote:
just out of curiosity though, has anybody ever written a TCP/IP stack for the 65c816?

Again I'm not aware of any. For 6502, we have the topic "Marina IP - a TCP/IP stack for Apple II" and there's another one whose link has gone dead.

Here is the Marina IP project page.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Tue Jun 07, 2016 7:56 am
Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
Hugh Aguilar wrote:
Would you agree that pretty much every Forth compiler-writer makes the pointer registers 8-bit and only the accumulator 16-bit?

My 65C816 DTC Forth keeps all the registers in 16-bit mode and only changes A to 8-bits when accessing byte variables. It also uses DP as the data stack pointer.

The 65C816 supports many different register organisations for Forth, each with its own trade offs.

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


Posted: Tue Jun 07, 2016 9:03 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10802
Location: England
Dr Jefyll wrote:
Hugh Aguilar wrote:
Has anybody ever built an FPGA version of the 65c816?
Forum member Rob Finch has a web site that has a whole page devoted to CPU cores. This includes the bc65816, and a 6809 upgrade that may interest you, the RTF6809.

-- Jeff

Thanks Jeff - I'd missed or forgotten that bc65816 core from Rob. As he says, it's untested. And we don't have anything like Klaus' suite for testing 816 cores. But this is an excellent start!

