
All times are UTC
Posted: Fri Dec 11, 2009 8:29 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
Hi everyone. Yes, a 65C02 computer with hardware that accelerates Forth, including a new addressing mode and a very fast, one-byte NEXT instruction:
https://laughtonelectronics.com/Arcana/ ... mmary.html

Not for sale, unfortunately - it's a homebuilt. But very obviously Forth related; hence the post under this topic.
I'm new to this forum and am pleased to become part of your group!

-- Jeff

Edit: add a photo of the board. Change link to point to the one-page Short Summary
Edit2: add another photo and the register diagram. BTW, to find specific info (also the photo gallery! :) ) please refer to the Contents and Appendices links here.
Attachment:
File comment: KK with its keyboard, monitor and separate floppy-disk system
PC043747CrpTch.jpg

Attachment:
File comment: the KK circuit board
KK 0791compShpSat FullRes.jpg

Attachment:
File comment: KK gives the 65C02 six new registers and 44 new instructions. Besides the 9-cycle ITC Forth NEXT, another main feature is the pointer-arithmetic-friendly extended address space.
KK Register Diagram.png


Last edited by Dr Jefyll on Mon Sep 05, 2022 10:01 pm, edited 2 times in total.

Posted: Fri Dec 11, 2009 11:14 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8427
Location: Southern California
Ingenious! And welcome! The article is well written too.

How does the speed running Forth compare with a 65c02? My 65816 Forth runs 2-3 times as fast as my '02 Forth at a given clock speed. Part of that is because the 816 makes it practical to do so many more words as primitives instead of secondaries, so NEXT, nest, and unnest get run a lot fewer times to get a job done. I see you run the 4MHz Rockwell part at 5MHz. All WDC's 65c02's being sold today are guaranteed (conservatively) to run at 14MHz and they'll typically do 24MHz at room temperature. Have you looked at the timings of your added components to get an idea of how fast the system could run with a WDC 65c02?

On page 7, you compare the code for @ with a plain 6502 versus the KK version. The '816 version however is much shorter, with only two instructions:
Code:
        LDA   (0,X)
        STA   0,X


It makes me wonder how much further we could go with a KK version of the '816.

While the '02 needs around 40 clocks to do NEXT, the '816 cuts that down about 40%. A level of indirection can be eliminated by putting NEXT in ZP and making W an operand in the JMP() instruction in NEXT itself. At viewtopic.php?t=586 , Bruce Clark proposed a way to just use the 816's 16-bit stack pointer for the IP, so NEXT becomes nothing more than a 6-clock RTS instruction. There are certain caveats, but he has a point. To see another idea of his for NEXT see viewtopic.php?t=584 . He has also shown however that STC can be carried out without the memory penalty we usually think of. He gives 9 reasons at viewtopic.php?p=3335 , starting about the middle of the page. That of course eliminates the need for NEXT, nest, and unnest.

What must the assembly look like? The Cross-32 assembler from Universal Cross Assemblers allows you to make up your own processor; so you could start with the 65c02 and modify it to get the additional op codes. Otherwise, perhaps you could get most of what you need with macros.

Note that WDC's 65c02's, which are the only ones being produced today and have no end in sight for the product life, and are much faster than Rockwell's were, have the WAIt and SToP op codes in column B, at $CB and $DB.

Thank you for bringing the project to our attention.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Sat Dec 12, 2009 12:48 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
Thanks for your reply, Garth. I'll try to respond to your comments briefly, and hopefully not reveal my ignorance regarding some of the topics if I can help it!

The speed of KK Forth compared with plain FIG Forth would depend on the code mix, of course. @ and ! are somewhat faster. And a higher proportion of short routines would mean NEXT executes comparatively often, so KK's 300+ percent speedup of NEXT would really shine when short definitions prevail. Otherwise the speedup probably isn't all that dramatic [Oops; wrong! See footnote]. It's still a modestly powered, 8-bit micro. It took some inspiration and some perspiration to figure this stuff out and make it work!

I'm sorry not to have much to say about 65816, although I very much admire what seems to be a well-executed upgrade to the family. The truth is I built KK over 20 years ago, and since that time I haven't much kept up with things 65xx. (Do they really have 6502s that'll do 24 MHz now? Caramba!!) KK is so old I could've posted my article in the 6502.org Nostalgia section! The reason I'm writing about it now is to flesh out my new Web site - which I'll take a moment to shamelessly plug, and encourage everyone to visit: https://LaughtonElectronics.com There's also other interesting stuff there, I promise!

Quote:
It makes me wonder how much further we could go with a KK version of the '816.
(LOL!) Well, maybe it's not so funny. I have quite a few more ideas for "smart" registers -- basically Z-Pg memory that knows when to modify itself. That could take the form of an LSI peripheral or even just some IP for a Design Library. When you think about it, 65xx's off-chip "registers" (Z-Pg) offer unique opportunities to create a stronger programming model without disturbing the CPU core.

Quote:
What must the assembly look like? The Cross-32 assembler from Universal Cross Assemblers allows you to make up your own processor
In closing I'll explain that the KimKlone never had to rely on any [hmphh! sniff!] store-bought computer or software to assemble its code! I modified Bill Ragsdale's Forth 6502 assembler so it'd generate 'C02 and KK ops, and used that. Similar story for KK's microcode: this was assembled by some clunky Forth code I wrote, executing on KK's predecessor, my heavily modified KIM-1!

-- Jeff

Edit: I retract my remark. Throughput almost doubles, which is pretty dramatic. :shock: Details in my next post.
Edit: update web-site link. Add a link to my 2013 post about KK's predecessor, my heavily modified KIM-1.


Last edited by Dr Jefyll on Tue Sep 06, 2022 1:48 am, edited 3 times in total.

Posted: Sat Dec 12, 2009 1:01 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8427
Location: Southern California
Quote:
(Do they really have 6502s that'll do 24 MHz now? Caramba!!)

The IP owner of the 65c02, Western Design Center, makes most of their money not selling hardware but rather licensing the IP. They say the '02 core is going into custom ICs in products at the rate of hundreds of millions of units per year, and at least one of their licensees is running it at over 200MHz.



Posted: Sat Dec 12, 2009 4:01 am

Joined: Fri Aug 30, 2002 9:02 pm
Posts: 1681
Location: Sacramento, CA
I'm very impressed by this design. You obviously spent a lot of time studying instruction execution on a per cycle basis. To think you did this so long ago, and no one else has done so, or at least published the fact.

I like this even better than the hidden DMA cycles discussed last year, or was it two years ago now...

Well Done!!!

Daryl


Posted: Sun Dec 13, 2009 2:28 am

Joined: Mon Sep 28, 2009 3:48 am
Posts: 17
Odd, but I think I like the 65C02 with your extended addressing better than a 65C816.

Rick


Posted: Sun Dec 13, 2009 10:08 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Another nice thing about extending a 6502 externally is that it lends itself to an FPGA implementation: take one of the 6502 cores out there, and add whatever you need. Doesn't disturb the core much, so not so much verification to do.

I'm tempted to think that borrowing the '816 approach of putting the bank byte out on the data bus is also a win: the FPGA can be on a 5v-friendly 40-pin DIP module, at the cost of an external latch. Then the board design stays in familiar 0.1" pitch, through-hole, 5v territory.

(Edit: Or, use a 48-pin DIP module...)


Posted: Mon Dec 14, 2009 7:16 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
Still digesting some of the excellent replies here.. and I haven't gotten through the links Garth provided yet. (Thanks for that, btw.)

Re: your question about KK's performance (compared to a stock 'C02) when running FigForth, it seems I did KK an injustice in describing the speedup as undramatic. Now that I do the math I see that throughput rises 89%, based on a code mix with equal occurrences of the following: DOCOL SEMIS DOCON DOVAR SWAP OVER @ ! + 0= BRANCH 0BRANCH (LOOP). Sadly, what this really tells us is how woefully NEXT-bound 6502 FigForth is! The routines I just mentioned average out to about 34 clocks; NEXT is 43! For a stock 'C02 the time spent in NEXT exceeds the time spent actually computing. So for KK, trimming NEXT to 9 cycles was hugely more important than the (still significant) speedup in @ and !.
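As a rough cross-check of those figures, here is a back-of-envelope sketch in Python. It assumes every word in the mix costs the quoted 34-clock average and is followed by exactly one NEXT; that model and the variable names are illustrative assumptions, not anything taken from the KK sources.

```python
# Back-of-envelope check of the NEXT-bound claim, using only the figures quoted above.
AVG_WORD_CLOCKS = 34    # average body of the listed routines on a stock 'C02
STOCK_NEXT_CLOCKS = 43  # FIG Forth NEXT on a stock 'C02
KK_NEXT_CLOCKS = 9      # KK's one-byte NEXT
THROUGHPUT_GAIN = 1.89  # "throughput rises 89%"

stock_total = AVG_WORD_CLOCKS + STOCK_NEXT_CLOCKS  # clocks per word, NEXT included
next_share = STOCK_NEXT_CLOCKS / stock_total       # fraction of time spent in NEXT

# Speedup from the faster NEXT alone, with the word bodies left untouched.
next_only_gain = stock_total / (AVG_WORD_CLOCKS + KK_NEXT_CLOCKS)

# Average KK word body implied by the 89% figure (assumed: same one-NEXT-per-word model).
implied_kk_body = stock_total / THROUGHPUT_GAIN - KK_NEXT_CLOCKS

print(f"stock: {stock_total} clocks/word, {next_share:.0%} of them in NEXT")
print(f"gain from faster NEXT alone: {next_only_gain:.2f}x")
print(f"implied KK word body: {implied_kk_body:.1f} clocks")
```

Under that simple model the faster NEXT alone accounts for most of the gain, and the remainder comes from the quicker word bodies such as @ and !.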
Quote:
Note that WDC's 65c02's, which are the only ones being produced today and have no end in sight for the product life, and are much faster than Rockwell's were, have the WAIt and SToP op codes in column B, at $CB and $DB.
Switching to a WDC chip might allow some speedup. As for WAIt and SToP, these use opcodes I've already defined, so there's a conflict. As is, KK would continue to substitute $CB and $DB opcodes and function as before, but it'd be impossible to access the WAIt and SToP functions. I don't care about that, but if I did then I could make space by reducing and rearranging the KK opcodes in column B and column 3. Then KK could "substitute" WAIt and SToP op-codes with themselves - ie, allow $CB and $DB to reach the CPU unchanged. All it would take is a substitution-PROM update and some NOPs in the microcode to get rid of what I put there. But I think I'd prefer to stick with KK's present instruction set, and do without WAIt and SToP.
Quote:
Another nice thing about extending a 6502 externally is that it lends itself to an FPGA implementation [...] Then the board design stays in familiar 0.1" pitch, through-hole, 5v territory.
Not sure where you're going with this, BigEd. You're proposing a product that's hobbyist-friendly? (Unless I'm mistaken, nobody else prefers through-hole) Would it be a KK product? (a KimKlone clone!?) Probably you mean something else. But hey - you wouldn't need to give it The Full Monty... the Smart Register idea could be done without microcode, for instance... There is a range of tradeoffs to select from. It could be simpler than KK... or more exotic, a 65816 version, as Garth suggested!

Speaking of 65816, I really have to do some more boning up on that chip. Hopefully when you hear from me next I'll be a little better informed. But I think I agree with kc5tja's remark (from the "6502 with 3-byte addressing" Topic):
Quote:
The 6502/65816 are great CPUs, but they're some 12x to 24x slower than a dedicated Forth CPU of comparable logic density. A stack architecture CPU would be a very different beast.

[Edit: WDC paragraph.]


Last edited by Dr Jefyll on Wed Apr 14, 2010 4:02 pm, edited 1 time in total.

Posted: Mon Dec 14, 2009 10:39 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Dr Jefyll wrote:
Quote:
... FPGA implementation [...] Then the board design stays in familiar 0.1" pitch, through-hole, 5v territory.
Not sure where you're going with this, BigEd. You're proposing a product that's hobbyist-friendly? (Unless I'm mistaken, nobody else prefers through-hole)


Sorry, yes, I was following my own train of thought there, concerning an upgrade to an 80's micro. I'm not ready to solder surface mount devices, but I want the flexibility of CPLDs and FPGAs, and I've got a 5v system to build on. I've got a couple of these DIL modules, and they should be suitable to experiment with variations on the 6502 theme.


Posted: Mon Dec 14, 2009 11:02 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8427
Location: Southern California
I plan to use my wire-wrap PLCC sockets for the next computer, if I ever get around to building it. They take quite a bit less board space than DIPs, and the 65c02 and '816 have more power and ground pins in PLCC than DIP, and the connections are shorter, making for less inductance, making the part more suitable for higher speeds. Unfortunately I don't find any sources of WW PLCC sockets anymore, but I have my little cache here. They are like conventional thru-hole PLCC sockets but the leads are .025" square posts and come down farther, suitable for wire-wrapping. The pins are arranged like a pin grid array. These are not the huge PLCC-to-DIP adapters. SOICs and PQFPs are pretty easy to solder by hand, but it looks like WDC stopped supplying their parts in PQFP.



Posted: Tue Dec 15, 2009 12:15 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
Quote:
I was following my own train of thought there, concerning an upgrade to an 80's micro. I'm not ready to solder surface mount devices
Upgrading an 80s micro? Are there posts I can read re: this project?

I too cringe at the thought of building with the tiny SMD packages, but they're available and they have advantages, as Garth mentioned.

I've done well by soldering to these adapter boards by Aries Electronics (see photo) which can THEN plug into a wire-wrap socket. There are SOIC, PLCC, QFP... Still a dicey soldering job, but greatly aided by unimpeded access from all directions. [DigiKey lists them in Product Index -> Prototyping, Fabrication -> Adapters, breakout boards]

-- Jeff


Attachments:
adapter.JPG


Last edited by Dr Jefyll on Tue Sep 06, 2022 12:39 pm, edited 1 time in total.

Posted: Tue Dec 15, 2009 5:16 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8427
Location: Southern California
From old records, I found I got my WW PLCC sockets from Allied, and the manufacturer was MacKenzie Berg. Unfortunately Allied doesn't seem to have them anymore. I've been looking for where to buy them. I found this source: http://smt-adapter.biz/Site/Wire-wrap-sockets.htm# . [Edit, 12 years later: It's gone now, but see the archived page at https://web.archive.org/web/20060619144 ... 4-pins.htm .] The 32-pin is shown there, but if you click on "Selection-guides" underneath the 32-pin one, it will give you a .pdf file, and the 44-pin is indeed there, part number 44PL-W. It's expensive at $9.20, but it may be less than I paid quite a few years ago. Here's a picture of the 32-pin:

Code:
Broken external image link
http://aprilog.com/sites/smt-adapter.biz/illustrations/Photos/32PL-W-F.JPG

As you can see, it's not one of those monstrosities with a big PC board around the outside and all the WW pins outside the perimeter of the actual socket.

Anyway, it looks like if you want to buy some, they're available. [Edit, 1/21/13: Those are no longer available but BigEd just pointed us to a source that stocks them: http://uk.rs-online.com/web/c/?searchTe ... ra=oss&r=t ]

Attachment:
Winslow44PLCCsocket.jpg



Posted: Wed Dec 16, 2009 10:43 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Dr Jefyll wrote:
Upgrading an 80s micro? Are there posts I can read re: this project?

Not yet, but I'll see what I can do. Our 3rd effort is in bring-up right now. We're using BBC Model Bs - I've mentioned the TUBE in a few posts, although right now it's a CPU socket upgrade.


Posted: Sun Dec 20, 2009 9:49 pm

Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
GARTHWILSON wrote:
While the '02 needs around 40 clocks to do NEXT, the '816 cuts that down about 40%.


On the 65816, a 10 cycle DTC NEXT that plays well with other children (in contrast to the 6 cycle NEXT linked above) is possible:

Code:
INX
INX
JMP (0,X)


In this case, X is IP. With TOS in the accumulator, Y or S could be the data stack pointer. In the latter case, the return stack pointer could be stored on the direct page, leaving Y free to be used/overwritten as needed; e.g. XOR would be:

Code:
EOR 1,S
PLY


The former case would likely use more space, since abs,Y is 3 bytes vs. 2 bytes for stack relative addressing. It also seems natural that the 65816 stack would then be the return stack. XOR would be:

Code:
EOR 0,Y
INY
INY


X, if it were available, would be a more convenient data stack pointer, since INC, DEC, and the shifts are available with direct,X addressing, but not available indexed by Y or S. Keeping the TOS in the accumulator alleviates this considerably, as few words use those operations below the TOS.

Inlining NEXT seems worthwhile, at cost of only 2 bytes.

Since many words take only a few instructions, a 65816 Forth could spend a significant percentage of its time executing NEXT. An extreme case is 2*, which is simply an ASL. It takes 2 cycles for the ASL and 10 cycles for NEXT. Likewise for 1+ and 1-.
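To put a number on that, here is a small Python sketch of the share of cycles NEXT consumes for a given word body. It assumes the 10-cycle DTC NEXT shown above; the function name and the two larger body sizes are hypothetical, chosen only for comparison.

```python
# Share of a word's total cycles spent in NEXT, assuming the 10-cycle
# DTC NEXT (INX, INX, JMP (0,X)) described above.
NEXT_CYCLES = 10

def next_overhead(body_cycles: int, next_cycles: int = NEXT_CYCLES) -> float:
    """Return the fraction of a word's total cycles that NEXT consumes."""
    return next_cycles / (body_cycles + next_cycles)

# 2* is a single 2-cycle ASL; the other two body sizes are made-up reference points.
for name, body in [("2*", 2), ("8-cycle word", 8), ("30-cycle word", 30)]:
    print(f"{name}: {next_overhead(body):.0%} of cycles in NEXT")
```

For 2* that works out to 10 of every 12 cycles going to NEXT rather than to the shift itself.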

So, for a performance oriented Forth, STC with inlining may well be the way to go. At zero cycles, STC has the fastest possible NEXT. Inlining means more space in many cases, though there are optimizations possible, especially with branches (which in most cases will be within the -128/+127 range). Assuming TOS in the accumulator, here are some (untested and sorta contrived) examples of sequences that can be optimized (off the top of my head):

Code:
; DUP 0= IF
;
EOR #0
BNE label

; DUP 1234 U< IF
;
CMP #1234
BCS label

; CELLS array-name + @ ( array fetch)
;
ASL
TAY
LDA array_contents,Y

; var-name @ +
;
CLC
ADC var_contents

; 987 OR
;
ORA #987

; DUP var-name !
;
STA var_contents

; DUP >R
;
PHA


Anyway, the point is that it's at least possible to have a 65816 Forth that really flies.


Posted: Wed Feb 10, 2010 11:21 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
A few loose ends and some additional remarks on this topic...

Quote:
Have you looked at the timings of your added components to get an idea of how fast the system could run with a WDC 65c02?

Well, it's enjoyable to contemplate, Garth... Assuming the KimKlone's memory chips were also updated, things could be speeded up a LOT! Timing margins are maximal, as there's a pipeline register at the output of the Control Store; access to the Control Store overlaps with execution of the previous micro-word. One minor issue is that the bipolar PROM that aliases 011 op-codes arriving from memory isn't pipelined, so in pursuit of overall higher clock speeds it'd make sense to allow a Wait State (for PROM access) in the minority case of an op-code fetch which returns a 011 code.

Quote:
Odd, but I think I like the 65C02 with your extended addressing better than a 65C816.

Thanks, Rick. I have mixed feelings on this point, myself. Since my last post I've spent some time mulling over the '816 data sheet, and have become sufficiently familiar with the chip to comment.

It's safe to say that KK memory addressing is immensely better than run-of-the-mill MMU arrangements which force code and data spaces to coexist within 64K. The KK is also superior to the MOS 6509, even though both feature what an MMU lacks: the all-important ability to switch between full, undivided 64K banks on a bus-cycle by bus-cycle basis. Naturally the '816 also has this ability, but for various reasons the '816 seems a more capable device, overall, than the KimKlone.

65xx coding revolves around indirect pointers in Zero/Direct Page, and of course for 16 MByte addressing the pointers use an extra byte. The '816 features a couple of address modes (Long Indirect and Long Indirect Post-Indexed-Y) which employ three-byte pointers very efficiently (ie: directly from Zero-page), whereas KK suffers a handicap: somewhere upstream of a Long memory access there needs to be a separate instruction, albeit a speedy one, to fetch the most-significant byte of the pointer and place it in a register.

On the other hand, it's the '816 which is at a disadvantage if you happen to need a Long address mode other than those provided. Consider, for example, a Forth implementation which uses the X register to maintain the Parameter Stack. Presumably you want to be able to accept Long addresses on the stack, but when it comes time to code the Long @ and ! words it becomes uncomfortably apparent that the '816 lacks the Long Pre-Indexed-X Indirect mode you require. The workaround is a two-step operation that begins by updating the Data Bank Register -- more or less as KK does! The difference is that KK has a good selection of instructions for doing this, whereas the only way to update the 816's DBR is with a Pull instruction -- a very clumsy maneuver in this case.

(Because of this issue, if I were writing an '816 Forth I would lean heavily toward using SP, not X, as the P-Stack pointer. Another advantage of using SP would be the applicability of Stack Relative Indirect Indexed mode, a boon directly analogous to KimKlone's "X-Indirect-Y" mode. This is a mode that features pre-indexing, indirection and post-indexing.)

On another point, I admire the '816's ability to manage indexed address calculations that result in Bank crossings -- in other words, actual 24-bit addition, built right into the address mode. (KK must explicitly compute such addresses up front, either in Forth or m/c code.) However, I find the '816 data sheet somewhat ambiguous as to which address modes accommodate Bank crossings and which merely wrap around in the low-order 16 bits. (Is there a better reference document available online? What about a simulator? I don't have an '816 to run tests on!)
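To make the distinction concrete, here is a small Python sketch of the two behaviours in question. It only models the arithmetic; it says nothing about which '816 address modes behave which way, and the function names are illustrative.

```python
def indexed_crossing(bank: int, base: int, index: int) -> int:
    """Full 24-bit addition: a carry out of the low 16 bits moves into the next bank."""
    return (((bank & 0xFF) << 16) + (base & 0xFFFF) + (index & 0xFFFF)) & 0xFFFFFF

def indexed_wrapping(bank: int, base: int, index: int) -> int:
    """16-bit addition only: the result wraps around inside the original bank."""
    return ((bank & 0xFF) << 16) | ((base + index) & 0xFFFF)

# Bank $01, base $FFF0, index $0020: the sum overflows 16 bits.
print(f"{indexed_crossing(0x01, 0xFFF0, 0x0020):06X}")  # 020010
print(f"{indexed_wrapping(0x01, 0xFFF0, 0x0020):06X}")  # 010010
```

The two only differ when base plus index overflows 16 bits, which is exactly the case a pointer-arithmetic-heavy program has to know about.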

That sums up KK versus the '816. Briefly I'll mention a third point of comparison, one that I learned about right here on 6502.org. In the Hardware -> 6502 with 3-byte addressing topic, BigEd mentions a design which "... uses page 03 as extra indirection bytes, so that indirect and indirect,Y opcodes take an extra cycle to fetch a byte from &0301 + zp_offset to yield a 24-bit address..." I like this idea, and would probably take a similar tack if I ever had to design KK over again. As BigEd points out,
Quote:
The nice thing about extended pointers is that you can have lots of them: one or two bank registers isn't so convenient.
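A minimal Python sketch of how that page-03 scheme might assemble a 24-bit pointer. This is only one reading of the quoted description, not the actual design: the function name is hypothetical, and the zero-page wraparound of the high-byte fetch is an assumption.

```python
def extended_pointer(mem: bytearray, zp_offset: int) -> int:
    """Build a 24-bit address from a zero-page pointer plus a bank byte in page 03."""
    lo = mem[zp_offset]
    hi = mem[(zp_offset + 1) & 0xFF]   # assumption: the pointer fetch wraps within zero page
    bank = mem[0x0301 + zp_offset]     # extra indirection byte, per the quoted description
    return (bank << 16) | (hi << 8) | lo

mem = bytearray(0x10000)               # 64K of zeroed memory for the demonstration
mem[0x20], mem[0x21] = 0x34, 0x12      # ordinary 16-bit pointer at $20/$21 -> $1234
mem[0x0321] = 0x05                     # its bank byte at $0301 + $20
print(f"{extended_pointer(mem, 0x20):06X}")  # 051234
```

Every zero-page pointer gets its own bank byte this way, which is the "lots of them" advantage BigEd describes.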

