Quote: I've just realised that 65Org32 differs from 6502 crucially in that addresses are just one 32-bit byte which is why there are no zero-page limitations.
Now you can use any address for indirects, without having to be on a tight budget for ZP (or DP) use, because all of memory, all 4,294,967,296 addresses, is in ZP. Page 1, where the hardware stack resides, could also be seen as having 4,294,967,296 addresses (although they're the same ones), so your stack depth is effectively unlimited. Ruud wanted to be able to put long strings and other large things on the hardware stack for the Pascal compiler he was writing. Now he could put a whole movie on the stack if he wanted to!
Quote: This is all good from a cycle count perspective, but changes the implementation quite a bit in a nice way, huh?! It's simpler.
Quote: On the implementation front, the first thing to check might be the relative speeds of an 8-bit and a 32-bit ALU. As we know, 6502 is almost entirely 8-bit and takes extra cycles to carry into the high byte.
Yes, and it has especially been a pain on the 6502 when incrementing a multi-byte number by, say, 2 when you don't know if the beginning number will be even or odd. I can think of several ways to do it, but none gets much better than just doing a quad-precision "add 2":
Code:
    LDA ADDR
    CLC
    ADC #2
    STA ADDR
    LDA ADDR+1
    ADC #0
    STA ADDR+1
    LDA ADDR+2
    ADC #0
    STA ADDR+2
    LDA ADDR+3
    ADC #0
    STA ADDR+3
which is nowhere near as good as the 32-bitter's
Code:
    INC ADDR
    INC ADDR
or, per your earlier suggestion, simply
Code:
    INC2 ADDR
However, this may not be an apples-to-apples comparison. The 6502 needs the multiple INCs or DECs to step to the next pair of bytes holding an address, which is the typical use for this procedure. On the 32-bitter, OTOH, a single INC or DEC is enough to move to the next memory unit containing a complete address.
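To make the carry chain concrete, here is a rough Python model of the two cases (not real 65xx code). The function names, the little-endian byte layout and the list used as memory are my own assumptions for illustration; the point is only that the 8-bit version has to ripple the carry through four cells while the 32-bit version touches one.

```python
MASK32 = 0xFFFFFFFF

def add2_bytewise(mem, addr):
    """Model of the CLC / ADC #2 / ADC #0 ... chain on four little-endian bytes.

    Returns the final carry, as the C flag would hold it after the last ADC.
    """
    carry = 0  # CLC
    for i, operand in enumerate((2, 0, 0, 0)):
        total = mem[addr + i] + operand + carry
        mem[addr + i] = total & 0xFF   # STA ADDR+i
        carry = total >> 8             # carry into the next byte
    return carry

def add2_wordwise(mem, addr):
    """Model of INC ADDR twice (or a hypothetical INC2 ADDR) on one 32-bit cell."""
    mem[addr] = (mem[addr] + 2) & MASK32

# Same starting value, $0000FFFF, in both representations.
byte_mem = [0xFF, 0xFF, 0x00, 0x00]
word_mem = [0x0000FFFF]
add2_bytewise(byte_mem, 0)
add2_wordwise(word_mem, 0)
```

Both end up holding $00010001; the byte version just needs four read-modify-write steps to get there.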
Quote: Note also that the ROMs would have to be word-wide, unless there was a byte-wide bootstrap mode.
Yes, I was thinking of four 8-bit-wide ones, or two 16-bit-wide ones. Alternatively, something I've been wanting to do even with the 6502 or '816 is to have something pre-load RAM with the boot-up code before releasing the processor from RST. Then you don't need any slow ROM on the bus at all.
Quote: It might be worth collecting and ranking the various ideas which enhance the 6502 but increase the complexity somewhat. Complexity costs: in time to implement, to verify, to document and to add into the toolchain.
For some things, we'll have to discuss it with whoever does the HDL design, to evaluate tradeoffs. For the "toolchain" I was thinking of using Universal Cross Assemblers' C32 assembler, which comes with the files to assemble for dozens of different processors (no need to buy another assembler for each) and gives you the tools to write files for new processors. Once I have my Forth kernel going with that, I'll write an assembler in Forth that will run on the target, like I have on my current system, except that it will be more powerful.
Quote: Off the top of my head, for 65Org32, I quite like
- predication (every instruction can be conditionally executed)
Can you give examples and show what makes them useful?
Quote: - including branch offsets in the opcode
- other small constant signed operands included in the opcode
- including an ASL8 and maybe an ASL24 or ASL16
This would make for more-compact code, and would be especially appropriate for shift and rotate distances (which covers one of the next things you mention), although I expect it might make for more-complex and maybe slower instruction decoding. The usual solution is to have a deep pipeline working on the decoding before the instruction's turn comes up to be executed, but then we're talking about more complexity again. OTOH, the input clock could be four or eight times the bus clock in order to get more ticks to accomplish things in each memory cycle, like having a 40 or 80MHz input clock for a 10MHz phase-2 output. This should eliminate the dead bus cycles which you also mentioned later.
Quote: - 8x8 multiply (or 32x32 if on an FPGA with hardware multipliers)
For my uses, I really want the 32x32, which is where the B accumulator you also mentioned comes in. It wouldn't bother me too much if the processor had to be in two or three ICs instead of all fitting into one.
Quote: and I also like, but would defer:
- stack-based address mode
- RL register so all accesses are relative: free relocation
The 65816 already has these, except that its "relative" part is the program bank and data bank registers, which I would want to extend to 32-bit so there's no constraint to 64K blocks. The offsets can be any 32-bit number. As with the '816, you're not forced to use them though. Absolute addressing uses them but long does not. (Both abs and long will be 32-bit here though.) The '816's DP register gives another offset for the movable zero page within the first 64K bank; but now that too will be free to be anywhere in the 32-bit address space.
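As a rough sketch of how the relocation would fall out of this, here is a small Python model. Everything in it is an assumption for illustration: the names `effective_address`, `load_abs` and `load_long`, the dictionary standing in for memory, and the choice to wrap the sum modulo 2^32. The idea is just that "abs" adds a 32-bit base register to the operand while "long" uses the operand as-is.

```python
MASK32 = 0xFFFFFFFF

def effective_address(base, operand):
    """Base register plus 32-bit operand, wrapped to 32 bits (wrap is assumed)."""
    return (base + operand) & MASK32

def load_abs(mem, data_base, operand):
    """'Absolute' access: the operand is an offset from the data base register."""
    return mem[effective_address(data_base, operand)]

def load_long(mem, operand):
    """'Long' access: the operand is the full address and the base is ignored."""
    return mem[operand & MASK32]

# The same program data placed at two different bases; only the register changes.
mem = {0x00010005: 111, 0x70000005: 111, 0x00000005: 222}
first = load_abs(mem, 0x00010000, 5)
moved = load_abs(mem, 0x70000000, 5)
fixed = load_long(mem, 5)
```

Code that uses only the abs form never needs patching when it is moved; you load a different base and the same operands still find their data.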
The stack-based address mode, if I'm understanding you right, is called "stack-relative," as in the "CMP 2,S" in one of my listings on the last page, which means "compare accum to what is 2nd from the top of the stack," regardless of where the top of the stack is at the moment.
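For anyone who hasn't used it, here is a minimal Python model of stack-relative addressing. The class and function names are hypothetical, the cells are treated as 32-bit words as they would be on the 65Org32, and S is assumed to point at the next free cell as on the existing 65xx parts, so offset 1 is the top of the stack.

```python
MASK32 = 0xFFFFFFFF

class HardwareStack:
    """Descending stack; S points at the next free cell, as on the 65xx."""

    def __init__(self, size=16):
        self.mem = [0] * size
        self.s = size - 1

    def push(self, value):
        self.mem[self.s] = value & MASK32
        self.s -= 1

    def pull(self):
        self.s += 1
        return self.mem[self.s]

    def rel(self, offset):
        """Operand fetch for 'offset,S': 1 is the top, 2 is second from the top."""
        return self.mem[self.s + offset]

def cmp_flags(a, operand):
    """Flags as CMP would leave them: Z if equal, C if a >= operand (unsigned)."""
    diff = (a - operand) & MASK32
    return {"Z": diff == 0, "C": a >= operand, "N": bool(diff >> 31)}

st = HardwareStack()
for value in (10, 20, 30):
    st.push(value)
flags = cmp_flags(20, st.rel(2))   # model of CMP 2,S with 20 in the accumulator
```

Nothing is pulled to do the compare, and the offsets track the stack pointer, so the same `2,S` keeps meaning "second from the top" after further pushes or pulls.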
Quote: The 65CE02 had this, and it was initialized at 0 at RST. If we want another register for miscellaneous use, it would probably be good to differentiate it from a constant zero so there's no confusion if you still want to store zero (STZ, which the 65c02 and '816 have) versus store the register's content.
Quote: Yes, unless we find that the relative-address registers mentioned two paragraphs up make it unnecessary. The '816 can jerry-rig this far more easily than the 6502 can, but it still leaves room for improvement.
Quote: - putting a 16 or 24-bit address into an opcode for a new sort of zero-page
- barrel shifter
- prefetch buffer
discussed above
Quote: - top of stack held on chip
I see advantages and disadvantages. Can you elaborate?
Quote: getting more complex again, and makes it hard to pre-determine performance.
Quote: The '816 has JSR (addr,X), and adding JSR (addr) might make sense. As it is now, it can be synthesized for the few times it's needed, with PER followed by a JMP indirect. Pretty simple.
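The PER-plus-JMP-indirect trick can be modelled in a few lines of Python. This is only a sketch with hypothetical names: a list stands in for the hardware stack, a dictionary for memory, and addresses are treated as single cells as they would be on the 65Org32. It relies on the 65xx convention that the address on the stack is one less than the return address, because RTS adds one after pulling it.

```python
def jsr_indirect(stack, mem, vector, return_addr):
    """Model of PER return_addr-1 followed by JMP (vector)."""
    stack.append(return_addr - 1)   # PER pushes the address RTS will pull
    return mem[vector]              # JMP (vector): new PC comes from memory

def rts(stack):
    """Model of RTS: pull the address and add one to get the next PC."""
    return stack.pop() + 1

stack = []
mem = {0x0200: 0x9000}              # vector at $0200 points at a routine at $9000
pc = jsr_indirect(stack, mem, 0x0200, 0x1234)
resumed = rts(stack)
```

The routine at the vector's target ends with an ordinary RTS and comes back to the instruction after the JMP, just as if a JSR (addr) had been used.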
Quote: Absolutely, as the '816 already has it. The '816 takes 7 clocks per byte though, and I think we can do better, at least if the input clock is a multiple of the phase-2 output.
Quote: - B and maybe C reg as a push-down stack: just need a POP or ROT, implicit push on every LDA. Maybe a DUP.
Do you have a particular use for a HLL in mind? If the processor still has to go out on the bus to update the stack in memory to reflect what's in its registers, or refill its registers from memory, won't there be a performance hit that kind of negates the benefit?
Quote: For a memory system which isn't simple, there are questions about [...] supporting self-modifying code [...] I'd defer all that.
I do use self-modifying code in my ITC Forth's inner loop to do a double indirect more efficiently, making the operand also be a variable. Keeping the operand separate from the op code helps immensely there. I suspect other HLLs do the same kind of thing. This is one of the reasons to stay away from the separate instruction and data memories of a Harvard architecture.
I got permission from Phil Pemberton, the owner of the 6502 Yahoo forum, to put an invitation there to have others join in this discussion since there has been virtually no activity on that forum recently (8 posts in the last 3 years). Hopefully we'll attract a couple of HDL designers who can help us turn the talk into hardware. I'll be posting the invitation soon.
Although it's not a 65-family processor, take a look at the home-made TTL CPU project at http://web.whosting.ch/dieter/trex/trex.htm . The link was sent by someone who would rather lurk and not post. The CPU occupies multiple 6U VME form factor boards, with mezzanines. Very impressive and nicely done, although never quite finished, and not what we're looking to do here. Edit, 7/24/12: That's Dieter, and Mike just posted his updates at http://6502.org/users/dieter/index.htm . There is a section there on the 6502 ALU.
_________________ http://WilsonMinesCo.com/ lots of 6502 resources The "second front page" is http://wilsonminesco.com/links.html . What's an additional VIA among friends, anyhow?