Quote: I've always admired the 6502's clean design, and after seeing the 65816 with its ugly mode flags, thought there has to be a better way of extending it.
After writing my '816 Forth kernel, I have to say it's much easier to program the '816 than the '02 when you're constantly dealing with the calculations of 16-bit values and addresses; but for that, I almost always have the accumulator in 16-bit mode and the index registers in 8-bit. There are very few occurrences of REP & SEP in the whole thing, and when I do use them, I put them in macros named ACCUM8, ACCUM16, INDEX8, and INDEX16, so it's obvious what they're doing and you don't have to remember what REP and SEP mean and what bit positions are what. REP and SEP truly were cryptic, especially since you have to remember the bit positions.
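To make the bit positions concrete, here is a minimal Python model of what those macros do to the status register. The macro names are the ones mentioned above; the helper functions `rep` and `sep` and the starting value of `p` are only illustrative, and this is a sketch of the idea, not the actual macro source.

```python
# Width flags in the 65816 status register P (native mode).
M_FLAG = 0x20  # accumulator/memory width: 1 = 8-bit, 0 = 16-bit
X_FLAG = 0x10  # index register width:     1 = 8-bit, 0 = 16-bit

def rep(p: int, mask: int) -> int:
    """Model of REP #mask: clear the status bits selected by mask."""
    return p & ~mask & 0xFF

def sep(p: int, mask: int) -> int:
    """Model of SEP #mask: set the status bits selected by mask."""
    return (p | mask) & 0xFF

# Named wrappers in the spirit of the macros, so the intent is obvious at the call site.
def accum8(p: int) -> int:
    return sep(p, M_FLAG)

def accum16(p: int) -> int:
    return rep(p, M_FLAG)

def index8(p: int) -> int:
    return sep(p, X_FLAG)

def index16(p: int) -> int:
    return rep(p, X_FLAG)

p = 0x30        # illustrative starting point: 8-bit A and 8-bit X/Y
p = accum16(p)  # 16-bit accumulator, index registers left at 8 bits
```

The call site reads as ACCUM16 rather than as a REP with a magic number, which is the whole point of wrapping the instructions.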
Quote: Can you really do away with 8-bit operations? Very useful for string handling!
With memory so cheap these days (what an understatement!), you just take a 32-bit word for each string character. You could use the fast barrel shifter instruction and do a lot of shifting to get four string characters in one word, but I don't see any need. I guess you could use the spare bits to indicate bold, italic, underline, more special and foreign characters, etc. if you wanted to.
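For comparison, here is roughly what the two string layouts look like, sketched in Python. The function names and the `BOLD`/`ITALIC` bit assignments are made up for illustration; nothing about the attribute layout has been decided.

```python
def pack4(chars: str) -> int:
    """Pack four 8-bit character codes into one 32-bit word, first character in the top byte."""
    if len(chars) != 4 or any(ord(c) > 0xFF for c in chars):
        raise ValueError("need exactly four 8-bit characters")
    word = 0
    for c in chars:
        word = (word << 8) | ord(c)  # shift earlier characters up, add the new one
    return word

def unpack4(word: int) -> str:
    """Recover the four characters by shifting each byte down and masking."""
    return "".join(chr((word >> shift) & 0xFF) for shift in (24, 16, 8, 0))

# One character per word instead: low 8 bits hold the code, spare bits hold attributes.
BOLD = 1 << 8    # hypothetical attribute bit
ITALIC = 1 << 9  # hypothetical attribute bit

def styled(ch: str, attrs: int = 0) -> int:
    """Combine a character code with attribute bits in a single 32-bit cell."""
    return ord(ch) | attrs

packed = pack4("6502")
cell = styled("A", BOLD | ITALIC)
```

The packed form needs a shift and a mask for every character access, while the one-character-per-word form needs none, which is why the simpler layout looks attractive when memory is cheap.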
For interfacing with 8-bit I/O ICs, you just use the lower 8 bits of the data bus. The '816 usually requires making sure you're in 8-bit mode for whatever register (A, X, or Y) you use to read or write the 6522, because a 16-bit access reads or writes two addresses, which is not usually desired in this case. The 32-bit 6502 will read just one address, and you just don't pay attention to the top 24 data bits. I haven't settled even tentatively on whether to just follow the read with AND #$0000:00FF, use passive pull-downs on the upper 24 bits, or what.
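The masking option amounts to the following, shown as a small Python sketch. The bus values are made up; the point is only that the software mask and the pull-downs give the same byte.

```python
IO_MASK = 0x0000_00FF  # keep the low 8 data bits, discard the upper 24

def read_io_byte(bus_word: int) -> int:
    """Return only the low 8 data bits of a 32-bit read from an 8-bit peripheral."""
    return bus_word & IO_MASK

floating = 0xDEAD_BE42     # upper 24 bits undefined (made-up value), low byte driven by the IC
pulled_down = 0x0000_0042  # the same read with pull-downs on the upper 24 bits

value_a = read_io_byte(floating)
value_b = read_io_byte(pulled_down)
```

With pull-downs the mask becomes a no-op, so the choice is between one extra instruction per read and 24 extra resistors on the board.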
Quote: The other drawback of the '816 is that it is almost, but not quite, able to run each bank as an independent 6502. It falls down for stack and direct page. The usual idiom for looking into the stack by indexing into page 1 doesn't do the trick in this case.
I'm not sure what you mean here, but you could have different stacks and direct pages and go between them by changing out the 16-bit S and D registers at the same time that you change the bank numbers for changing tasks. I have to admit that I haven't needed to do this (and therefore haven't done it) with the heavy use of pseudo-multitasking I do all the time with interrupts, but I don't see any problem with it. I'm much more likely to do round-robin coöperative multitasking than preëmptive though. [Edit, 5/15/14: I posted an article on simple methods of doing multitasking without a multitasking OS, at http://wilsonminesco.com/multitask/index.html.]
Quote: 2. Keeping it simpler possibly, a SINGLE mode bit for a 6502 emulation which its own thing, but that would be a kludge for the sake of it.
There's always the issue of whether or not to try to preserve the ability to run the legacy code of the earlier-generation processor. To keep from worsening our already-poor chances of getting a new processor like this done, I would vote for not trying to run legacy code on the new processor. (I don't want to sound pessimistic here-- I've seen a lot of "dreaming" threads, and, although everyone learned something from them, most of these threads never turned into actual working hardware, because the ideas quickly got too grandiose for the people doing the dreaming to pull them off.)
Quote: 3. Would it be simpler to implement using the array logic gates or a microcode routine generator for the opcode work in VHDL logic?
I think microcode is always slower, and probably has no simplicity benefit in programmable logic. I may be wrong. I know that in several ways here, I'm exposing the fact that I'm definitely not a processor designer.
Quote: 4. (my own personal one): plenty of the addressing modes were based on the memory and logic constraints present at the time. A much simpler set of addressing schemes would be used. When you think about it, how many addressing modes do you use in your programming?
Some are used far more than others, but I do use them all. If everything were 32-bit though, it's like everything is in the 4-gigaword direct page. There's no distinction between zero page, absolute, and long. It could probably be made to run faster if the operand were merged with the op code when you only need, say, 20 or 24 bits and we used deeper pipelining to separate them, but then the design gets more complex again.
Quote: Garth, you want to post your bullet points in here for discussion?
I thought you'd never ask. I'll put them below. Actually, I see I've already mentioned a bunch of them, but it'll summarize them a little more clearly.
Quote: figure it this way, once we can get a solid design spec locked down, we can
1. submit to WDC to implement.
2. One of our members with VHDL experience go about making the bugger...
I don't expect #1 to go anywhere, since we've been waiting for many years for the Terbium. WDC seems to be resting comfortably just licensing the intellectual property of a very solid 65c02 processor design. OTOH, if someone here does help us get this thing going in programmable logic and it looks really strong, WDC might get interested, or 6502.org can make it available to anyone who wants it. Besides the hardware, there's also software to take care of, especially an assembler. Eventually some will write simulators and compilers. I'll start by writing a 32-bit Forth kernel, since that's my biggest ambition for this, with my workbench applications in mind.
There was a 6532 RIOT (RAM, I/O, and Timer) IC which I did use on a project years ago, so I need a better name; but until someone comes up with one, you know what I mean.
6532 goals:
- 32-bit non-multiplexed address bus, data bus, A, X, Y, PC, S, DP, DBR, PBR (P could be 32-bit but just have a lot of unimplemented bits, or let the programmer use them as desired.) DP, DBR, and PBR are 65816 features I would like to retain for improved capability for relocatable code and multitasking. If the 32-bit address space is thought of as a circle with $FFFF:FFFF being the address right before $0000:0000, then these registers just point to what part of the circle we rotate around to the top for one task or another. (2^32 makes for a 4 gigaword address space, each word being 32 bits. So in essence it will have a 16GB address space, although there will be no operations that are particularly 8-bit.)
- Some addressing modes will be eliminated, because basically everything is in zero page (or, considering DP offset, direct page).
- will not run 6502 code directly. No 8-bit operations, vectors at the $FFFF:FFFF end of the memory map, and op codes don't need to match the 6502's. Making it backward-compatible would dramatically increase the complexity. This will not be a 65832, just a 6532.
- barrel shifter, so shifting right or left by large numbers of bits still happens in one clock
- fast hardware multiplier (maybe three phase-2 cycles), maybe a fast divider too, possibly with a 64-bit register for 32*32 multiplies and 64/32 divides
- outputs for vector pull, op code read, data read or write, dead bus cycles (if we can't eliminate them all-- good for invisible DMA)
- input clock can be a multiple of the bus speed, in order to reduce dead bus cycles and get better control of the timing of things that happen between phase-2 edges (if we even use a "phase-2"), and to cut MVN and MVP down to 2 clocks per byte moved instead of the 7 the 65816 uses, if we can keep them interruptible.
- DRAM management, perhaps by refreshing rows during dead bus cycles (Nice to have, but low on the list)
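A couple of the goals above are easy to pin down with arithmetic, so here is a small Python sketch of the address "circle", the memory size, and the 32*32 multiply with its 64-bit result. The function names are made up for illustration, and none of this says anything about how the hardware would actually do it.

```python
WORD_BITS = 32
ADDR_MASK = (1 << WORD_BITS) - 1  # $FFFF:FFFF

def effective_address(base: int, offset: int) -> int:
    """Add a DP/DBR/PBR-style base to an operand, wrapping around the 32-bit circle."""
    return (base + offset) & ADDR_MASK

def mul32(a: int, b: int):
    """32*32 multiply: return (high word, low word) of the 64-bit product."""
    product = (a & ADDR_MASK) * (b & ADDR_MASK)
    return product >> WORD_BITS, product & ADDR_MASK

words = 1 << WORD_BITS                  # 4 gigawords
total_bytes = words * (WORD_BITS // 8)  # four bytes per word

wrapped = effective_address(0xFFFF_FFF0, 0x20)  # a base near the top wraps past $0000:0000
high, low = mul32(0xFFFF_FFFF, 0xFFFF_FFFF)     # largest possible product
```

Because the addition simply wraps, a task's base register can sit anywhere on the circle and the code relative to it behaves the same.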
So how is it still a 65-family processor?
- same registers as the 65816, just bigger
- nearly the same instructions
- average instruction will take about three phase-2 clocks (6502 takes about four for the average, but needs two clocks instead of one to fetch absolute addresses)
- similar memory-map usage, just on a bigger scale
- phase-2 bus clocking, and one access per phase-2 clock
- excellent interrupt performance, working the same way as the 6502, although possibly with more IRQ inputs and vectors to reduce polling requirements. Each register will take only one cycle for a stack push or pull, including the PC, and a vector pull also only takes one cycle.
What I don't envision it having:
- deep pipelining. Only the shallow pipeline of the 6502 where one instruction often gets finished while the next one is being fetched.
- cache
- branch prediction
- multiplexed buses (address bus and data bus are both 32-bit, and there's no latching required for the high address bits)
Envisioned uses:
- much faster running of high-level languages, with higher-precision default word size (I'm particularly thinking of a 32-bit Forth, with 64-bit intermediate results)
- much faster running of math anywhere you need more than 8 bits of input and output
- faster handling of any data exceeding 8 bits, for example, 12-bit A/D and D/A converters
- much larger data space for tables, files, arrays, etc., without the 65816's bank boundaries. Suitable for audio recordings, possibly video
- multitasking
- some will want the better video, but that's not a goal for me personally
Expected benefits over using existing 32-bit processors:
- easier to learn, easier to code in assembly, easier to count cycles and predict performance, easier to build hardware
- better interrupt performance
Additional instructions I would like:
- STF (store $FFFF:FFFF, to set flag variables)
- Forth's NEXT (very similar to RTS). I believe this is used in some other high-level languages too.
I don't want to go overboard with extra instructions, since the more-complex instruction-decoding logic they require just means slowing things down.
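For anyone who hasn't looked inside a Forth, here is a toy Python model of what NEXT does, plus what a store-all-ones STF would accomplish. It assumes indirect threading and word-addressed memory, which is my choice for the sketch and not a settled part of the design; the `VM` class and the primitive names are invented for illustration.

```python
TRUE = 0xFFFF_FFFF  # all bits set, the value a flag variable gets when "set"

class VM:
    def __init__(self):
        self.mem = []      # word-addressed memory: one cell per address
        self.ip = 0        # Forth instruction pointer
        self.stack = []    # data stack
        self.running = False
        self.prims = []    # Python functions standing in for machine-code routines

    def primitive(self, fn) -> int:
        """Lay down a code field pointing at a 'machine-code' routine; return its address."""
        self.prims.append(fn)
        self.mem.append(len(self.prims) - 1)
        return len(self.mem) - 1

    def next(self):
        """NEXT: fetch the word address at IP, advance IP one cell, run that word's code."""
        w = self.mem[self.ip]
        self.ip += 1
        self.prims[self.mem[w]](self)

    def run(self, start: int):
        self.ip = start
        self.running = True
        while self.running:
            self.next()

def lit(vm):   # push the in-line cell that follows, and step over it
    vm.stack.append(vm.mem[vm.ip])
    vm.ip += 1

def add(vm):   # 32-bit wrapping add of the top two stack items
    b, a = vm.stack.pop(), vm.stack.pop()
    vm.stack.append((a + b) & TRUE)

def stf(vm):   # what an STF instruction would do: store all ones at an address
    vm.mem[vm.stack.pop()] = TRUE

def halt(vm):
    vm.running = False

vm = VM()
LIT, ADD, STF, HALT = (vm.primitive(f) for f in (lit, add, stf, halt))
flag = len(vm.mem)
vm.mem.append(0)  # a flag variable, initially clear
start = len(vm.mem)
vm.mem += [LIT, 2, LIT, 3, ADD, LIT, flag, STF, HALT]  # 2 3 +  flag STF
vm.run(start)
```

NEXT is the fetch, increment, and indirect jump in `next`. RTS does nearly the same thing with the stack pointer in place of IP, which is why a dedicated instruction for it seems cheap to add.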
Samuel recommends clocking on only rising edges, to simplify things in programmable logic. Actually, he has a lot of good recommendations. We'll have to see how much that pulls us away from it being a 6502 with '816 capabilities but just bigger. There are plenty of other processors out there to choose from if it weren't that we want to keep the '02 flavor in both hardware and programming.
When I was talking about the 32-bit bank registers, direct-page register (although DP still covers the entire address space), and stack pointer, by the time you consider indexing, it was starting to sound more like a processor with 16 general-purpose registers. Perhaps the transition to something like that becomes more transparent now, at least if the assembler mnemonics are still worked out to be essentially 6502 mnemonics.
_________________ http://WilsonMinesCo.com/ lots of 6502 resources The "second front page" is http://wilsonminesco.com/links.html . What's an additional VIA among friends, anyhow?
Last edited by GARTHWILSON on Fri Jul 17, 2009 4:23 am, edited 1 time in total.