6502.org Forum

All times are UTC




PostPosted: Sun Apr 30, 2017 9:45 am 
Joined: Sat Jul 09, 2016 6:01 pm
Posts: 180
Arlet wrote:
Would all these steps be backward compatible ? And how would the opcode space be affected by each change ?

All these changes keep 100% compatibility with the genuine 6502. The only small problem is the fix of the zp,X wrap, but, like the 1977 change, it would not be a problem in practice.
Arlet wrote:
Correct, but it's not the same as item 4, which would add an extra register. And presumably, that extra register could be used to produce a (REG), Y addressing mode, because it's meant to replace zeropage.

It is an interesting idea, as are (X),Y or (Y,X), but there may not be enough opcode space.
Arlet wrote:
Would these zp registers also get more operations, such as an ADD with a zp register as destination ?

No, because that would consume opcode space. But RMW instructions like ROR, DEC, and ASL would take only 2 cycles instead of 5, and ZP addressing, which is key to the 6502 architecture, would take only 2/3 of the time. This would make arithmetic at least 2 times faster. LDA 10 or STX 30 would take 2 cycles instead of 3, ...
EDIT. The 6502+ architecture can easily be expanded to 64 bits while keeping compatibility with 8-bit software. :D
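
A rough way to sanity-check the cycle figures above is to total the cycles of a short zero-page routine both ways. The sketch below assumes the documented NMOS 6502 timings (3 cycles for a zp load/store/ALU operation, 5 for a zp read-modify-write) and the 2-cycle timings proposed in the post. The instruction mixes and all names are invented for illustration.

```python
# Cycle counts per instruction class: (standard 6502, proposed zp-register timing).
# The standard figures are documented NMOS 6502 timings; the proposed figures are
# the hypothetical ones suggested in the post.
CYCLES = {
    "zp_load_store": (3, 2),   # e.g. LDA 10, STX 30, ADC 10
    "zp_rmw": (5, 2),          # e.g. ROR 10, DEC 10, ASL 10
    "implied": (2, 2),         # e.g. CLC, DEX (unchanged)
}

def total_cycles(mix):
    """Return (standard, proposed) cycle totals for a {class: count} mix."""
    std = sum(CYCLES[k][0] * n for k, n in mix.items())
    new = sum(CYCLES[k][1] * n for k, n in mix.items())
    return std, new

# Illustrative mix: a 16-bit add of two zp words (CLC, then 2x LDA/ADC/STA).
add16 = {"implied": 1, "zp_load_store": 6}
# Illustrative mix: a 16-bit right shift of a zp word (LSR hi, ROR lo).
shift16 = {"zp_rmw": 2}

for name, mix in (("add16", add16), ("shift16", shift16)):
    std, new = total_cycles(mix)
    print(f"{name}: {std} -> {new} cycles, speedup x{std / new:.2f}")
```

Under these assumptions the shift routine gains much more than the add routine, so the overall speedup would depend on how much of a program is RMW work.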

_________________
my blog about processors


Last edited by litwr on Sun Apr 30, 2017 10:03 am, edited 2 times in total.

PostPosted: Sun Apr 30, 2017 9:50 am 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
So, have you thought about the opcode map, and how it would all fit ?


PostPosted: Sun Apr 30, 2017 10:01 am 
Joined: Sat Jul 09, 2016 6:01 pm
Posts: 180
Arlet wrote:
So, have you thought about the opcode map, and how it would all fit ?

The proposed changes would require a lot of opcodes, but only for the second-accumulator instructions. The 6502 has more than 80 unused opcodes, and all the accumulator instructions occupy about 70 opcodes, so that would be enough. The other free opcodes would be used for working with the mode register (1 or 2 opcodes: to write and read, or just to write as on the 6309), for multiplication and division, etc.
EDIT. Of course, we need TAB, TBA, XBA. Instructions like eXchange A with Memory, e.g., XMA (zp),Y, can use 2-byte opcodes. Two-byte and, later, three-byte opcodes can provide all the extra instructions, like CPUID or those for working with additional accumulators and index registers.
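
The opcode budget can be checked with a little arithmetic. The sketch below assumes the NMOS 6502's 151 documented opcodes out of 256. Which opcodes count as "accumulator instructions" to be duplicated for a second accumulator is an assumption made here for illustration, not something stated in the post.

```python
# Opcode budget for a second accumulator on the NMOS 6502.
TOTAL_OPCODES = 256
DOCUMENTED_OPCODES = 151          # documented NMOS 6502 opcodes

# Accumulator-related opcodes that a second accumulator "B" might duplicate.
# This grouping is an assumption for illustration.
ALU_OPS = ["ORA", "AND", "EOR", "ADC", "STA", "LDA", "CMP", "SBC"]
MODES = ["#imm", "zp", "zp,X", "abs", "abs,X", "abs,Y", "(zp,X)", "(zp),Y"]

alu = len(ALU_OPS) * len(MODES) - 1   # STA has no immediate mode
shifts = 4                            # ASL A, LSR A, ROL A, ROR A
stack_and_transfers = 6               # PHA, PLA, TAX, TXA, TAY, TYA
bit_tests = 2                         # BIT zp, BIT abs

needed = alu + shifts + stack_and_transfers + bit_tests
free = TOTAL_OPCODES - DOCUMENTED_OPCODES

print(f"accumulator opcodes: {needed}, free opcodes: {free}, left over: {free - needed}")
```

With this grouping the count lands close to the "about 70" quoted above and fits in the free space, leaving a few dozen single-byte opcodes for everything else.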

PostPosted: Mon May 01, 2017 11:03 am 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10800
Location: England
It's an interesting thought experiment, thanks litwr.

I'm sure that backward compatibility is greatly important, as Arlet notes. Indeed, I'd think that even then, you need some stability, so a new product every two years would be a better idea than every year.

It's an interesting idea to bring in just a little of ZP to allow faster pointer dereferences and RMW operations.

Thinking of these as fast ZP, not as registers, might be worth considering. That is, you don't necessarily have instructions to deal with them as registers, and you don't expect to save and load them on interrupts or function calls.

There's another approach, which is to make a small ZP cache. Say, the last 4 ZP locations referenced, or the last 8. It's more complex to design and test, but easier to use. There's a disadvantage: indeterminacy.
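
The indeterminacy of a small ZP cache can be illustrated with a toy model. The sketch below assumes a least-recently-used policy, a hit cost of 2 cycles, and a miss cost of 3 cycles. None of these details come from the thread, and the class and function names are hypothetical.

```python
from collections import OrderedDict

class ZpCache:
    """Tiny LRU cache of recently used zero-page addresses (hypothetical model)."""

    def __init__(self, size=4, hit_cycles=2, miss_cycles=3):
        self.size = size
        self.hit_cycles = hit_cycles      # assumed cost when the byte is cached
        self.miss_cycles = miss_cycles    # standard zp access cost
        self.lines = OrderedDict()        # address -> True, oldest first

    def access(self, addr):
        """Return the cycle cost of touching zp address `addr`."""
        if addr in self.lines:
            self.lines.move_to_end(addr)      # mark as most recently used
            return self.hit_cycles
        if len(self.lines) >= self.size:
            self.lines.popitem(last=False)    # evict the least recently used entry
        self.lines[addr] = True
        return self.miss_cycles

def run(trace, size=4):
    """Total cycles for a sequence of zp accesses, starting with an empty cache."""
    cache = ZpCache(size)
    return sum(cache.access(a) for a in trace)

# The final access to $10 costs 2 or 3 cycles depending on what came before it.
warm = run([0x10, 0x11, 0x10])                      # $10 is still cached
cold = run([0x10, 0x11, 0x12, 0x13, 0x14, 0x10])    # $10 has been evicted
print(warm, cold)
```

The same instruction takes a different number of cycles depending on history, which is what makes cycle-exact code harder to write against a cache than against fixed fast locations.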

I'm sure at some point one would introduce a 16-bit wide memory interface - you do mention that in passing. And then, later, 32-bit wide memory interface. But there's a difficulty in doing that within a 40-pin package.

Some point between 16 and 32 bits, I think you need to introduce a privileged level.


PostPosted: Mon May 01, 2017 1:15 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Comparing this design with other CPU designs, I think the zeropage remains a weak point. Even if you can successfully pipeline the (zp), y addressing, you're still stuck with a very limited set of operations that you can do on zeropage.

Unfortunately, I don't see a way to replace zeropage with something more powerful without breaking backward compatibility.

For all its warts, the 8086 turned out to be an excellent future-proof design, allowing massive improvements while still supporting old code.


PostPosted: Mon May 01, 2017 1:23 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10800
Location: England
Once you add a code cache, you can get more work done on data, and you can tolerate longer instructions, perhaps using prefix bytes like the Z80 and x86 do.

Once you have prefix bytes, you could allow for zp-like indirections on two-byte addresses. I think I prefer that to the idea of a relocatable zero page, which uses a new register for the upper byte.

(When I say two-byte addresses, of course eventually that becomes three-byte or even four-byte addresses.)

I think, today, I prefer prefix bytes to mode bits. But I think they are quite expensive without a code cache, or at least a small prefetch buffer which allows tight loops to run at speed.


PostPosted: Mon May 01, 2017 6:15 pm 
Joined: Sat Jul 09, 2016 6:01 pm
Posts: 180
IMHO the hypothetical zp-registers would be the functional equivalent of the Z80 B, C, D, E, H, and L registers, which may be combined into the pairs BC, DE, and HL. I think that POP and PUSH instructions for zp-registers would be very useful; the Z80 has an advantage over the 6502 in having them.
Prefixes are a good supplement to the mode model. Intel x86(-64) is unthinkable without both. However, I am not sure that they are good for the 6502+ ISA, because it has a fast mode-change instruction that may be used as a prefix.
BTW we need only 2 bits to describe each mode: for data 8/16/32/64 bits and for addresses 16/24/32/64 bits. So 6 bits in the mode register will be enough to describe all modes. However, I would prefer to have 2 more bits so that the X and Y index registers are described separately.
And indeed I want to use (zp,X),Y addressing with a later 6502+. :)
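
One possible reading of the 6-bit mode register is three 2-bit fields. The sketch below assumes bits 0-1 select the data width, bits 2-3 the address width, and bits 4-5 a shared index-register width. The field order and the function name are hypothetical; only the width lists come from the post.

```python
DATA_WIDTHS = (8, 16, 32, 64)      # bits, from the post
ADDR_WIDTHS = (16, 24, 32, 64)     # bits, from the post

def decode_mode(mode):
    """Split a 6-bit mode value into (data, address, index) widths in bits.

    Field order is hypothetical: bits 0-1 data, bits 2-3 address, bits 4-5 index.
    """
    if not 0 <= mode < 64:
        raise ValueError("mode must fit in 6 bits")
    data = DATA_WIDTHS[mode & 0b11]
    addr = ADDR_WIDTHS[(mode >> 2) & 0b11]
    index = DATA_WIDTHS[(mode >> 4) & 0b11]
    return data, addr, index

print(decode_mode(0b000000))  # all fields zero: the legacy 8-bit / 16-bit setting
print(decode_mode(0b010101))  # each field set to 1
```

Every one of the 64 values decodes to a distinct combination, which is the state space an interrupt handler or context switch would have to cope with.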

PostPosted: Mon May 01, 2017 6:23 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10800
Location: England
Have you experience using the '816? It has two width modes, one for data accesses and one for indexing, IIRC. It's not easy to get started with, and there are a whole load of subtleties to what happens.

It's kind of interesting within its constraints, but I find I prefer a machine with wide registers, where the operations apply to their appropriate data width. The difficulty with that, for a 6502+, is that we started with 8 bit registers, so we're forced to have a mode to adjust the width of the machine. But, for me, best to make that a uniform width change: 8/16/32/64 for all registers. And then use prefix bytes or modifier bytes to set the width of individual operations.


PostPosted: Mon May 01, 2017 6:36 pm 
Joined: Sat Jul 09, 2016 6:01 pm
Posts: 180
Arlet wrote:
Even if you can successfully pipeline the (zp), y addressing, you're still stuck with a very limited set of operations that you can do on zeropage.

You may also think of zp-registers as the dedicated address registers of the 680x0, which has a limited set of operations for them. Zp-registers give us a lot of fast pointers. Of course, GPRs are better, but a lot of zp-registers may outperform a small number of GPRs.
Later models of the 6502+, with a wide data bus and proper pipelining, could have any operations for zp-registers, implemented by multi-byte opcodes.
BigEd wrote:
Have you experience using the '816? It has two width modes, one for data accesses and one for indexing, IIRC. It's not easy to get started with, and there are a whole load of subtleties to what happens.

It's kind of interesting within its constraints, but I find I prefer a machine with wide registers, where the operations apply to their appropriate data width. The difficulty with that, for a 6502+, is that we started with 8 bit registers, so we're forced to have a mode to adjust the width of the machine. But, for me, best to make that a uniform width change: 8/16/32/64 for all registers. And then use prefix bytes or modifier bytes to set the width of individual operations.

All these subtleties are the same as with the Intel x86 ISA. IMHO the 6502+ is a cheaper and faster design. Is it perfect? No.
Who uses assembler to program modern CPUs? Just a few geeks and compiler and OS coders. All the problems mentioned are solved by compilers.

PostPosted: Mon May 01, 2017 6:42 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10800
Location: England
Compilers... that's a good point. But do you think they work well with lots of modes? I'd think they would set-and-forget, making some single simple choice.


PostPosted: Mon May 01, 2017 6:52 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
litwr wrote:
You may also think of zp-registers as the dedicated address registers of the 680x0, which has a limited set of operations for them.

True, but at least they had adda, suba, cmpa, lea, and movem. Also, the addressing modes allowed both index register and immediate offset, as well as auto inc/dec.

Quote:
All these subtleties are the same as with the Intel x86 ISA

I think the x86 is much easier to keep track of. For 8 vs 16, it had the 'w' bit encoded in the instruction, making it easy to switch back and forth. And the '386 has a default mode (which you should set once), plus a prefix per instruction.
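
The x86 scheme described above can be sketched for the opcodes that carry a 'w' bit in bit 0, such as MOV r/m,r (0x88/0x89) and ADD r/m,r (0x00/0x01), together with the '386 operand-size prefix 0x66. The function name and its parameters are made up for this sketch, and it is not a general x86 decoder.

```python
def operand_size(opcode, default_32=False, size_prefix=False):
    """Operand size in bits for x86 opcodes that carry a 'w' bit in bit 0.

    default_32:  True when the code segment defaults to 32-bit operands ('386+).
    size_prefix: True when the instruction has a 0x66 operand-size prefix.
    """
    if opcode & 1 == 0:
        return 8                       # w = 0: byte operation
    # w = 1: full-size operation; on the '386 the prefix flips the default.
    wide_32 = default_32 != size_prefix
    return 32 if wide_32 else 16

print(operand_size(0x88))                                     # MOV r/m8, r8
print(operand_size(0x89))                                     # MOV r/m16, r16 in 16-bit code
print(operand_size(0x89, default_32=True))                    # same opcode in 32-bit code
print(operand_size(0x89, default_32=True, size_prefix=True))  # with a 0x66 prefix
```

The width is visible in each instruction's own bytes plus one default that is set once, so a reader does not have to trace earlier mode-change instructions to know it.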


PostPosted: Mon May 01, 2017 7:31 pm 
Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
litwr wrote:
It is possible to dream about how our beloved 6502 might have developed if MOS Technology had continued independently. Let's call these improved 6502s the 6502+. I can suggest the following steps, which could have kept the 6502+ in the lead up to the end of the 80s.

1) 1976 - the elimination of the zp,X page wrap. This step would allow direct stack addressing.

Nice, but this usage would cost +1 cycle :) - using abs,X (already available) yields the same speed at the expense of one more program byte.
Adding an offs,S addressing mode would be more versatile IMHO.
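
The difference the zp,X change makes to the effective address can be shown in a few lines. The sketch below models the real 6502 behaviour (the sum wraps within page zero) against the proposed behaviour (the sum carries into page one). The function name and the `wrap` flag are illustrative only.

```python
def zp_x_address(zp, x, wrap=True):
    """Effective address of a zp,X access.

    wrap=True models the real 6502 (the sum stays in page zero);
    wrap=False models the proposed 6502+ behaviour (hypothetical).
    """
    if not (0 <= zp <= 0xFF and 0 <= x <= 0xFF):
        raise ValueError("zp and x must be 8-bit values")
    total = zp + x
    return total & 0xFF if wrap else total

print(hex(zp_x_address(0x80, 0x90)))              # wraps back into page zero
print(hex(zp_x_address(0x80, 0x90, wrap=False)))  # carries into page one
```

Without the wrap the access can reach as far as $01FE, so it touches page one, where the stack lives. Propagating that carry into the high address byte is presumably where the extra cycle mentioned above would come from.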

litwr wrote:
2) 1977 - the elimination of empty cycles. This would make the 6502+ 25% faster. This was done in the 4510, but only in 1988. This step might also include the addition of several useful minor instructions like BIT#.

Oh yes, but sadly lots of programs would need to be "adjusted" because their timing loops would become incorrect. And I'm sure that, although it looks simple, it takes a huge amount of silicon to achieve this.

litwr wrote:
3) 1978 - the addition of a second accumulator. This would make the 6502+ up to 100% faster.

Hmm, this is something I don't understand. Perhaps you can explain this dramatic speedup. I assume adding 9 accumulators instead of only one wouldn't yield +900%??

litwr wrote:
4) 1979 - moving part of the zero page short addresses into registers. It might be 4 or 8 bytes, for example, starting at $80. The long (2-byte) address form would be used to access zp RAM. This would increase the 6502+ speed by up to 50%.

Again I need additional explanation - you mean accessing $80 will use the internal "register", and accessing $0080 will use external RAM? So they could have different contents? If so, that means you are using zero-page addressing as a flag to distinguish between external and internal memory. OK, then LDA ($80),Y will fetch two bytes (opcode and zpAddr) but then use reg80,reg81, add Y, and finally fetch the data. Ideally this would take only 3 cycles instead of 5. Incrementing or decrementing these "registers" would then again require 2 bytes of program data and ideally 2 cycles (more likely 3 cycles, since you have to fetch, change, and restore the register), saving 3 or 2 cycles.
Hee hee - if register <> memory you could load a program into $0080++ and run it while regs $80++ hold different contents. Nicely odd :D
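
The "register <> memory" situation can be made concrete with a toy memory model. The sketch below assumes 8 on-chip bytes at short addresses $80-$87, following the example in the quoted proposal. The class and method names are invented for illustration.

```python
class ZpRegisterModel:
    """Toy memory model: short zp addresses $80-$87 hit on-chip registers,
    while the long form $0080-$0087 still reaches external RAM.
    The base address and count follow the example in the quoted proposal;
    the class and method names are made up for this sketch."""

    def __init__(self, base=0x80, count=8):
        self.ram = bytearray(0x10000)
        self.regs = bytearray(count)
        self.base = base
        self.count = count

    def _is_reg(self, zp):
        return self.base <= zp < self.base + self.count

    def write_zp(self, zp, value):      # e.g. STA $80
        if self._is_reg(zp):
            self.regs[zp - self.base] = value
        else:
            self.ram[zp] = value

    def read_zp(self, zp):              # e.g. LDA $80
        return self.regs[zp - self.base] if self._is_reg(zp) else self.ram[zp]

    def write_abs(self, addr, value):   # e.g. STA $0080
        self.ram[addr] = value

    def read_abs(self, addr):           # e.g. LDA $0080
        return self.ram[addr]

m = ZpRegisterModel()
m.write_zp(0x80, 0x12)      # goes to the on-chip register
m.write_abs(0x0080, 0x34)   # goes to external RAM
print(hex(m.read_zp(0x80)), hex(m.read_abs(0x0080)))
```

In this model the two forms of the "same" address hold different bytes, while zp addresses outside the register window behave exactly as before.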
litwr wrote:
5) 1980 - more of zp moved to registers and the addition of a byte-by-byte multiplication instruction like on the 6809.
Yes, nice.

litwr wrote:
6) 1981 - support for 16 MB memory addressing, with the introduction of a mode register and an address mode bit in it. The 6502+ would fetch 3 bytes for any address in 16 MB mode and 2 bytes in the old 64 KB mode.
...
IMHO this architecture would be faster than even the mighty ARM or Intel 80486 in 1991.

Well, IMHO that's beyond the scope of 65xx.


PostPosted: Mon May 01, 2017 7:48 pm 
Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
litwr wrote:
IMHO the hypothetical zp-registers would be the functional equivalent of the Z80 B, C, D, E, H, and L registers, which may be combined into the pairs BC, DE, and HL. I think that POP and PUSH instructions for zp-registers would be very useful; the Z80 has an advantage over the 6502 in having them.
Prefixes are a good supplement to the mode model. Intel x86(-64) is unthinkable without both. However, I am not sure that they are good for the 6502+ ISA, because it has a fast mode-change instruction that may be used as a prefix.
BTW we need only 2 bits to describe each mode: for data 8/16/32/64 bits and for addresses 16/24/32/64 bits. So 6 bits in the mode register will be enough to describe all modes. However, I would prefer to have 2 more bits so that the X and Y index registers are described separately.
And indeed I want to use (zp,X),Y addressing with a later 6502+. :)

The advantage of having lots of "pointers" placed in zero page is simply that you can/should avoid saving and restoring them - which is exactly what you need to do when dealing with a Z80. There you easily run out of registers, especially as they are not freely interchangeable.
6 mode bits! Meaning 2^6 = 64 possible situations regarding register lengths. Have you played through situations like context switches, or simply interrupts, in your mind? I am just playing with the '816 and realize that I have to save A and X, then P (flags) a second time, and perhaps DPR. Then I can switch the register size to what I need within the IRQ service. There I have to take care of DPR or use long addressing. Finally I have to undo everything => PLD, PLP, PLX, PLA, RTI :shock:


PostPosted: Mon May 01, 2017 8:24 pm 
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8433
Location: Southern California
litwr wrote:
Who uses assembler to program modern CPUs?

I maintain however that part of the reason for this is that modern CPUs are so difficult to program in assembly language. I still find it a strange situation, where the very things that make these processors more compiler-friendly also make them so unfriendly to assembly language. Part of the purpose of compilers of course is portability, so you don't have to learn a new assembly language for each processor. For the '02 however, the cc65 C compiler produces extremely inefficient code. I'm sure a better C compiler could be written for the 6502[*]; but the '02 seems to have a far greater "compiler penalty" than most newer processors do. OTOH, the '816 is undoubtedly a little better for compilers yet I find it easier to program in assembly than the '02.

Ed has listed the links to the many discussions about extending the '02, at viewtopic.php?f=1&t=4216 .

I have a list of links to various more-powerful versions of the 6502 at http://wilsonminesco.com/links.html#65fam . A couple of them did make it to market. Of the newer 32-bit efforts, I think Michael Barry's 65m32 is the closest to becoming reality. Jeff Laughton (forum name Dr Jefyll) did an impressive extension many years ago using a lot of logic ICs to help the 65c02, in his KimKlone 6502 w/ pointer-arithmetic-friendly extended address space and 9-cycle ITC Forth NEXT, described here. I don't think he's using it much anymore but he did for years.

[*] Edit, 2/14/21: I just came across this page about benchmarking the various C compilers for the 6502. CC65 produced much slower, more bloated code than the other C options, although it was more solid.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Tue May 02, 2017 12:07 am 
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3354
Location: Ontario, Canada
Thanks for the mention, Garth.

Prefix bytes are a potent solution because each consumes just a single place in the opcode table but it makes possible 256 new encodings. Unfortunately, the prefix occupies an extra byte of memory. Also, fetching that byte entails a certain delay.

So, there's a tradeoff. A single-byte opcode is fastest, but a prefix-plus-opcode combination is more versatile because more encodings are possible.

KK implements both. The prefix method is the most flexible way to specify a "Far" data access — that is, one whose data has a bank address that's independent of the bank where the currently-executing code resides (per K0). The desired prefix (either K1 prefix, K2 prefix or K3 prefix) is followed by and acts upon any typical 65C02 data-access instruction such as BIT Abs, INC Abs, CMP (Ind) and so on. During that following instruction the 'C02 creates a 16-bit address in more or less the usual fashion, and KK makes the alternative bank active only during the instruction's data access — ie, typically for one cycle only :!: (three cycles for Read-Modify-Write).

The other method involves only LDA and STA. These are used more frequently than other data-access instructions, and to improve performance KK sometimes allows them to omit the prefix. In order to conserve encodings this capability is limited to a few carefully selected cases.
  • opcodes $D3, $F3 and $E3 apply K2 to LDA Abs, LDA (Ind,X) and LDA (Ind),Y respectively
  • opcodes $93, $B3 and $A3 apply K1 to STA Abs, STA (Ind,X) and STA (Ind),Y respectively

In other words $D3 (for example) acts as "2 in 1", ie, both the prefix and the opcode itself. The mnemonic is LDA_K2. Altogether (including the two operand bytes for Abs mode) the instruction occupies 3 bytes, and it executes in four cycles, with register K2 supplying the bank address only for the last cycle.
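
The six combined encodings listed above fit in a small lookup table. In the sketch below the opcode values, operations, modes, and bank registers are taken from the list. The 2-byte length for the indirect modes assumes the usual 65C02 instruction lengths, and the mnemonics other than LDA_K2 are extrapolated from that one name, so treat both as assumptions.

```python
# Combined "prefix + opcode" encodings listed in the post.
# Each entry: opcode -> (operation, addressing mode, bank register, length in bytes).
# Lengths for the indirect modes assume standard 65C02 instruction lengths.
KK_COMBINED = {
    0xD3: ("LDA", "Abs",     "K2", 3),
    0xF3: ("LDA", "(Ind,X)", "K2", 2),
    0xE3: ("LDA", "(Ind),Y", "K2", 2),
    0x93: ("STA", "Abs",     "K1", 3),
    0xB3: ("STA", "(Ind,X)", "K1", 2),
    0xA3: ("STA", "(Ind),Y", "K1", 2),
}

def describe(opcode):
    """Return a readable description, or None if the opcode is not a combined form."""
    entry = KK_COMBINED.get(opcode)
    if entry is None:
        return None
    op, mode, bank, length = entry
    # Only LDA_K2 is named in the post; the other mnemonics follow its pattern.
    return f"{op}_{bank} {mode}: {length} bytes, data access via {bank}"

print(describe(0xD3))
print(describe(0xA9))  # ordinary LDA #imm, not a combined form
```

A separate prefix byte would add one byte in front of any of these instructions, which is the size-versus-versatility tradeoff described in the post.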


Attachment: KK Register Diagram.png [8.39 KiB]

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Last edited by Dr Jefyll on Tue May 02, 2017 2:18 pm, edited 1 time in total.