cjs wrote:
Perhaps it's just me, but this is feeling really not at all like a 6502 any more. I know there's no bright line between what does and what doesn't have "6502-nature," but (to my mind, anyway) moving 32- and 64-bit arithmetic into the CPU and switching to memory-to-memory operations seems to be well on the far side of it, and also my guess is that such a CPU would be many times the cost to build of the original 6502.
Certainly we would agree that the 6502 can do these things in software. My point is that the LLVM backend will make heavy use of this type of operation. My goal is not necessarily to make that code run faster (I'm not sure it can be, given the 8-bit accumulator, which is the alpha and the omega of the 6502) but rather to make the code as small as possible.
Quote:
What you've described above is also clearly a separate topic from the 65T2, since it (as far as I can tell) has no hope of coming anywhere near complying with the "approximately same transistor budget as the original MOS 6502" constraint, but I would be interested in hearing any ideas you have that would comply with that constraint while improving things for you as a compiler writer.
Well, let's take your restriction as a given. In that case, I could use a set of instructions that perform more or less the following operations:
Code:
ldx #0 ;; start at the low byte of each operand (operands are little-endian)
clc ;; for add; use sec instead for subtract
ldy #argumentlength ;; set y to the length of each operand
loop:
lda zp1, x ;; load a byte from the first operand
mathop zp2, x ;; do the math operation between the first and second operand; this is add, subtract, compare, or, and, etc.
sta zp3, x ;; store the result in the third operand
inx ;; go for next byte (inx and dey leave the carry untouched, so it propagates to the next byte)
dey ;; y decreases for each byte of length of each operand
bne loop ;; repeat until y equals zero, at which point we have processed each byte in the operands
Now, there are several ways I could represent this instruction sequence succinctly. I could write a small interpreter that reads the opcode and the operands as I described them in my previous post. However, I am hoping (though not assuming) that the hardware could decode this sequence, partially or completely, from a concise representation. Whether that can be done in very few transistors is beyond my skill to determine.
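A minimal sketch of the "small interpreter" option, written in Python rather than 6502 assembly so it can be run and checked. The five-byte encoding (opcode, zp1, zp2, zp3, length), the opcode numbering, and the `ZeroPageVM` class are all hypothetical, since the post does not spell out the format. It models only page zero and the C and Z flags, and shows the byte-at-a-time carry propagation the loop above relies on. Compare is done as a subtraction whose result is discarded, because chaining plain `cmp` across bytes would not propagate a borrow.

```python
# Hypothetical opcodes for a byte-code interpreter; the numbering is invented for this sketch.
OP_ADD, OP_SUB, OP_AND, OP_ORA, OP_EOR, OP_CMP = range(6)


class ZeroPageVM:
    """Hypothetical model of page zero plus the C and Z flags of a 6502."""

    def __init__(self):
        self.zp = bytearray(256)
        self.carry = False
        self.zero = False

    def poke_int(self, addr, value, length):
        """Store an unsigned integer little-endian, low byte first."""
        for i in range(length):
            self.zp[(addr + i) & 0xFF] = (value >> (8 * i)) & 0xFF

    def peek_int(self, addr, length):
        """Read back a little-endian unsigned integer."""
        return sum(self.zp[(addr + i) & 0xFF] << (8 * i) for i in range(length))

    def step(self, code, pc):
        """Decode one 5-byte instruction (opcode, zp1, zp2, zp3, length); return the next pc."""
        op, zp1, zp2, zp3, length = code[pc:pc + 5]
        # Add starts with carry clear (clc); subtract and compare start with it set (sec).
        carry = op in (OP_SUB, OP_CMP)
        all_zero = True
        for i in range(length):  # low byte first, like the inx in the loop
            a = self.zp[(zp1 + i) & 0xFF]  # zp,x addressing wraps within page zero
            b = self.zp[(zp2 + i) & 0xFF]
            if op == OP_ADD:
                total = a + b + carry
                carry = total > 0xFF
            elif op in (OP_SUB, OP_CMP):
                total = a - b - (not carry)
                carry = total >= 0  # 6502 convention: C=1 means "no borrow"
            elif op == OP_AND:
                total = a & b
            elif op == OP_ORA:
                total = a | b
            elif op == OP_EOR:
                total = a ^ b
            else:
                raise ValueError(f"unknown opcode {op}")
            result = total & 0xFF
            all_zero = all_zero and result == 0  # Z covers every byte of the result
            if op != OP_CMP:  # compare only sets flags
                self.zp[(zp3 + i) & 0xFF] = result
        if op in (OP_ADD, OP_SUB, OP_CMP):  # logical ops leave carry alone
            self.carry = carry
        self.zero = all_zero
        return pc + 5


vm = ZeroPageVM()
vm.poke_int(0x10, 0x12345678, 4)
vm.poke_int(0x20, 0x0000FFFF, 4)
vm.step(bytes([OP_ADD, 0x10, 0x20, 0x30, 4]), 0)
print(hex(vm.peek_int(0x30, 4)), vm.carry)
```

Each call site then costs five bytes of operands plus whatever it takes to enter the interpreter, which is the size-over-speed trade the post is after; a hardware decoder would aim to get the same density without the interpretive overhead.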
As usual, Woz was completely right, way back in 1977, when he wrote: "While writing Apple BASIC for a 6502 microprocessor I repeatedly encountered a variant of Murphy's Law. Briefly stated, any routine operating on 16 bit data will require at least twice the code that it should. Programs making extensive use of 16 bit pointers (such as compilers, editors and assemblers) are included in this category. In my case, even the addition of a few double byte instructions to the 6502 would have only slightly alleviated the problem. What I really needed was a hybrid of the MOS Technology 6502 and RCA 1800 architectures, a powerful 8 bit data handler complemented by an easy to use processor with an abundance of 16 bit registers and excellent pointer capability."

Personally, I feel the 65T2 solution acknowledges the first of Woz's problems but ignores the second. This is not a deal breaker by any means; I just feel that perhaps the concept is focused on optimizing the wrong thing.
Historically, the processor designer who took Woz's words most to heart was Sophie Wilson, a key designer of the ARM microarchitecture. (Sophie was at the time known as Roger Wilson.)
https://en.wikipedia.org/wiki/Sophie_Wilson
Sophie's design gave programmers a big pile of registers: sixteen 32-bit registers, against the 6502's single 8-bit accumulator and two 8-bit index registers. Wilson knew the 6502's design exceptionally well, and she visited WDC before helping to design the ARM. The ARM1 contained about 25,000 transistors. As a side note, it might be an interesting exercise to see how many ideas from that original 1985 design could be backported trivially to a 65xx.
http://www.righto.com/2015/12/reverse-e ... or-of.html

Lastly, realize that this whole discussion sidesteps a critical question: what exactly constitutes efficiency, or "friendliness"? This conversation seems to assume that the principal kind of efficiency to seek is cycles saved. In practice, for any medium-sized program, the 6502's 16-bit address space (64 KB) becomes a binding constraint incredibly quickly. Especially given that limited memory, a compiler author needs to consider the size optimization case as well as the speed optimization case when deciding whether a feature is language-friendly or not.
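A rough worked example of that size-versus-speed trade-off, sketched in Python for the add case only: bytes and cycles for an n-byte add done with the looped sequence above (counting an `ldx #0` and a `clc` in front of it) versus a fully unrolled `lda`/`adc`/`sta` per byte. The figures assume standard NMOS 6502 timings, all operands in page zero, and a `bne` that does not cross a page boundary; `loop_cost` and `unrolled_cost` are names invented for this sketch.

```python
def unrolled_cost(n):
    """clc, then lda zp / adc zp / sta zp for each of n bytes. Returns (bytes, cycles)."""
    # clc: 1 byte, 2 cycles; each zero-page lda/adc/sta: 2 bytes, 3 cycles.
    return 1 + 6 * n, 2 + 9 * n


def loop_cost(n):
    """ldx #0, clc, ldy #n, then lda zp,x / adc zp,x / sta zp,x / inx / dey / bne loop."""
    setup_bytes, setup_cycles = 2 + 1 + 2, 2 + 2 + 2
    body_bytes = 2 + 2 + 2 + 1 + 1 + 2
    # Each pass: 4 + 4 + 4 (zp,x ops) + 2 (inx) + 2 (dey) = 16 cycles, plus bne,
    # which costs 3 cycles when taken (same page) and 2 on the final fall-through.
    cycles = setup_cycles + 16 * n + 3 * (n - 1) + 2
    return setup_bytes + body_bytes, cycles


for n in (2, 4, 8):  # 16-, 32- and 64-bit operands
    print(f"{8 * n}-bit: unrolled {unrolled_cost(n)}, loop {loop_cost(n)}")
```

Under these assumptions, unrolling a 16-bit add is both smaller (13 bytes vs 15) and faster (20 cycles vs 43), while from 32 bits up the loop is the smaller option (15 bytes vs 25 at 32 bits, 15 vs 49 at 64 bits) at roughly twice the cycles. Which one a compiler should emit depends on whether it is optimizing for size or for speed.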