A Hypothetical C Friendly 6502

Druzyek · Post by **Druzyek** » Mon Nov 30, 2020 2:37 pm

cjs wrote:

Druzyek wrote:

Since I got interested in electronics to build calculators, decimal mode is really useful to me. If we're dreaming up processors, I would definitely keep it in!

Well, do keep in mind that you can just write your own code to do a decimal adjust, as well. Decimal mode or a decimal adjust instruction are just optimizations, and probably not particularly important ones when building a calculator used at human speed.

By the late 1970s, you could type a program a few hundred steps long into a calculator, and by the early 1990s, they had BASIC with a file system and better specs than early home computers. All of the floating point calculations are in decimal, so if you want to do thousands of those per second, decimal mode is a particularly important optimization.

Quote:

For most purposes, I think it's perfectly reasonable to reduce or remove CPU support for decimal arithmetic if CPU resources are very tight.

Agreed. Just pointing out that it's extremely useful for a small number of people which must be why they included it in the first place.

johnwbyrd wrote:

load81 wrote:

The 6502 isn't exactly C friendly.

I dispute this premise. This premise seems to have been universally decided sometime around 1990, when C compilers weren't very wise. This premise continues, principally because cc65 is the most common C compiler available for the target nowadays, and for some reason people assume that cc65's limitations would be general to any C compiler.

I dispute your dispute

cc65 could certainly be improved upon, but just as a thought experiment, it's tough to imagine how you could get good performance out of the 6502 with any C compiler. Think about something like this: while (*ptr) func(*ptr++); A human would find some way to use the Y register to step through memory in that case, but the C compiler usually has no way to know if the data pointed to will be close enough to use the Y register, so it resorts to a 16 bit pointer with no indexing and performance goes way down. Compare that to a system where the registers are large enough to hold a memory pointer, and using a pointer is as fast or faster than indexing, which is one thing that does make C a good fit. The abstraction of C just doesn't provide enough information to the compiler to do things like this efficiently since 8 bit indexes aren't a concept the designers took into account. Can you think of a way llvm would help solve this?

cjs · Post by **cjs** » Mon Nov 30, 2020 3:04 pm

Druzyek wrote:

Think about something like this: while (*ptr) func(*ptr++); A human would find some way to use the Y register to step through memory in that case, but the C compiler usually has no way to know if the data pointed to will be close enough to use the Y register, so it resorts to a 16 bit pointer with no indexing and performance goes way down.

I may well be wrong about this, but wouldn't one deal with this using code something like the following, assuming that ptr is a local variable that is not being passed to func()?

Code: Select all

;   uint8 *ptr = ...
;   while (*ptr)
;       func(*ptr++)

        ;   ...compiler allocates ptr in the zero page...
        ldy #0
loop    lda (ptr),y     ; A ← *ptr
        beq done
        jsr func        ; func(A)
        iny             ; ptr++
        bne loop        ; no carry to MSB
        inc ptr+1       ; carry to MSB; Y is 0 again
        clc
        bcc loop        ; BRA

done    tya             ; Y contains increments not yet added to ptr
        clc
        adc ptr
        sta ptr         ; update pointer with remaining increments
        bcc done1
        inc ptr+1
done1

(Obviously if func() did make use of that pointer itself, either because it was a global or because a pointer to it was passed in, the pointer would have to be updated in the loop every time, rather than the LSB update being delayed until the end of the loop. But that seems to me just the standard "6502 doesn't have 16 bit registers" issue.)
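For anyone who would rather read the idea in C, here is a rough model of what the assembly above does. It is only a sketch: an 8-bit index stands in for Y, the high byte of the pointer is bumped when the index wraps, and the pending increments are added back once at the end. The names walk_string, record and seen are made up for the illustration, with record() standing in for func().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for func(): records each byte it is given. */
static uint8_t seen[512];
static size_t seen_count;

static void record(uint8_t c)
{
    if (seen_count < sizeof seen)
        seen[seen_count++] = c;
}

/* Model of the loop above: 'base' plays the zero-page pointer and 'y' the
 * Y register. Returns the final pointer, i.e. the address of the 0 byte. */
static const uint8_t *walk_string(const uint8_t *base)
{
    uint8_t y = 0;                   /* ldy #0 */

    while (base[y] != 0) {           /* lda (ptr),y / beq done */
        record(base[y]);             /* jsr func */
        y++;                         /* iny */
        if (y == 0)                  /* index wrapped: carry into the MSB */
            base += 256;             /* inc ptr+1 */
    }
    return base + y;                 /* done: add the pending increments */
}
```

Like the assembly, this relies on func() leaving the index alone; a real routine that clobbers Y would need it saved around the call.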

Druzyek · Post by **Druzyek** » Mon Nov 30, 2020 3:56 pm

Quote:

I may well be wrong about this, but wouldn't one deal with this using code something like the following, assuming that ptr is a local variable that is not being passed to func()?

Sure, that's the human generated version. The trick is to get the compiler to output that, which you could do in that simple example. Imagine how you would do this in assembly:

Code: Select all

uint8 *ptr=...
while (*ptr)
{
   if (func(*ptr)) ptr+=5;
   else ptr+=3;
}

The compiler could output essentially the same as you would if you wrote it yourself. On the other hand, if I told you that the while loop will always find a 0 within the first 100 bytes, your code would be smaller and faster. My point is that there is no way to communicate that to the compiler.

Martin_H · Post by **Martin_H** » Mon Nov 30, 2020 4:54 pm

Don't most 6502 machine language monitors use decimal mode to print hex output? I know you could do that without BCD, but between that and calculators it seems somewhat useful.

cjs · Post by **cjs** » Mon Nov 30, 2020 5:09 pm

Druzyek wrote:

On the other hand, if I told you that the while loop will always find a 0 within the first 100 bytes, your code would be smaller and faster. My point is that there is no way to communicate that to the compiler.

Well, it's been ages since I have had to deal with these kinds of details of C behaviour, and it may even depend which particular version of the C standard you're using, but my instinct is that you do it something like this:

Code: Select all

uint8 ptr[104] = ...
for (uint8 *p = ptr; *p; )
    if (func(*p))
        p += 5;
    else
        p += 3;

Since accessing an array out of its declared bounds is undefined behaviour (at least in ISO C), you're here telling the compiler that you're guaranteeing you'll never do that (or, alternatively, that it can do whatever it likes if you do that), and it can feel free to optimize knowing that if p is ever more than ptr+104, the code is allowed to fail in any arbitrary way. (§6.5.2.1 and §6.5.3.2 in this spec seem to be the relevant bits for the code above.)

But as I said, it's been a long time since I've had to deal with C at this level. And the above is arguably not idiomatic at all, but then again, one could also argue that limiting arrays and pointer arithmetic to 64 KB is not idiomatic, either, and thus the 8086 is just as inefficient for C as the 6502 is. It might be interesting to see how this was dealt with in the various C memory models for the 8086 through the 80286. (It's been even longer since I had to deal with that.)

cjs · Post by **cjs** » Mon Nov 30, 2020 5:45 pm

Martin_H wrote:

Don't most 6502 machine language monitors use decimal mode to print hex output? I know you could do that without BCD, but between that and calculators it seems somewhat useful.

I can't see why they would. My first rough pass at this is that dealing with A-F in addition to 0-9 involves only one compare and one add (4 bytes, if you're clever with your flags), and turning on and off decimal mode takes two bytes, so it doesn't seem like there's a lot of room for optimization there unless being in decimal mode somehow lets you deal with the 0-9 vs. A-F part in just one byte of code.

Then again, I'm not an expert on 6502 microoptimizations. But I reckon that Woz is, so I went to the Apple II Monitor to see how he did it. The PRBYTE routine (at $FDDA) is:

Code: Select all

PRBYTE  PHA
        LSR A
        LSR A
        LSR A
        LSR A
        JSR PRHEXZ
        PLA
PRHEX   AND #$0F
PRHEXZ  ORA #$B0
        CMP #$BA
        BCC COUT
        ADC #$06
COUT    ...

That's ten bytes to convert a nybble to ASCII, and another 9 to deal with shifting the upper nybble to the lower. As it turns out, this is exactly the same as my naïve 6800 routine, except I use an ADD instead of an OR (which works on the 6800 because I have an add-without-carry instruction—I must remember this OR trick for the 6502!).

If you're wondering why he's ORing and CMPing with $B0 and $BA rather than $30 ('0') and $3A ('9'+1), it's because the Apple II handles ASCII with the high bit always set.
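One detail that is easy to miss: the ADC #$06 is only reached when the CMP has left the carry set, so it actually adds 7, which is the gap between '9'+1 and 'A'. A small C model of PRHEX makes that visible. This is just my sketch of the logic, not anything from the Monitor listing, and hex_digit is a made-up helper showing the same conversion for plain 7-bit ASCII.

```c
#include <stdint.h>

/* Model of PRHEX: low nybble of 'a' to an Apple II character (high bit set). */
static uint8_t prhex(uint8_t a)
{
    a &= 0x0F;                        /* AND #$0F */
    a |= 0xB0;                        /* ORA #$B0 gives $B0-$BF */
    if (a >= 0xBA)                    /* CMP #$BA sets carry when A >= $BA */
        a = (uint8_t)(a + 0x06 + 1);  /* ADC #$06 with carry set adds 7 */
    return a;
}

/* Same conversion with the high bit stripped, for plain 7-bit ASCII. */
static char hex_digit(uint8_t nybble)
{
    return (char)(prhex(nybble) & 0x7F);
}
```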

(If there really is some clever way to do this with decimal mode, even if it's not quite as tiny, I'd be interested in seeing it, but that's probably best started in a separate thread.)

Druzyek · Post by **Druzyek** » Mon Nov 30, 2020 6:11 pm

cjs wrote:

Code: Select all

uint8 ptr[104] = ...
for (uint8 *p = ptr; *p; )
    if (func(*p))
        p += 5;
    else
        p += 3;

Since accessing an array out of its declared bounds is undefined behaviour (at least in ISO C), you're here telling the compiler that you're guaranteeing you'll never do that (or, alternatively, that it can do whatever it likes if you do that), and it can feel free to optimize knowing that if p is ever more than ptr+104, the code is allowed to fail in any arbitrary way. (§6.5.2.1 and §6.5.3.2 in this spec seem to be the relevant bits for the code above.)

That might work if we only ever accessed small arrays, but then we wouldn't need pointer arithmetic if we could do everything with indexing. Besides, what if I have an array that is longer than 255 bytes, but I only ever access it in much smaller chunks where I would be resetting my base pointer anyway? For example, imagine the data is read from a file into a buffer larger than 256 bytes but we still know we'll find a 0 in the first 100 bytes once we start looking. Again, human generated assembly would be much faster and the compiler would have no way to optimize.

Quote:

But as I said, it's been a long time since I've had to deal with C at this level. And the above is arguably not idiomatic at all, but then again, one could also argue that limiting arrays and pointer arithmetic to 64 KB is not idiomatic, either, and thus the 8086 is just as inefficient for C as the 6502 is. It might be interesting to see how this was dealt with in the various C memory models for the 8086 through the 80286. (It's been even longer since I had to deal with that.)

I don't know much about 8086, but I would not agree that having that memory limitation makes it as inefficient for C as the 6502. The real measure is the speed difference between what you write in assembly and the equivalent assembly generated by the compiler. It's probably a safe bet that the gap is much narrower on the 8086 through 80286 than on the 6502.

Martin_H · Post by **Martin_H** » Mon Nov 30, 2020 6:58 pm

@cjs, here's the classic 6502 print hex that uses decimal mode.

Code: Select all

; prints the accumulator contents in hex to the console.
printa:
	pha
	lsr
	lsr
	lsr
	lsr
	jsr _print_nybble
	pla
	and #$0f
_print_nybble:
	sed
	clc
	adc #$90		; Produce $90-$99 or $00-$05
	adc #$40		; Produce $30-$39 or $41-$46
	cld
	jmp putch

johnwbyrd · Post by **johnwbyrd** » Mon Nov 30, 2020 7:55 pm

Quote:

Think about something like this: while (*ptr) func(*ptr++); A human would find some way to use the Y register to step through memory in that case, but the C compiler usually has no way to know if the data pointed to will be close enough to use the Y register, so it resorts to a 16 bit pointer with no indexing and performance goes way down.

Sure, the C compiler has no way of knowing that, if you don't take the trouble to define your types.

All arguments against C compilers for 65xx take that form. The arguer consistently asks the compiler to codegen some particular beloved optimization.

But the arguer simply never provides enough information, to permit the compiler to legally perform that optimization.

Any C compiler will do the add at the precision and size of ptr, because that is the type you specified for it.

However, consider indexing of the following form:

Code: Select all

  
  unsigned char index = 100;
  char table[101];
  short sum = 0;
  while (index)
    sum += table[index--];

In this case, where you've taken the trouble to tell the compiler that the table's type is char and index's type is unsigned char, then codegen has enough information to use your Y register indexing for stepping through the array.
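Wrapped up as a complete function, the fragment looks something like the following. The name sum_table and the TABLE_LEN macro are mine, added only so that it compiles on its own. Note that the loop as written sums table[100] down to table[1] and never touches table[0]. Whether a given backend really ends up with Y-register indexing here depends on the compiler; the point is only that the types permit it.

```c
#define TABLE_LEN 101

/* Sums table[100] down to table[1]; like the fragment above, the loop
 * stops before reaching table[0]. */
static short sum_table(const char table[TABLE_LEN])
{
    unsigned char index = 100;   /* fits in one 8-bit register */
    short sum = 0;

    while (index)
        sum += table[index--];
    return sum;
}
```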

This sort of pattern-recognition-based optimization is what modern compilers such as llvm and gcc are best at.

Premature optimization is, and will always be, the root of all evil. In addition, it's only sporting to give a C compiler a fair shot at emitting whatever you consider to be optimal code.

Druzyek · Post by **Druzyek** » Mon Nov 30, 2020 8:16 pm

johnwbyrd wrote:

Sure, the C compiler has no way of knowing that, if you don't take the trouble to define your types.

All arguments against C compilers for 65xx take that form. The arguer consistently expects the compiler to break the standard rules of C, in asking the compiler to codegen your particular beloved optimization.

But the arguer simply never provides enough information, to permit the compiler to make those determinations.

Any C compiler will do the add at the precision and size of ptr, because that is the type you specified for it.

The types have to be defined for it to even compile. It's not like leaving the type off will compile but produce less optimized code. If you mean the size of the array then you know that is not always known or even knowable at compile time which is why we have pointers in addition to arrays.

No, not all arguments take that form. I don't expect the compiler to break the C standard; I just don't think you can optimize anywhere near as well as a human while sticking to it, i.e. C on the 6502 will never be especially efficient.

Quote:

However, consider indexing of the following form:

Code: Select all

  
  unsigned char index = 100;
  char table[101];
  short sum = 0;
  while (index)
    sum += table[index--];

In this case, where you've taken the trouble to tell the compiler that the table's type is char and index's type is unsigned char, then codegen has enough information to use your Y register indexing for stepping through the array.

I don't dispute that there are times when you can put the Y register to use as you have here. What I dispute is that the compiler can always do so as effectively as a human could. In any case, this example is comparing apples to oranges since you're indexing into an array whereas my example was with pointers. What would an ideal C compiler output for this example that I gave above (or what would you add to the C code to get it to do so):

Code: Select all

uint8 *ptr=...
while (*ptr)
{
   if (func(*ptr)) ptr+=5;
   else ptr+=3;
}

cjs · Post by **cjs** » Mon Nov 30, 2020 8:30 pm

Martin_H wrote:

@cjs, here's the classic 6502 print hex that uses decimal mode.

Thanks for that! It seems it does actually save a byte, though on some systems it may involve adding a byte and two cycles to interrupt routines which in the end would reduce overall system performance. (Some here will, perhaps not unfairly, tell me to get with the program and stop using NMOS 6502s. :-)) At any rate, I think at this point those considering removing decimal mode can now make a more informed decision about the cost of that.
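To convince myself that the two decimal adds really do land on ASCII for all sixteen inputs, I sketched a simplified C model of decimal-mode ADC. It models only the accumulator and carry, and it is my own approximation of the adjust logic rather than a cycle-accurate description of any particular 6502; adc_decimal and nybble_to_ascii are names I made up.

```c
#include <stdint.h>

/* Simplified model of ADC in decimal mode: returns the result and updates
 * *carry. Only the accumulator and carry are modelled, not N, V or Z. */
static uint8_t adc_decimal(uint8_t a, uint8_t m, int *carry)
{
    unsigned lo = (a & 0x0Fu) + (m & 0x0Fu) + (*carry ? 1u : 0u);
    unsigned hi = (a >> 4) + (m >> 4);

    if (lo > 9) {          /* decimal adjust of the low digit */
        lo += 6;
        hi += 1;           /* half-carry into the high digit */
    }
    if (hi > 9) {          /* decimal adjust of the high digit */
        hi += 6;
        *carry = 1;
    } else {
        *carry = 0;
    }
    return (uint8_t)(((hi & 0x0Fu) << 4) | (lo & 0x0Fu));
}

/* Model of _print_nybble: nybble (0-15) to ASCII via the two decimal adds. */
static uint8_t nybble_to_ascii(uint8_t nybble)
{
    int carry = 0;                                         /* clc */
    uint8_t a = adc_decimal(nybble & 0x0F, 0x90, &carry);  /* adc #$90 */
    return adc_decimal(a, 0x40, &carry);                   /* adc #$40 */
}
```

The trick is that for A-F the first add overflows both digits and leaves the carry set, so the second add contributes $41 rather than $40.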

Druzyek wrote:

That might work if we only ever accessed small arrays, but then we wouldn't need pointer arithmetic if we could do everything with indexing.

Please look at the code again. I do not use indexing anywhere at all there; I use only pointer arithmetic.

Quote:

Besides, what if I have an array that is longer than 255 bytes, but I only ever access it in much smaller chunks where I would be resetting my base pointer anyway?

Same technique: `buf[1024] = ...; sub_buf[128] = &buf[offset]; ...`. But I think I will leave it there, because I have the strong feeling that someone who really understands how C lets you (or does not let you) describe in this way the limits of the memory you're accessing would be quoting sections of standards relating to arrays and pointer arithmetic to me right now.

Quote:

I don't know much about 8086, but I would not agree that having that memory limitation makes it as inefficient for C as the 6502.

Well, may I suggest you learn more about the 8086 architecture before forming an opinion? Because what you seem to be saying here is, "64K should be enough for anyone."

If that really is what you're trying to say, well, there were a substantial number of programs developed even in the early '80s on "modern" 32-bit systems (such as the VAX-11/780) made by programmers who clearly disagreed with you. Porting these programs to 8086 or 80286, even when they needed less than half a megabyte of memory, was far from easy, as I know from personal experience, and Wikipedia's coverage of the three different pointer formats and six different memory models really only scratches the surface as far as understanding the issues.

Quote:

It's probably a safe bet that the gap is much narrower on the 8086 through 80286 than on the 6502.

No, that is not a safe bet at all. I may never have been a true expert in this area, and I've certainly forgotten most of the detailed knowledge about this that I used to have, but I did spend a fair amount of time and effort back in the early '90s dealing with the issues, and it's not as simple as you wish it to be, unless you say that one arbitrary small memory limit is fine and another is not.

In short, I'm not seeing that "no bigger than 256" and "no bigger than 65536" are really such different things.

cjs · Post by **cjs** » Mon Nov 30, 2020 9:02 pm

johnwbyrd wrote:

...it's only sporting to give a C compiler a fair shot at emitting whatever you consider to be optimal code.

To be fair, it can require a lot of knowledge and understanding of the intricacies of C to be able to write code that can specify types well enough to get the level of optimization that Druzyek is looking for here. But that is probably as much a complaint about C itself (and the not-always-unfair assumptions made by C coders of any given day and age) as the 6502.

Oh, not to mention that it requires programmers to have a pretty good understanding of what they're actually doing. That has never been a popular idea.

Druzyek wrote:

The types have to be defined for it to even compile.

Yes, but if you define ridiculously broad types, you can't expect in the general case that the compiler will magically figure out that in your particular code narrower types could apply and apply those types for you so it will be allowed to use optimizations valid only for those narrower types.

Druzyek wrote:

If you mean the size of the array then you know that is not always known or even knowable at compile time which is why we have pointers in addition to arrays.

Let me remind you of exactly what you said earlier:

Druzyek wrote:

On the other hand, if I told you that the while loop will always find a 0 within the first 100 bytes, your code would be smaller and faster. My point is that there is no way to communicate that to the compiler.

If you're willing to tell me that you will never exceed a 100 byte range, but not willing to tell the compiler that, you can't blame the lack of optimization valid only for a 100 byte range on the language, the compiler or the CPU. And I am starting to think that your claim that "there is no way to communicate that to the compiler" is not based on a deep understanding of C.

Druzyek wrote:

In any case, this example is comparing apples to oranges since you're indexing into an array whereas my example was with pointers.

This really sounds like, "I refuse to use the facilities supplied by the language that would allow it to optimize my code better."

Look, if you're going to use a compiler (any compiler), you have to be willing to tell it the same things you'd tell an assembly language programmer if you want it to work using the same assumptions as that assembly language programmer.

johnwbyrd · Post by **johnwbyrd** » Mon Nov 30, 2020 9:37 pm

Quote:

What would an ideal C compiler output for this example that I gave above (or what would you add to the C code to get it to do so):

You'd modify the C code so that indexing only ever occurred with an 8 bit index:

Code: Select all

const uint8 *ptr = .....
uint8 index = INDEX_SIZE;
while (ptr[index])
{
   if (func(ptr[index])) index +=5;
   else index +=3;
}

If you want codegen to output Your Favorite Optimization, then you need to inform the compiler what limitations exist on indexing operations on ptr. In this case, informing the compiler that the array is only ever indexed by an 8 bit index, gives the compiler all the hints it needs to perform Your Favorite Optimization.
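As a self-contained sketch of the same idea applied to the "big buffer, small records" case: the base pointer can point anywhere into a buffer of any size, and only the offset from it is 8 bits wide. The names scan_record and is_odd are invented for the example (is_odd stands in for func()), and I start the index at 0 here, which is an assumption on my part.

```c
#include <stdint.h>

/* Hypothetical stand-in for func(): nonzero for odd bytes. */
static int is_odd(uint8_t c)
{
    return c & 1;
}

/* Scans one record starting at 'base'. The caller promises that a 0 byte is
 * reached before the 8-bit offset overflows. Returns the offset of that 0. */
static uint8_t scan_record(const uint8_t *base)
{
    uint8_t index = 0;

    while (base[index]) {
        if (is_odd(base[index]))
            index += 5;
        else
            index += 3;
    }
    return index;
}
```

A caller with a 1024-byte file buffer would pass something like buf + offset for each record, so the 16-bit arithmetic happens once per record rather than once per step.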

Quote:

If you're willing to tell me that you will never exceed a 100 byte range, but not willing to tell the compiler that, you can't blame the lack of optimization valid only for a 100 byte range on the language, the compiler or the CPU. And I am starting to think that your claim that "there is no way to communicate that to the compiler" is not based on a deep understanding of C.

cjs is totally accurate here. It's entirely possible to tell a C compiler that an array has a certain size and index type, and a compiler can very well make use of this information during optimization.

I must also point out that the inability of any compiler to detect and output Your Favorite Optimization hardly makes the language "inefficient," by any reasonable definition of the term. It will always be possible, for any implementation of any language, to find a pessimal case for which a practical compiler will not detect and codegen Your Favorite Optimization. Therefore, the correct long-term solution is to implement codegen on an n-pass compiler such as llvm, where you may do custom lowering explicitly for Your Favorite Optimization, and create unit tests for Your Favorite Optimization, regardless of the complexity of said optimization.

Your Favorite Optimization changes from programmer to programmer. The llvm compiler backend is especially good at enabling arbitrary experiments with custom lowering for any particular target. Personally, I think multiplies by constants, lowered to shifts and adds, will be especially fertile ground for 65xx micro-optimizations. But your codegen example, if you took the trouble to rewrite it as I have, would be straightforward to optimize in the llvm backend.
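To make the multiply case concrete, here is the sort of rewrite I have in mind, written out by hand in C. It is only an illustration of the transformation; mul10 is a made-up name, and I am not claiming any particular backend emits exactly this.

```c
#include <stdint.h>

/* x * 10 computed as (x << 3) + (x << 1), i.e. 8x + 2x. The result wraps
 * modulo 65536, the same as (uint16_t)(x * 10) would. */
static uint16_t mul10(uint16_t x)
{
    return (uint16_t)((x << 3) + (x << 1));
}
```

On a CPU with no multiply instruction, each shift is a short ASL/ROL pair per bit position, which is why this lowering tends to beat a general multiply routine for small constants.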

To be even more specific: I imagine that the llvm 6502 backend will do this by detecting that an 8-bit offset is being added to a 16-bit pointer in zero page, followed by a memory access. See also https://blog.yossarian.net/2020/09/19/L ... by-example and https://llvm.org/docs/GetElementPtr.html .

Snial · Post by **Snial** » Tue Dec 01, 2020 1:48 pm

Hi Everyone,

Thanks for all the replies to the idea of the 65T2. Sorry it's taken me so long to reply in turn. I looked at the thread the morning (UK time) after I'd posted it; didn't seem to see any replies at the time and it's taken me a little while to get back to answering. And there's far more at the moment than I can hope to address (with zero page or not).

A bit about my background: I have written a few posts on 6502.org, namely my initial attempt at writing uxForth (Unexpanded Forth) for the VIC-20 and a prior post on a hi-res game of life sim for the unexpanded VIC-20.

I am certainly not a 6502 guru, though I do have an interest in the CPU and computers. I have a number of actual 6502 computers: an Acorn Atom (I wrote AFAIK the first Macintosh Emulator for it in 1998); an Apple ][; an Apple //c; an Oric-1 and a Commodore 64. I have a BBC 32Kb and B+ 128Kb (not a BBC Master).

I have an MPhil in computer architecture from Manchester Uni with the Amulet Research Group headed by Steve Furber (over the same period in 1996-1998). In addition I developed the FIGnition DIY, AVR based Forth computer about 9.5 years ago (I sold about 1000 of them). Amulet worked on asynchronous ARM processors.

I apologise if I have missed out important quotes.

cjs wrote:

(perhaps, moving up to "lower transistor/resource budget than the 6800") might allow for significant architecture improvement worthy of the cost.

Quite possibly. 65T2 predates the OP by a number of years and I just took the subject at face value here. 65T2 is me imagining how the Chuck Peddle team might have designed the 6502 if they'd had another 20 years of knowledge of computer architecture developments. So, to a large degree, the other constraints remain - particularly if they wanted to sell it for $45 in 1976.

Quote:

But I'd like to start with

Quote:

The 6800 supposedly has zero-paged addressing

The 6800 and 6502 are my primary... I don't think you're right here. To be probably overly brief

Being overly brief is the real problem with my paragraph. I didn't mean to claim too much, and some of it ought to be expressed as a hypothesis. It's certainly true that the early 8-bit MPUs tried to gain credence by relating their processors to minicomputer designs. For example, the 6800 was marketed as being similar to the pdp-11 (presumably because it had multiple addressing modes).

In my analysis, the 6800's architecture is mostly similar to minicomputer architectures such as the pdp-8 or HP2100, which themselves were descended from the TX-0, but also designs like the SSEM (Manchester Baby) were similar.

So, although one can categorise processors as being either single word instruction CPUs or multiple word instruction CPUs, I wouldn't normally class them like that, because e.g. one could compress a single-cell instruction set into a set of multi-cell instructions, and I would think of that as the same architecture with a different encoding.

But it is true that there has been a corresponding family of multi-cell computers, for example, the IBM 1401 which used 6-bit character based instructions.

As a whole, though, most accumulator CPUs had a wide instruction format (e.g. KDF-9), I guess because it made decoding easy and because large mainframes could afford wide busses (wires were cheaper than components and the parallelism would offset the slow speed of the electronics).

With early 8-bit CPUs there was a major constraint on the number of pins, so that even accumulator based designs would be optimised by having multi-cell instructions, and page 0 optimised the limited bandwidth.

Hence, I think of the design of the 6800 as being more related to accumulator based wide instruction set computers than to e.g. the pdp-11 or IBM 360 or 1401. In that case the zero page architecturally does relate to zero page or direct addressing on those kinds of machines.

In addition, it would make sense for the designers to take that approach, because the early MPUs weren't designed by people with experience in computer architecture (see oral history of 4004 and 8008), but people with experience in chip layouts, rubylith and the then pioneering MOS fabrication techniques.

In addition, most of the popular computer languages in the development time period for the 6800, 6502, 8080 and the other also-rans were languages oriented around global variables, such as Fortran and BASIC (about 10 years old when the 6800 was released). In addition, the early 8-bit CPUs were expected to be used in embedded applications rather than fully realised computers.

So, I think I stand by the zero-page observation, even though my explanation was poor, but I do so, also because it's helpful in contrasting it with what might be done if one considers the principle of locality.

Modern CPUs are stack oriented, because the principle of locality means that most of the variable accesses in a given function will be a small subset of the complete address space. Thus, if we have n-functions each of which access on average m-variables, where m is a small number, then what we want is to be able to efficiently access n*m variables and if k and j are two of those functions with local variables a and b, we want to be able to alias variable k[a] in j (e.g. by passing their addresses).
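In C terms, the aliasing requirement is nothing more exotic than the following. It is a deliberately tiny sketch, using the function names k and j and the variables a and b from the paragraph above; the values are arbitrary.

```c
#include <stdint.h>

/* j has its own local 'b' and is handed an alias of k's local 'a'. */
static void j(uint8_t *alias_of_a)
{
    uint8_t b = 3;                              /* local to j */
    *alias_of_a = (uint8_t)(*alias_of_a + b);   /* writes k's 'a' */
}

/* k has a local 'a' and passes its address down to j. */
static uint8_t k(void)
{
    uint8_t a = 4;                              /* local to k */
    j(&a);
    return a;
}
```

For this to be cheap, the CPU needs an efficient way both to reach a and b relative to the current frame and to form the address of a so it can be passed on.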

Now, we can employ tricks to achieve this with a CPU like the 6502, but that's not the same as saying the 6502 is designed to support those features, which is what the OP is about.

So, ultimately, observations on zero-page addressing are the means I use to be able to critique it in a 6502 alternative designed for 'C'. And I need to focus on that, because zero-page is one of the big cultural selling points of the 6502, so if I want to replace it, I need to have a good reason for doing so - it would be controversial.

Any modification of the 6502 to accommodate 'C' compilers has to do something equivalent, because you have to make the best use of the instruction bandwidth available: you can either bolt more onto the 6502 design (decreasing its RISC-ish characteristics) or you can remove things you think aren't essential to make room for more efficient design decisions (from a compiler viewpoint).

And that's what I did: I removed all the zero page addressing modes to provide space for direct and indirect frame-pointer addressing modes. Then I tried to make the instruction set more orthogonal which would help compiler targeting.

PS. The 9-th Stack pointer bit is in the Flags register, though on reflection I don't think it should be restored if S is PULled.

Lunch is over, I'll have to add a bit more later. Thanks for reading this.

GARTHWILSON · Post by **GARTHWILSON** » Tue Dec 01, 2020 7:52 pm

Snial, I don't know much about C compilers' innards; but I think you'll want to look into the 816's stack-relative addressing modes and see how you might want to modify or add to them, before eliminating any direct-page instructions. BTW, it allows you, on the fly, to relocate direct page too (which is why it's no longer called 'zero page'), and it doesn't even have to start on a page boundary. One implication is that you can make the direct page overlap the stack area, and have all the DP addressing modes available in the stack area. The stack area is not limited to a single page, since the stack pointer is 16 bits wide.

For example, with the (sr,S),Y addressing mode, you can have the address of an array on the stack, then, all in one instruction, specify how deep the array's address is in the stack (IOW, it doesn't have to be at the top of the stack, nor do you have to pull it off the stack to use it), and then use Y as an 8- or 16-bit offset into that array, to access the desired byte or word in that array, with the array starting in the data bank specified by the data-bank byte. It's triply indexed, quadruply if you count the data-bank register. (In reviewing the topic again after having posted this, it looks like the complaint might be that not all the instructions are available in all the addressing modes. I guess the ones that would be used less could be implemented with the WDM prefix op code.)
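If it helps to see the address calculation spelled out, here is a small C model of how I understand (sr,S),Y to form its effective address in native mode. Treat it as a sketch from my reading of the data sheet rather than a reference; the function name ea_sr_s_y and its parameters are made up for the illustration.

```c
#include <stdint.h>

/* Effective address for the 65816's (sr,S),Y mode, as a 24-bit value.
 *   bank0: the 64 KB of bank 0, where the stack lives
 *   s:     stack pointer       sr:  8-bit stack-relative offset
 *   y:     index register      dbr: data bank register */
static uint32_t ea_sr_s_y(const uint8_t *bank0, uint16_t s, uint8_t sr,
                          uint16_t y, uint8_t dbr)
{
    uint16_t slot = (uint16_t)(s + sr);        /* where the pointer sits */
    uint16_t base = (uint16_t)(bank0[slot] |   /* little-endian 16-bit */
                    (bank0[(uint16_t)(slot + 1)] << 8));
    uint32_t ea = (((uint32_t)dbr << 16) | base) + y;

    return ea & 0xFFFFFFu;                     /* 24-bit address space */
}
```

So the stack pointer, the offset into the stack, the pointer found there, Y and the data bank all contribute to one access, without pulling anything off the stack.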

(Edited.)
