A Hypothetical C Friendly 6502
Re: A Hypothetical C Friendly 6502
cjs wrote:
Druzyek wrote:
Since I got interested in electronics to build calculators, decimal mode is really useful to me. If we're dreaming up processors, I would definitely keep it in!
Quote:
For most purposes, I think it's perfectly reasonable to reduce or remove CPU support for decimal arithmetic if CPU resources are very tight.
johnwbyrd wrote:
load81 wrote:
The 6502 isn't exactly C friendly.
Re: A Hypothetical C Friendly 6502
Druzyek wrote:
Think about something like this: while (*ptr) func(*ptr++); A human would find some way to use the Y register to step through memory in that case, but the C compiler usually has no way to know if the data pointed to will be close enough to use the Y register, so it resorts to a 16 bit pointer with no indexing and performance goes way down.
Code: Select all
; uint8 *ptr = ...
; while (*ptr)
; func(*ptr++)
; ...compiler allocates ptr in the zero page...
ldy #0
loop lda (ptr),y ; A ← *ptr
beq done
jsr func ; func(A)
iny ; ptr++
bne loop ; no carry to MSB
inc ptr+1 ; carry to MSB; Y is 0 again
clc
bcc loop ; BRA
done tya ; Y contains increments not yet added to ptr
clc
adc ptr
sta ptr ; update pointer with remaining increments
bcc done1
inc ptr+1
done1
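For readers who think in C rather than assembly, the strategy above can be sketched roughly as follows. This is only an illustration, not compiler output: `walk`, the call-counting `func` stand-in and the `uint8_t` offset `y` are names assumed here. The offset plays the role of the Y register, and the 16-bit pointer is only touched when the offset wraps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for func(): it only counts its calls. */
static unsigned calls;
static void func(uint8_t c) { (void)c; calls++; }

/* Walks a zero-terminated buffer with a base pointer plus an 8-bit offset
   and returns the final pointer (the address of the terminator). */
const uint8_t *walk(const uint8_t *ptr)
{
    uint8_t y = 0;                 /* plays the role of the Y register */
    assert(ptr != 0);
    while (ptr[y] != 0) {          /* lda (ptr),y / beq done */
        func(ptr[y]);              /* jsr func */
        if (++y == 0)              /* iny / bne loop */
            ptr += 256;            /* inc ptr+1: carry into the high byte */
    }
    return ptr + y;                /* fold the leftover offset back into ptr */
}
```

As in the assembly, the 16-bit add happens once per 256 bytes plus once at the end, rather than on every iteration.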
Curt J. Sampson - github.com/0cjs
Re: A Hypothetical C Friendly 6502
Quote:
I may well be wrong about this, but wouldn't one deal with this using code something like the following, assuming that ptr is a local variable that is not being passed to func()?
Code: Select all
uint8 *ptr=...
while (*ptr)
{
if (func(*ptr)) ptr+=5;
else ptr+=3;
}
Re: A Hypothetical C Friendly 6502
Don't most 6502 machine language monitors use decimal mode to print hex output? I know you could do that without BCD, but between that and calculators it seems somewhat useful.
Re: A Hypothetical C Friendly 6502
Druzyek wrote:
On the other hand, if I told you that the while loop will always find a 0 within the first 100 bytes, your code would be smaller and faster. My point is that there is no way to communicate that to the compiler.
Code: Select all
uint8 ptr[104] = ...
for (uint8 *p = ptr; *p; )
if (func(*p))
p += 5;
else
p += 3;
But as I said, it's been a long time since I've had to deal with C at this level. And the above is arguably not idiomatic at all, but then again, one could also argue that limiting arrays and pointer arithmetic to 64 KB is not idiomatic, either, and thus the 8086 is just as inefficient for C as the 6502 is. It might be interesting to see how this was dealt with in the various C memory models for the 8086 through the 80286. (It's been even longer since I had to deal with that.)
Curt J. Sampson - github.com/0cjs
Re: A Hypothetical C Friendly 6502
Martin_H wrote:
Don't most 6502 machine language monitors use decimal mode to print hex output? I know you could do that without BCD, but between that and calculators it seems somewhat useful.
Then again, I'm not an expert on 6502 microoptimizations. But I reckon that Woz is, so I went to the Apple II Monitor to see how he did it. The PRBYTE routine (at $FDDA) is:
Code: Select all
PRBYTE PHA
LSR A
LSR A
LSR A
LSR A
JSR PRHEXZ
PLA
PRHEX AND #$0F
PRHEXZ ORA #$B0
CMP #$BA
BCC COUT
ADC #$06
COUT ...
If you're wondering why he's ORing and CMPing with $B0 and $BA rather than $30 ('0') and $3A ('9'+1), it's because the Apple II handles ASCII with the high bit always set.
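The arithmetic in PRHEX is easy to check in C. The sketch below is an illustration only, with `prhex_char` as a made-up name. The one subtle step is that when CMP #$BA does not take the BCC branch, the carry is set, so ADC #$06 actually adds 7.

```c
#include <assert.h>
#include <stdint.h>

/* Converts the low nybble of n to an Apple II style character code
   (ASCII with the high bit set), following the PRHEX steps. */
uint8_t prhex_char(uint8_t n)
{
    uint8_t a = (uint8_t)((n & 0x0F) | 0xB0);  /* AND #$0F / ORA #$B0 */
    if (a >= 0xBA)                             /* CMP #$BA leaves carry set here */
        a = (uint8_t)(a + 0x06 + 1);           /* ADC #$06 plus the set carry: +7 */
    return a;
}
```

So a nybble of $A becomes $BA + 7 = $C1, which is 'A' with the high bit set.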
(If there really is some clever way to do this with decimal mode, even if it's not quite as tiny, I'd be interested in seeing it, but that's probably best started in a separate thread.)
Curt J. Sampson - github.com/0cjs
Re: A Hypothetical C Friendly 6502
@cjs, here's the classic 6502 print hex that uses decimal mode.
Code: Select all
; prints the accumulator contents in hex to the console.
printa:
pha
lsr
lsr
lsr
lsr
jsr _print_nybble
pla
and #$0f
_print_nybble:
sed
clc
adc #$90 ; Produce $90-$99 or $00-$05
adc #$40 ; Produce $30-$39 or $41-$46
cld
jmp putch
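To see why the two decimal-mode additions land on ASCII, here is a small C model of the trick. It is a sketch under assumptions: `adc_decimal` is a simplified model of NMOS 6502 decimal-mode ADC that tracks only the result and the carry (not N, V or Z), and both function names are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of NMOS 6502 ADC in decimal mode (result and carry only). */
static uint8_t adc_decimal(uint8_t a, uint8_t b, int *carry)
{
    unsigned lo = (a & 0x0Fu) + (b & 0x0Fu) + (unsigned)*carry;
    if (lo >= 0x0A)
        lo = ((lo + 0x06) & 0x0F) + 0x10;      /* decimal-adjust the low digit */
    unsigned sum = (a & 0xF0u) + (b & 0xF0u) + lo;
    if (sum >= 0xA0)
        sum += 0x60;                           /* decimal-adjust the high digit */
    *carry = sum >= 0x100;
    return (uint8_t)sum;
}

/* Mirrors _print_nybble: sed / clc / adc #$90 / adc #$40 / cld. */
char nybble_to_hex(uint8_t n)
{
    int carry = 0;                                     /* clc */
    uint8_t a = adc_decimal(n & 0x0F, 0x90, &carry);   /* $90-$99 or $00-$05 */
    a = adc_decimal(a, 0x40, &carry);                  /* $30-$39 or $41-$46 */
    return (char)a;
}
```

For 0 to 9 the first add leaves the carry clear and the second wraps $D0-$D9 around to $30-$39. For $A to $F the first add overflows to $00-$05 with the carry set, and that carry supplies the extra 1 that makes the second add produce $41-$46.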
Re: A Hypothetical C Friendly 6502
Quote:
Think about something like this: while (*ptr) func(*ptr++); A human would find some way to use the Y register to step through memory in that case, but the C compiler usually has no way to know if the data pointed to will be close enough to use the Y register, so it resorts to a 16 bit pointer with no indexing and performance goes way down.
All arguments against C compilers for 65xx take that form. The arguer consistently asks the compiler to codegen some particular beloved optimization.
But the arguer simply never provides enough information to permit the compiler to legally perform that optimization.
Any C compiler will do the add at the precision and size of ptr, because that is the type you specified for it.
However, consider indexing of the following form:
Code: Select all
unsigned char index = 100;
char table[101];
short sum = 0;
while (index)
sum += table[index--];
This sort of pattern-recognition-based optimization is what modern compilers such as llvm and gcc are best at.
Premature optimization is, and will always be, the root of all evil. In addition, it's only sporting to give a C compiler a fair shot at emitting whatever you consider to be optimal code.
Re: A Hypothetical C Friendly 6502
johnwbyrd wrote:
Sure, the C compiler has no way of knowing that, if you don't take the trouble to define your types.
All arguments against C compilers for 65xx take that form. The arguer consistently expects the compiler to break the standard rules of C, in asking the compiler to codegen your particular beloved optimization.
But the arguer simply never provides enough information, to permit the compiler to make those determinations.
Any C compiler will do the add at the precision and size of ptr, because that is the type you specified for it.
No, not all arguments take that form. I don't expect the compiler to break the C standard; I just don't think you can optimize anywhere near as well as a human while sticking to it, i.e. C on the 6502 will never be especially efficient.
Quote:
However, consider indexing of the following form:
In this case, where you've taken the trouble to tell the compiler that the table's type is char and index's type is unsigned char, then codegen has enough information to use your Y register indexing for stepping through the array.
Code: Select all
unsigned char index = 100;
char table[101];
short sum = 0;
while (index)
sum += table[index--];
Code: Select all
uint8 *ptr=...
while (*ptr)
{
if (func(*ptr)) ptr+=5;
else ptr+=3;
}
Re: A Hypothetical C Friendly 6502
Martin_H wrote:
@cjs, here's the classic 6502 print hex that uses decimal mode.
Druzyek wrote:
That might work if we only ever accessed small arrays, but then we wouldn't need pointer arithmetic if we could do everything with indexing.
Quote:
Besides, what if I have an array that is longer than 255 bytes, but I only ever access it in much smaller chunks where I would be resetting my base pointer anyway?
Quote:
I don't know much about 8086, but I would not agree that having that memory limitation makes it as inefficient for C as the 6502.
If that really is what you're trying to say, well, there were a substantial number of programs developed even in the early '80s on "modern" 32-bit systems (such as the VAX-11/780) made by programmers who clearly disagreed with you. Porting these programs to 8086 or 80286, even when they needed less than half a megabyte of memory, was far from easy, as I know from personal experience, and Wikipedia's coverage of the three different pointer formats and six different memory models really only scratches the surface as far as understanding the issues.
Quote:
It's probably a safe bet that the gap is much narrower on the 8086 through 80286 than on the 6502.
In short, I'm not seeing that "no bigger than 256" and "no bigger than 65536" are really such different things.
Curt J. Sampson - github.com/0cjs
Re: A Hypothetical C Friendly 6502
johnwbyrd wrote:
...it's only sporting to give a C compiler a fair shot at emitting whatever you consider to be optimal code.
Oh, not to mention that it requires programmers to have a pretty good understanding of what they're actually doing. That has never been a popular idea.
Druzyek wrote:
The types have to be defined for it to even compile.
Druzyek wrote:
If you mean the size of the array then you know that is not always known or even knowable at compile time which is why we have pointers in addition to arrays.
Druzyek wrote:
On the other hand, if I told you that the while loop will always find a 0 within the first 100 bytes, your code would be smaller and faster. My point is that there is no way to communicate that to the compiler.
Druzyek wrote:
In any case, this example is comparing apples to oranges since you're indexing into an array whereas my example was with pointers.
Look, if you're going to use a compiler (any compiler), you have to be willing to tell it the same things you'd tell an assembly language programmer if you want it to work using the same assumptions as that assembly language programmer.
Curt J. Sampson - github.com/0cjs
Re: A Hypothetical C Friendly 6502
Quote:
What would an ideal C compiler output for this example that I gave above (or what would you add to the C code to get it to do so):
Code: Select all
const uint8 *ptr = .....
uint8 index = INDEX_SIZE;
while (ptr[index])
{
if (func(ptr[index])) index +=5;
else index +=3;
}
Quote:
If you're willing to tell me that you will never exceed a 100 byte range, but not willing to tell the compiler that, you can't blame the lack of optimization valid only for a 100 byte range on the language, the compiler or the CPU. And I am starting to think that your claim that "there is no way to communicate that to the compiler" is not based on a deep understanding of C.
I must also point out that the inability of any compiler to detect and emit Your Favorite Optimization hardly makes the language "inefficient" by any reasonable definition of the term. It will always be possible, for any implementation of any language, to find a pessimal case for which a practical compiler will not detect and codegen Your Favorite Optimization. Therefore, the correct long-term solution is to implement codegen on an n-pass compiler such as llvm, where you may do custom lowering explicitly for Your Favorite Optimization, and create unit tests for Your Favorite Optimization, regardless of the complexity of said optimization.
Your Favorite Optimization changes from programmer to programmer. The llvm compiler backend is especially good at enabling arbitrary experiments with custom lowering for any particular target. Personally, I think multiplies by constants, lowered to shifts and adds, will be especially fertile ground for 65xx micro-optimizations. But your codegen example, if you took the trouble to rewrite it as I have, would be straightforward to optimize in the llvm backend.
To be even more specific: I imagine that the llvm 6502 backend will do this by detecting that an 8-bit offset is being added to a 16-bit pointer in zero page, followed by a memory access. See also https://blog.yossarian.net/2020/09/19/L ... by-example and https://llvm.org/docs/GetElementPtr.html .
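As a concrete picture of the "multiplies by constants, lowered to shifts and adds" idea mentioned above, here is what such a lowering amounts to when written by hand in C. This is only a sketch: `mul10` is a made-up name, and it makes no claim about what any actual llvm backend emits.

```c
#include <assert.h>
#include <stdint.h>

/* x * 10 written as (x << 3) + (x << 1), i.e. 8x + 2x. */
uint16_t mul10(uint8_t x)
{
    uint16_t wide = x;                 /* widen first so no bits are lost */
    return (uint16_t)((wide << 3) + (wide << 1));
}
```

On a CPU with no multiply instruction, each shift and the final add map onto a handful of ASL/ROL/ADC instructions, which is why this kind of rewrite is attractive for 65xx targets.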
Re: A Hypothetical C Friendly 6502
Hi Everyone,
Thanks for all the replies to the idea of the 65T2. Sorry it's taken me so long to reply in turn. I looked at the thread the morning (UK time) after I'd posted it; didn't seem to see any replies at the time and it's taken me a little while to get back to answering. And there's far more at the moment than I can hope to address (with zero page or not).
A bit about my background: I have written a few posts in 6502.org; namely my initial attempt at writing uxForth (Unexpanded) Forth for the VIC-20 and a prior post on a hi-res game of life sim for the unexpanded VIC-20.
I am certainly not a 6502 guru, though I do have an interest in the CPU and computers. I have a number of actual 6502 computers: an Acorn Atom (I wrote, AFAIK, the first Macintosh emulator for it in 1998), an Apple ][, an Apple //c, an Oric-1 and a Commodore 64. I also have a BBC 32KB and a B+ 128KB (not a BBC Master).
I have an MPhil in computer architecture from Manchester Uni with the Amulet Research Group headed by Steve Furber (over the same period in 1996-1998). In addition I developed the FIGnition DIY, AVR based Forth computer about 9.5 years ago (I sold about 1000 of them). Amulet worked on asynchronous ARM processors.
I apologise if I have missed out important quotes.
Quite possibly. 65T2 predates the OP by a number of years and I just took the subject at face value here. 65T2 is me imagining how the Chuck Peddle team might have designed the 6502 if they'd had another 20 years of knowledge of computer architecture developments. So, to a large degree, the other constraints remain - particularly if they wanted to sell it for $45 in 1976.
Being overly brief is the real problem with my paragraph. I didn't mean to claim too much, and some of it ought to be expressed as a hypothesis. It's certainly true that the early 8-bit MPUs tried to gain credence by relating their processors to minicomputer designs. For example, the 6800 was marketed as being similar to the pdp-11 (presumably because it had multiple addressing modes).
In my analysis, the 6800's architecture is mostly similar to minicomputer architectures such as the pdp-8 or HP2100, which themselves were descended from the TX-0, but also designs like the SSEM (Manchester Baby) were similar.
So, although one can categorise processors as either single-word-instruction CPUs or multiple-word-instruction CPUs, I wouldn't normally class them like that. For example, one could compress a single-cell instruction set into a set of multi-cell instructions, but I would think of that as the same architecture with a different encoding.
But it is true that there has been a corresponding family of multi-cell computers, for example, the IBM 1401 which used 6-bit character based instructions.
On the whole, though, most accumulator CPUs had a wide instruction format (e.g. KDF-9), I guess because it made decoding easy and because large mainframes could afford wide busses (wires were cheaper than components, and the parallelism would offset the slow speed of the electronics).
With early 8-bit CPUs there was a major constraint on the number of pins, so that even accumulator based designs would be optimised by having multi-cell instructions, and page 0 optimised the limited bandwidth.
Hence, I think of the design of 6800 to be more related to accumulator based wide instruction set computers than e.g. the pdp-11 or IBM 360 or 1401. In which case the zero-page architecturally does relate to zero page or direct addressing on those kinds of machines.
In addition, it would make sense for the designers to take that approach, because the early MPUs weren't designed by people with experience in computer architecture (see oral history of 4004 and 8008), but people with experience in chip layouts, rubylith and the then pioneering MOS fabrication techniques.
In addition, most of the popular computer languages in the development period of the 6800, 6502, 8080 and the other also-rans were oriented around global variables, such as Fortran and BASIC (about 10 years old when the 6800 was released). The early 8-bit CPUs were also expected to be used in embedded applications rather than fully realised computers.
So, I think I stand by the zero-page observation, even though my explanation was poor, but I do so, also because it's helpful in contrasting it with what might be done if one considers the principle of locality.
Modern CPUs are stack oriented, because the principle of locality means that most of the variable accesses in a given function will be a small subset of the complete address space. Thus, if we have n-functions each of which access on average m-variables, where m is a small number, then what we want is to be able to efficiently access n*m variables and if k and j are two of those functions with local variables a and b, we want to be able to alias variable k[a] in j (e.g. by passing their addresses).
Now, we can employ tricks to achieve this with a CPU like the 6502, but that's not the same as saying the 6502 is designed to support those features, which is what the OP is about.
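The aliasing requirement described above (one function touching another function's local through its address) looks like this in C. It is a minimal sketch; the names `k`, `j`, `a` and `b` simply follow the letters used in the paragraph above.

```c
#include <assert.h>

/* j() receives the address of a variable that lives in its caller's frame. */
static void j(int *alias)
{
    int b = 2;          /* j's own local, in j's frame */
    *alias += b;        /* writes through to k's local a */
}

int k(void)
{
    int a = 40;         /* local to k */
    j(&a);              /* pass the address so j can alias a */
    return a;
}
```

Because `a` must have a real address that stays valid while `j` runs, a compiler cannot simply park it in a fixed zero-page slot that is shared between unrelated functions, which is where frame-pointer-relative addressing becomes attractive.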
So, ultimately, observations on zero-page addressing are the means I use to be able to critique it in a 6502 alternative designed for 'C'. And I need to focus on that, because zero-page is one of the big cultural selling points of the 6502, so if I want to replace it, I need to have a good reason for doing so - it would be controversial.
Any modification of the 6502 to accommodate 'C' compilers has to do something equivalent, because you have to make the best use of the instruction bandwidth available: you can either bolt more onto the 6502 design (decreasing its RISC-ish characteristics) or you can remove things you think aren't essential to make room for more efficient design decisions (from a compiler viewpoint).
And that's what I did: I removed all the zero page addressing modes to provide space for direct and indirect frame-pointer addressing modes. Then I tried to make the instruction set more orthogonal, which would help compiler targeting.
PS. The 9-th Stack pointer bit is in the Flags register, though on reflection I don't think it should be restored if S is PULled.
Lunch is over, I'll have to add a bit more later. Thanks for reading this.
cjs wrote:
(perhaps, moving up to "lower transistor/resource budget than the 6800") might allow for significant architecture improvement worthy of the cost.
Quote:
But I'd like to start with
The 6800 and 6502 are my primary... I don't think you're right here. To be probably overly brief
Quote:
The 6800 supposedly has zero-paged addressing
- GARTHWILSON
Re: A Hypothetical C Friendly 6502
Snial, I don't know much about C compilers' innards; but I think you'll want to look into the 816's stack-relative addressing modes and see how you might want to modify or add to them, before eliminating any direct-page instructions. BTW, it allows you, on the fly, to relocate direct page too (which is why it's no longer called 'zero page'), and it doesn't even have to start on a page boundary. One implication is that you can make the direct page overlap the stack area, and have all the DP addressing modes available in the stack area. The stack area is not limited to a single page, since the stack pointer is 16 bits wide.
For example, with the (sr,S),Y addressing mode, you can have the address of an array on the stack, then, all in one instruction, specify how deep the array's address is in the stack (IOW, it doesn't have to be at the top of the stack, nor do you have to pull it off the stack to use it), and then use Y as an 8- or 16-bit offset into that array, to access the desired byte or word in that array, with the array starting in the data bank specified by the data-bank byte. It's triply indexed, quadruply if you count the data-bank register. (In reviewing the topic again after having posted this, it looks like the complaint might be that not all the instructions are available in all the addressing modes. I guess the ones that would be used less could be implemented with the WDM prefix op code.)
(Edited.)
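The effective-address calculation for (sr,S),Y described above can be modelled in a few lines of C. This is a sketch, not a verified emulator: the function and parameter names are invented here, and it assumes (as I understand the 65816 documentation) that the pointer is fetched from bank 0 at S + sr and that adding Y may carry into the next bank.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 65816 (sr,S),Y effective-address calculation.
   bank0 points at a 64 KB image of bank 0, where the stack lives. */
uint32_t ea_sr_s_y(const uint8_t *bank0, uint16_t s, uint8_t sr,
                   uint16_t y, uint8_t dbr)
{
    uint16_t slot = (uint16_t)(s + sr);               /* stack slot, wraps in bank 0 */
    uint16_t base = (uint16_t)(bank0[slot] |
                    (bank0[(uint16_t)(slot + 1)] << 8));  /* little-endian pointer */
    /* Data bank register supplies the top byte; Y is added across 24 bits. */
    return ((((uint32_t)dbr << 16) | base) + y) & 0xFFFFFFu;
}
```

For example, with S = $01F0, sr = 3, the word $1234 stored at $01F3, Y = $10 and a data bank of $02, the access goes to $021244.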
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?