PostPosted: Sun Sep 04, 2022 8:14 pm

Joined: Sat Apr 11, 2020 7:28 pm
Posts: 341
drogon wrote:
You're probably right there... I can't think of any (65xx) systems by different manufacturers that might in any way be considered compatible. Even in the same manufacturer... Apple // and Apple /// ? Acorn Atom, BBC Micro? Different makers had different ideas and even in the same company ideas differed (Sinclair ZX80, 81 and Spectrum) - the only common thing is that you need a little bit of ROM at the top (or a way to get code/vectors there) and a little bit of RAM at the bottom.


You are right. There is no standard system in the 65xx world.

The concept discussed here is related to the resources that would be available to the assembler programmer of any system, though. Again: what about having two or more accumulators in the CPU instead of one?



drogon wrote:
See: https://muse.jhu.edu/article/235250

... and when looking for that, my initial search brought this up:

https://dl.acm.org/doi/pdf/10.1145/53990.54000

which might be worth a read.

-G


Thanks for those interesting links


PostPosted: Sun Sep 04, 2022 8:21 pm

Joined: Fri May 05, 2017 9:27 pm
Posts: 861
tokafondo wrote:
Well... thanks all for your comments.

It seems to me that the 65xx ecosystem is strongly influenced by the hardware it runs on.


There is a thread here discussing desired extensions to an eighties era 6502.


PostPosted: Sun Sep 04, 2022 8:35 pm

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1413
Location: Scotland
tokafondo wrote:
The concept discussed here is related to the resources that would be available to the assembler programmer of any system, though. Again: what about having two or more accumulators in the CPU instead of one?


Doesn't the 6809 have 2 accumulators? The '816 allows you to swap 2 x 8-bit accumulators...

But other than that, what you've written is very much like the first sentence in many RISC ISAs...

The RISC systems I've used (Sparc, i860, RISC-V) don't have accumulators as such, but have a bank of registers that all work the same - you can add any register to any register and put the result in any register, or use any register as a pointer to memory to load/store any other register (sometimes optionally adding on another register as an indirection in the process), and so on.

Some might suggest that after their 6502 and 65816 computers, Acorn developed the ARM as a successor to those CPUs... 16 x 32-bit registers visible to the programmer, and while the amount of ARM code I've written is less than a folded page of A4, it seems to be fairly orthogonal...

But once you have many registers, it's up to you, the user, how to use those registers. Beware mixing uses, though, if you pull in pre-compiled libraries that expect certain registers to behave in certain ways (e.g. holding the arguments to a called function, and so on).

I wrote an application recently in RISC-V assembler completely outside the suggested register use (for a compiler) and managed to fit the entire state of my system inside all those 31 registers - including an interrupt routine.... So it's possible to invent your own scheme if needed - you just have to write the whole system yourself. (My thing is a bytecode interpreter and it's blindingly fast on that architecture compared to the '816 I ported it from)

Maybe in the 6502 world look at Acheron?

https://github.com/AcheronVM/acheronvm

It's essentially macros that provide a 16-bit VM environment with multiple registers without the overhead of (say) a traditional bytecode VM...

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


PostPosted: Sun Sep 04, 2022 8:45 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3354
Location: Ontario, Canada
FWIW, some folks believe the 6502 was originally planned to have two accumulators, and the opcode map does perhaps lend plausibility to the theory. If anyone's interested, there's more about that in this thread.

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Mon Sep 05, 2022 12:56 am

Joined: Mon Feb 15, 2021 2:11 am
Posts: 100
Dr Jefyll wrote:
FWIW, some folks believe the 6502 was originally planned to have two accumulators, and the opcode map does perhaps lend plausibility to the theory. If anyone's interested, there's more about that in this thread.

-- Jeff


Considering most of the 6502 team had been involved in the two-accumulator 6800, and the observations in that thread you linked to, it seems likely enough. A hypothetical two-accumulator 6502 would have been a bit like an 8-bit version of the early 16-bit Data General Nova systems, which had four accumulators, two of which (AC2 and AC3) could double as index registers, and no stack pointer. Something resembling the Data General Nova architecture wouldn't have been an odd choice to design towards: the microprogrammed Xerox Alto emulated its instruction set, at least in early releases.


PostPosted: Mon Sep 05, 2022 6:12 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8440
Location: Southern California
I've been watching this topic with interest, kind of watching for where it might go. It does mostly sound like a HLL; but even there, universality is absent in many HLLs. Even early BASICs had pretty significant differences between them, and as for Forth, like they say, "If you've seen one Forth, well, you've seen one Forth", although there are good reasons for it, in spite of the lack of portability. In fact, ANS Forth in '94 was an effort to come up with a newer and more-standard standard than the mishmash of earlier standards, and some parts of ANS present extra overhead in order to make it more portable across a wide range of processors, which is why I have not adopted it (or any later ones).

You've mentioned two accumulators a few times now. Do you have a specific use envisioned for them? I'm not sure I've ever wished for two accumulators, but I wouldn't mind having another register that duplicates the functions of X. The '816 allows 16-bit index registers (and allows a 16-bit accumulator too, independently), so at least indexing past 255 is possible there, and it adds the stack-relative addressing modes. Its op-code table is full; so adding more registers and the op codes to use them would require a wider instruction word, or more operand bytes, or two-byte op codes and more complex (and probably also slower, as BigEd said) internal instruction decoding.

I'm not familiar with a wide inventory of processors, but it is my understanding that a major push for lots of registers was partly to make it easier to write compilers. However, just having more registers, even wider ones, does not guarantee performance, if I may point to the example of the RCA 1802 which had 16 16-bit registers and yet performed very poorly compared to the 6502, or the 32016 about which Sophie Wilson, chief architect of the ARM processor, said, "an 8MHz 32016 was completely trounced in performance terms by a 4MHz 6502." (The 32016 was National's 32-bit processor, having 15 registers, including 8 general-purpose 32-bit registers, and a 16-bit external data bus.) The 65816 even outperformed the 68000 and 8086 in the Sieve of Eratosthenes benchmark.

Even 40 years ago, a Z80 had to run at 3 or 4MHz to keep up with a 1MHz 6502; and Jack Crenshaw, an embedded-systems engineer who wrote regularly in Embedded Systems Programming magazine said in the 9/98 issue that he still couldn't figure out why, in BASIC benchmark after benchmark, the 6502 could outperform the Z80 which had more and bigger registers, a seemingly more powerful instruction set, and ran at higher clock rates. (The 6502's zero page and improved indexed and indirect addressing modes no doubt helped.)

Here on the forum, sark02 said, "My next computer was an Atari 800XL. [...] Coming from the Z80 I was initially dumbfounded by the criminal lack of registers, but BY GOD was it FAST! I spent hundreds of hours and wrote 1000s of lines of 6502 code over the following couple of years, diving deeper into the 800XL hardware and capabilities, and really came to enjoy the 6502." (See also this post.)

So my point again is that having more registers, even wider ones, is not necessarily helpful by itself. Other factors have to enter the picture.

As for extending the processor with external logic, there's Jeff Laughton's KimKlone 65c02 with pointer-arithmetic-friendly extended address space and 9-cycle ITC Forth NEXT. It gives 6 new registers and 44 new instructions.

Memory-speed bottlenecks would be another reason for the push for lots of onboard registers, and I, too, have contemplated having ZP and the page-1 stack onboard so as not to have to go off-chip for these, and in fact if they had their own buses internally, there could be more instruction overlap where for example a store or a push could happen while the next instruction is being fetched. OTOH, certain access techniques might be forfeited unless again the op-code set were expanded, again requiring two-byte op codes.

Quote:
You can do things with A that you can't with X and Y. At least, directly.

Self-modifying code can get you some of those things, like
  • add X to A
  • LDA (A), LDA (X), or LDA (Y)
  • LDA (abs),X, LDA (abs),Y, LDX (abs),Y, and LDY (abs),X
  • JMP ((abs)) (ie, doubly indirect)
  • Save and restore a register one cycle faster than pushing and pulling, without using the stack, or even save and restore the stack pointer without using the stack or any variables
  • LDA table,X,Y equivalent, ie, double-indexed; or even LDA (X+Y) !
and plenty more. See my article at http://wilsonminesco.com/SelfModCode/ .
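To make the double-indexed case concrete, here is a minimal Python model of the idea, not code from the article: the program stores table+X into the operand bytes of an LDA abs,Y instruction and then lets that instruction add Y. The addresses ($0300 for the instruction, $2000 for the table) are made up for illustration, and the "CPU" is just a couple of functions over a byte array.

```python
# Toy model of self-modifying code on a 6502-style flat 64K memory.
# All addresses below are hypothetical, chosen only for the illustration.
memory = bytearray(0x10000)

INSTR = 0x0300           # where the LDA abs,Y instruction lives
OPERAND = INSTR + 1      # its two operand bytes follow the opcode
TABLE = 0x2000           # base address of the data table

memory[INSTR] = 0xB9     # opcode for LDA abs,Y

def patch_operand(x):
    """Write TABLE + X into the instruction's operand, low byte first."""
    base = (TABLE + x) & 0xFFFF
    memory[OPERAND] = base & 0xFF
    memory[OPERAND + 1] = base >> 8

def lda_abs_y(y):
    """Model LDA abs,Y: fetch the operand from memory, add Y, load the byte."""
    base = memory[OPERAND] | (memory[OPERAND + 1] << 8)
    return memory[(base + y) & 0xFFFF]

# Put a marker at table[X + Y] and fetch it through the patched instruction.
memory[TABLE + 5 + 7] = 0x42
patch_operand(5)         # the "self-modifying" step
value = lda_abs_y(7)     # behaves like LDA table,X,Y with X=5, Y=7
print(hex(value))
```

On real hardware the patching costs a few extra instructions each time X changes, so whether it pays off depends on how often the operand has to be rewritten.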

I'm still kind of straining to understand the HAL's definition and application though.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Mon Sep 05, 2022 7:59 am

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1413
Location: Scotland
It's easy to do speed comparisons, but the reasons are fairly well understood (as I understand it, anyway): it boils down to clock usage efficiency. The 6502 can process a one-byte instruction in 2 cycles (and the 2nd cycle is already reading the next instruction), where the Z80 needs more: a minimum of 4 and often many more. Then it starts over again ("ah, new clock cycle, let's read the next instruction in") rather than getting the one-cycle overlap the 6502 can achieve. I think this is due to the effective use of both edges of the clock in the 6502 and what is effectively a single stage pipeline. I think early versions of other (wider) CPUs were similar, with operations spread out over many clock cycles, possibly to try to keep things simple for early silicon technology and possibly even to keep things understandable for the people designing them as the chips got larger and larger. RAM was still expensive, so early 68K systems could still work on an 8-bit wide memory system, which doubles the clocks needed just to read in a 16-bit word, or doubles them again for a 32-bit word. It really wasn't long before the 68K was featured in high-end Unix workstations from Apollo, Sun and so on. (The early 80's was also a time of fierce competition: if your CPU didn't perform it wouldn't be bought, and people weren't interested in waiting for the next revision.)

RISC changed that, maybe not as much as expected, but the much touted one instruction per cycle was the goal, and it worked, until the CPU core got so much faster than the memory systems. That is a problem we still have today, and we see it even in our 6502 systems where we only have just under half a clock cycle to read or write RAM. RISC initially seemed slower as there were effectively fewer instructions, so you had to write more lines of code to do the same thing, but wider memory and caches kept the balance, at least for a while, and you were dealing with wider data from the start. The advantage RISC (notably ARM) had for a very long time, and has even today, is low power, mostly due to a relatively low gate count, although when you start to add in vector-capable FPUs and GPUs that starts to fall by the wayside.

I was part of "the 6502 is better than the Z80" brigade way back in the late 70's and early 80's, and it was relatively easy to fall into that slot, even though the Z80 CP/M systems I was using did seem much more capable, or at least good enough, for easier access to high level languages and applications (editors, Wordstar, Pascal and C). Possibly the difference was that they looked more "operating system" like (after exposure to various mini computer OS's, Unix, etc.), so ultimate speed wasn't really an issue there, but applications were. The home computer users cared more for their games, etc., or so it seemed, so ultimate speed and coding tricks ruled the day.

-G



PostPosted: Mon Sep 05, 2022 9:31 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10802
Location: England
I'd put it slightly differently, but perhaps to similar effect: the 6800 and 6502 are unusual in having one machine cycle for each memory cycle. I think the 8-flavours from Intel and Zilog, and Moto's 68k, and National's 32k, all are designed to have several machine cycles per memory cycle. It's yet another set of engineering tradeoffs, which can only be understood by considering both choices.

Later micros with caches start to shift the tradeoffs.

I do wonder if people who start by learning to program a register-rich machine have more trouble adjusting to the 6502 type of machine, compared to those who start at the 6502 end of the spectrum. In much the same way, perhaps, as people who start by understanding C's use of the stack for parameters and local variables can struggle to understand the different idioms used in 6502 land.


PostPosted: Mon Sep 05, 2022 9:52 am

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1413
Location: Scotland
BigEd wrote:
I'd put it slightly differently, but perhaps to similar effect: the 6800 and 6502 are unusual in having one machine cycle for each memory cycle. I think the 8-flavours from Intel and Zilog, and Moto's 68k, and National's 32k, all are designed to have several machine cycles per memory cycle. It's yet another set of engineering tradeoffs, which can only be understood by considering both choices.

Later micros with caches start to shift the tradeoffs.

I do wonder if people who start by learning to program a register-rich machine have more trouble adjusting to the 6502 type of machine, compared to those who start at the 6502 end of the spectrum. In much the same way, perhaps, as people who start by understanding C's use of the stack for parameters and local variables can struggle to understand the different idioms used in 6502 land.


Very good points.

In the assembly language world, I started on 6502, then Prime (horrible "mini" computer thing), then 8080. I took to the 8080 with relative ease and performance wasn't an issue (it was a real-time blood analysis machine where the process times were counted in 10's of milliseconds). After that, following a very brief dabble with ARM, it was the Transputer, which has quite a unique architecture, then back into the RISC world with Sparc, with its plethora of registers and register windowing system, and i860, with its dual-instruction mode braincell-destroying nightmare. I gave up on assembler after that and vowed to never look back since C was good enough; however, that didn't quite work and I drifted back, oddly enough, into the PDP-8 world, which didn't feel at all bad now that I recall...

In recent times I've been looking at RISC-V, and I recently ported my BCPL Cintcode VM from the '816 to it. The Cintcode registers which were in zero/direct page on the '816 mapped directly to real registers in the RV CPU, and I was able to use other registers to good effect, so essentially everything that was in zero/direct page is now in registers. So there is a parallel there between using pseudo 16/32-bit pointers/registers on the '816 and using real 32-bit registers in the RV.

But, as Ed says, engineering tradeoffs and challenges - both in the hardware and grey-cell worlds. Tools were evolving quickly in the late 70's and early 80's so it was easier to get the computer to do more work to improve layout, help save a cycle here and there, and so on.

Also markets - the 68K did do well - in commercial worlds - I think the N32K might have worked too, but they seemed to be plagued by initial production issues and bugs before it ended up in the "too little, too late" category. Or maybe it was just the home/hobby world that made most noise back then? It's just too easy to complain about speeds and discount other features - such as potentially faster arithmetic, bigger memory potentials and so on....

-Gordon



PostPosted: Mon Sep 05, 2022 2:04 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3354
Location: Ontario, Canada
GARTHWILSON wrote:
Even 40 years ago, a Z80 had to run at 3 or 4MHz to keep up with a 1MHz 6502
This is factually true but easy to misinterpret. In an apples-to-apples comparison you'd begin by establishing a level playing field with respect to memory speed (which dominates system cost). And the faster-clocked Z80 is quite happy with the same, slow-ish memory used by the 6502.

As Ed said, the 6800 and 6502 are unusual in having one machine cycle for each memory cycle. And IMO the Z80 has certain shortcomings, but using more machine cycles per memory cycle simply doesn't matter; you just drop in a different crystal. :!:
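A rough back-of-envelope check of that point, as a Python sketch. The per-access clock counts are the usual textbook figures and are my assumption, not numbers from this thread: one clock per memory cycle on the 6502, four T-states for a Z80 opcode fetch and three for an ordinary Z80 memory read, with no wait states. The actual access-time window inside each bus cycle is shorter than the cycle itself, so check the datasheets before sizing real memory.

```python
def memory_cycle_ns(clock_mhz, clocks_per_memory_cycle):
    """Length of one memory (bus) cycle in nanoseconds."""
    return clocks_per_memory_cycle * 1000.0 / clock_mhz

# Assumed clock counts per memory access (no wait states).
CLOCKS_6502_ACCESS = 1      # one machine cycle per memory cycle
CLOCKS_Z80_OPCODE_FETCH = 4
CLOCKS_Z80_MEMORY_READ = 3

mem_6502_1mhz = memory_cycle_ns(1.0, CLOCKS_6502_ACCESS)
z80_fetch_4mhz = memory_cycle_ns(4.0, CLOCKS_Z80_OPCODE_FETCH)
z80_read_4mhz = memory_cycle_ns(4.0, CLOCKS_Z80_MEMORY_READ)

print("6502 @ 1 MHz, any access  :", mem_6502_1mhz, "ns")
print("Z80  @ 4 MHz, opcode fetch:", z80_fetch_4mhz, "ns")
print("Z80  @ 4 MHz, memory read :", z80_read_4mhz, "ns")
```

Under those assumptions the bus cycles come out in the same ballpark, which is the sense in which the clock-rate comparison is easy to misinterpret.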


GARTHWILSON wrote:
As for extending the processor with external logic, there's Jeff Laughton's KimKlone 65c02 with pointer-arithmetic-friendly extended address space and 9-cycle ITC Forth NEXT. It gives 6 new registers and 44 new instructions.
Thanks for the plug! I hope it's clear that this is not a proposal but a real machine, built (and put to work) in 1988.

-- Jeff



PostPosted: Mon Sep 05, 2022 6:57 pm

Joined: Mon Feb 15, 2021 2:11 am
Posts: 100
GARTHWILSON wrote:

I'm not familiar with a wide inventory of processors, but it is my understanding that a major push for lots of registers was partly to make it easier to write compilers.


My understanding is that having a large number of registers simplifies the writing of efficient register allocation/register spilling algorithms. Those algorithms are responsible for generating the machine code for moving variable values between registers and memory, and efficient versions of these algorithms make as few such moves as possible. This became far more important once the speed of memory could no longer keep up with processor speed.
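A toy illustration of that effect in Python. This is not how a real compiler allocates registers (those use liveness analysis, graph colouring, linear scan and so on); it just replays a sequence of variable accesses against a fixed number of registers, evicting the least recently used one, and counts how often a value has to be fetched from memory. The function name and the access trace are invented for the example.

```python
from collections import OrderedDict

def count_memory_loads(accesses, num_registers):
    """Count loads from memory when variables compete for a few registers.

    Uses least-recently-used eviction as a stand-in for a real spilling
    heuristic; write-backs of evicted values are ignored for simplicity.
    """
    registers = OrderedDict()          # variable name -> present in a register
    loads = 0
    for var in accesses:
        if var in registers:
            registers.move_to_end(var) # mark as most recently used
        else:
            loads += 1                 # value must come from memory
            if len(registers) >= num_registers:
                registers.popitem(last=False)  # evict least recently used
            registers[var] = True
    return loads

# Hypothetical access pattern touching four variables.
trace = list("abcabcabdabd")
for n in (2, 3, 4):
    print(n, "registers ->", count_memory_loads(trace, n), "loads")
```

With too few registers nearly every access turns into a memory operation, which is cheap when memory keeps up with the CPU and expensive when it doesn't.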

GARTHWILSON wrote:

However, just having more registers, even wider ones, does not guarantee performance, if I may point to the example of the RCA 1802 which had 16 16-bit registers and yet performed very poorly compared to the 6502,


This is very true. The RCA 1802 architecture had some severe limitations, particularly with regard to addressing. As I understand it, accessing memory was largely through register-indirect addressing. Memory addresses needed to be loaded into the registers R0-RF in order to access memory, which often involved a series of load immediate operations. Once the address was in a register, arithmetic and logical operations could reference the value at that address, or the value could be loaded into another R register, or into the accumulator (register D). There were R register increment and decrement operations (also in conjunction with certain load/store ops), but the arithmetic and logical operations all involved the accumulator.

GARTHWILSON wrote:

Even 40 years ago, a Z80 had to run at 3 or 4MHz to keep up with a 1MHz 6502; and Jack Crenshaw, an embedded-systems engineer who wrote regularly in Embedded Systems Programming magazine said in the 9/98 issue that he still couldn't figure out why, in BASIC benchmark after benchmark, the 6502 could outperform the Z80 which had more and bigger registers, a seemingly more powerful instruction set, and ran at higher clock rates. (The 6502's zero page and improved indexed and indirect addressing modes no doubt helped.)


I believe the Z80 actually had a four-bit ALU, as I've read Federico Faggin was trying to avoid some 8-bit ALU-related patent Intel had rights to based on his earlier 8080 work. That probably played a part: at least one extra cycle for each ALU operation.


PostPosted: Mon Sep 05, 2022 8:00 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10802
Location: England
(The 1802 has a serial ALU - it takes 8 clock cycles for a machine cycle, and an instruction takes two or three machine cycles. A memory access takes just one machine cycle - but that's 8 clock cycles. As Jeff notes, we should generally scale our estimation of speed according to memory access times.)


PostPosted: Tue Sep 06, 2022 2:19 am

Joined: Mon Feb 15, 2021 2:11 am
Posts: 100
BigEd wrote:
(The 1802 has a serial ALU - it takes 8 clock cycles for a machine cycle, and an instruction takes two or three machine cycles. A memory access takes just one machine cycle - but that's 8 clock cycles. As Jeff notes, we should generally scale our estimation of speed according to memory access times.)


Ouch. Think of a short pair of instructions in the 6502, which would take about 8 (machine and clock) cycles:
LDA abs
ADC abs

Memory addressing on the 1802 is limited to register indirect, and load immediate is limited to loading into the accumulator. Just setting the address in a register (R0, in my example below) would take about four instructions, at about two machine cycles each, so 64 clock cycles just to load the address into a register. That's not even halfway through the work of the 6502 example above, and we haven't even loaded a value from memory yet, nor done any arithmetic, just stuck the address into a register.

LDI abs_lo
PLO 0
LDI abs_hi
PHI 0

At a given clock speed, it wouldn't surprise me if the 6502 was getting ten or twenty times as much done.
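The arithmetic behind those figures, spelled out as a small Python sketch. It uses only the rough numbers quoted in this thread (8 clocks per 1802 machine cycle, about two machine cycles per 1802 instruction) plus the standard 4 cycles each for LDA abs and ADC abs on the 6502; the constant and function names are mine.

```python
# Rough figures from the posts above; treat them as estimates.
CLOCKS_PER_MACHINE_CYCLE_1802 = 8   # serial ALU: 8 clocks per machine cycle
MACHINE_CYCLES_PER_INSTR_1802 = 2   # "about two machine cycles each"

def clocks_6502(cycles_per_instruction):
    """Total clocks for a 6502 sequence (one clock per machine cycle)."""
    return sum(cycles_per_instruction)

def clocks_1802(instruction_count):
    """Total clocks for an 1802 sequence under the assumptions above."""
    return (instruction_count
            * MACHINE_CYCLES_PER_INSTR_1802
            * CLOCKS_PER_MACHINE_CYCLE_1802)

load_and_add_6502 = clocks_6502([4, 4])   # LDA abs, ADC abs
address_setup_1802 = clocks_1802(4)       # LDI, PLO, LDI, PHI

print("6502 load + add      :", load_and_add_6502, "clocks")
print("1802 address setup   :", address_setup_1802, "clocks")
print("ratio (setup only)   :", address_setup_1802 / load_and_add_6502)
```

That ratio covers only the 1802's address setup, before it has loaded or added anything, so the overall gap for the full operation would be larger still.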


PostPosted: Tue Sep 06, 2022 4:15 am

Joined: Fri Aug 26, 2022 9:17 am
Posts: 12
Location: Manila, Philippines
According to CPU Shack:
Quote:
Back when the 6502 was introduced, RAM was actually faster than microprocessors, so it made sense to optimize for RAM access rather than increase the number of registers on a chip.


PostPosted: Tue Sep 06, 2022 4:08 pm

Joined: Sat Apr 11, 2020 7:28 pm
Posts: 341
I'll drop this here:
Code:
AVR Instruction Set Nomenclature
=================================

Status Register (SREG)

SREG   Status Register
C      Carry Flag
Z      Zero Flag
N      Negative Flag
V      Two’s complement overflow indicator
S      N ⊕ V, for signed tests
H      Half Carry Flag
T      Transfer bit used by BLD and BST instructions
I      Global Interrupt Enable/Disable Flag

Registers and Operands

Rd:    Destination (and source) register in the Register File
Rr:    Source register in the Register File
R:     Result after instruction is executed
K:     Constant data
k:     Constant address
b:     Bit in the Register File or I/O Register (3-bit)
s:     Bit in the Status Register (3-bit)
X,Y,Z: Indirect Address Register (X=R27:R26, Y=R29:R28, and Z=R31:R30)
A:     I/O location address
q:     Displacement for direct addressing (6-bit)

