-- Jeff
Instructions that I missed
Re: Instructions that I missed
BigDumbDinosaur wrote:
fachat wrote:
SWP - swap upper and lower nibble
Alienthe wrote:
So why was SWEET16 never implemented in hardware? Like the 6502 it was elegant, orthogonal and required no prefix codes. It seems to me like a lost opportunity.
-- Jeff
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
Re: Instructions that I missed
Another thing that was done on some 6502 variant, and which I always liked as an extension, was the addition of a Z index register whose default value was 0. Many of the new opcodes corresponded with existing ones, so STZ did exactly that, and the zp-indirect instructions like LDA (zp) became LDA (zp),Z. Always seemed like a smart way of getting some extra mileage out of a limited number of opcodes, while retaining backward compatibility.
ASR - yes, that would've been nice. Also ADD and SUB (without taking C as input) would've been great, and while we're at it (taking inspiration from the ARM), RSB and RSC for reverse subtract [with carry] would've saved the whole horrible sequence of:
Code: Select all
STA temp
LDA #$08
SEC
SBC temp
or, slightly more cleverly:
Code: Select all
EOR #$FF
SEC
ADC #$08
RSB #0 would of course provide a nice 2 cycle 2's complement operation.
Re: Instructions that I missed
RichTW wrote:
RSB and RSC for reverse subtract [with carry] would've saved the whole horrible sequence of:
Code: Select all
STA temp
LDA #$08
SEC
SBC temp
or, slightly more cleverly:
Code: Select all
EOR #$FF
SEC
ADC #$08
RSB #0 would of course provide a nice 2 cycle 2's complement operation.
Code: Select all
INV A
CLC
ADC #value
André
Author of the GeckOS multitasking operating system, the usb65 stack, designer of the Micro-PET and many more 6502 content: http://6502.org/users/andre/
Re: Instructions that I missed
teamtempest wrote:
Quote:
BLT - Branch Less Than (C=0 or Z=1)
BGT - Branch Greater Than (C=1 and Z=0)
Forgive my ignorance, but isn't signed the same as unsigned, as long as V is NOT set?
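For reference, here is a small Python model of how the quoted BLT/BGT conditions behave after a CMP, treating both operands as unsigned bytes. It is only a sketch: the function names are made up for illustration, and the flag conditions are taken exactly as written in the quote. Note that with those conditions BLT is also taken when the operands are equal.

```python
def cmp_flags(a: int, m: int) -> tuple:
    """Return (C, Z) as the 6502 sets them after CMP with A=a and operand m."""
    a &= 0xFF
    m &= 0xFF
    c = 1 if a >= m else 0   # carry set means no borrow: A >= M (unsigned)
    z = 1 if a == m else 0   # zero set means A == M
    return c, z


def bgt_taken(a: int, m: int) -> bool:
    """Proposed BGT as quoted: branch if C=1 and Z=0."""
    c, z = cmp_flags(a, m)
    return c == 1 and z == 0


def blt_taken(a: int, m: int) -> bool:
    """Proposed BLT as quoted: branch if C=0 or Z=1."""
    c, z = cmp_flags(a, m)
    return c == 0 or z == 1


# Exhaustive check over all unsigned byte pairs.
for a in range(256):
    for m in range(256):
        assert bgt_taken(a, m) == (a > m)
        assert blt_taken(a, m) == (a <= m)
```

The two conditions are exact complements, so between them they cover every outcome of an unsigned compare.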
Quote:
Quote:
SWP - swap upper and lower nibble
Quote:
Quote:
INV - two's complement
Code: Select all
EOR #$FF
SEC
ADC value
Code: Select all
INV A
CLC
ADC value
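The two sequences above both compute value minus A, modulo 256. A quick Python model makes that easy to check. This is a sketch only: it assumes INV A means "two's complement of the accumulator" as listed in the quote, ignores decimal mode, and the function names are invented for the example.

```python
def eor_sec_adc(a: int, value: int) -> int:
    """Model EOR #$FF / SEC / ADC value on an 8-bit accumulator."""
    a = (a ^ 0xFF) & 0xFF               # EOR #$FF: one's complement of A
    carry = 1                           # SEC
    return (a + value + carry) & 0xFF   # ADC value, keeping the low 8 bits


def inv_clc_adc(a: int, value: int) -> int:
    """Model the proposed INV A / CLC / ADC value (INV assumed = two's complement)."""
    a = (-a) & 0xFF                     # hypothetical INV A
    carry = 0                           # CLC
    return (a + value + carry) & 0xFF   # ADC value


# Both forms should equal (value - A) mod 256 for every input pair.
for a in range(256):
    for value in range(256):
        expected = (value - a) & 0xFF
        assert eor_sec_adc(a, value) == expected
        assert inv_clc_adc(a, value) == expected
```

With value = 0 either form reduces to a plain two's complement of A.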
Quote:
Quote:
BCN - bit count, compute number of 1-bits
Quote:
Quote:
PSH - push all registers (in a 6502 that would be AC, XR, YR, more in a 65002)
PLL - pull all registers (in a 6502 that would be AC, XR, YR, more in a 65002)
But an instruction to do it would be just as useful. The idea of multiple dedicated stacks would of course be to minimize response time to an interrupt. I don't think these would be as useful for calling subroutines, mainly because of the difficulty of returning results in registers (if the callee did PSH at the start then PLL at the end would trash anything you put in them...unless maybe the caller did PSH? Then after the callee returns, save any results passed in registers before the caller does PLL...might work. Okay, I like it)
I thought of these opcodes as a way to write model/family-independent interrupt routines. A later 650x0 could have more registers, and PLL/PSH would simply be extended to pull/push them from/onto the stack.
This would not necessarily be used for calling subroutines - here I would use the registers for parameters and return values anyway, and selectively save those on the stack that are needed.
Quote:
Quote:
FIL - fill a memory area with a byte value
Quote:
Quote:
ADS - add value to stack pointer
SBS - subtract value from stack pointer
Quote:
Quote:
MVN/MVP - move a memory area (see 65816)
Anyway, the plan is to have two cycles per byte maximum (maybe less with wider memory interfaces)
Quote:
Quote:
HBS - determine bit number of highest bit that is set (like log2)
HBC - determine bit number of highest bit that is clear
BSW - bit swap - exchange bit 7 with bit 0, bit 6 with bit 1, etc
BSW I would find useful for computing the swapped addresses in a fast Fourier transform (although then it would probably be better used on an index register).
Those would either be read memory -> do operation -> store in AC, or AC -> operation -> AC. Don't think a R/M/W would be useful here though.
If no bit is set/cleared on HBS/HBC respectively, the carry bit could be set, for example. I haven't thought about that yet, I have to admit.
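To make the intended behaviour concrete, here is a rough Python model of the three operations on an 8-bit value. It is a sketch under assumptions: the carry convention (carry = 1 when no matching bit exists) is just the possibility mentioned above, the result of 0 in that case is arbitrary, and the function names are illustrative.

```python
def bsw(x: int) -> int:
    """Bit swap: reverse the bit order of an 8-bit value (bit 7 <-> bit 0, etc.)."""
    x &= 0xFF
    result = 0
    for _ in range(8):
        result = (result << 1) | (x & 1)   # move the lowest bit of x into result
        x >>= 1
    return result


def hbs(x: int) -> tuple:
    """Return (bit_number, carry) for the highest set bit; carry=1 if no bit is set."""
    x &= 0xFF
    if x == 0:
        return 0, 1                        # assumed convention: flag via carry
    return x.bit_length() - 1, 0


def hbc(x: int) -> tuple:
    """Return (bit_number, carry) for the highest clear bit; carry=1 if none is clear."""
    return hbs(~x & 0xFF)


def fft_index(i: int, bits: int) -> int:
    """Bit-reversed index for an FFT of 2**bits points (bits <= 8), built on bsw."""
    return bsw(i) >> (8 - bits)
```

For an 8-point FFT, fft_index(1, 3) gives 4 and fft_index(3, 3) gives 6, which is the usual bit-reversal permutation.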
André
Author of the GeckOS multitasking operating system, the usb65 stack, designer of the Micro-PET and many more 6502 content: http://6502.org/users/andre/
Re: Instructions that I missed
GARTHWILSON wrote:
STF STore FF to a memory location without affecting processor registers. Many times a byte is used as a flag variable, and STZ clears it, so STF would set it much more efficiently than STZ, DEC. LDA #$FF (or $FFFF) followed by STA ___ affects a processor register.
Quote:
DIN and DDE Double INcrement and Double DEcrement. Same as INC INC or DEC DEC but faster, and the C flag would tell if you went from FF to 01 or vice-versa without having to test in between the two INC or DEC instructions. This is particularly useful in higher-level languages that are always incrementing pointers to the next two-byte address.
Code: Select all
INY #2
DEX #4
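The wrap detection described for DIN and DDE can be modeled in a few lines of Python. This is a sketch, not a specification: the carry polarity (carry = 1 whenever the 8-bit value wraps, in either direction) is an assumption, and the function names are hypothetical.

```python
def din(x: int) -> tuple:
    """Double increment of an 8-bit value; returns (result, carry)."""
    total = (x & 0xFF) + 2
    carry = 1 if total > 0xFF else 0   # assumed: carry flags a wrap past $FF
    return total & 0xFF, carry


def dde(x: int) -> tuple:
    """Double decrement of an 8-bit value; returns (result, carry)."""
    diff = (x & 0xFF) - 2
    carry = 1 if diff < 0 else 0       # assumed: carry flags a wrap below $00
    return diff & 0xFF, carry
```

Going from $FF to $01 sets the carry, so a 16-bit pointer could bump its high byte without testing between two separate INCs.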
Quote:
BEV & BOD ("Branch if EVen" and "Branch if ODd"), using another flag in the status register. I can't remember anymore why I wanted these. The need might go away with the DIN and DDE above. Since the NMOS 6502 had the JMP(xxFF) bug, Forth on that one required keeping 16-bit values aligned on even addresses. It's not a problem on the CMOS one, so potentially thousands of bytes can be saved by not having to align.
Quote:
Eliminate most dead bus cycles. I don't know if it gets easier if the input clock were faster than the bus, like 20 or 40MHz input for a 10MHz bus.
Quote:
Ones like MULtiply take a lot of silicon real estate.
André
Author of the GeckOS multitasking operating system, the usb65 stack, designer of the Micro-PET and many more 6502 content: http://6502.org/users/andre/
Re: Instructions that I missed
A couple of points:
On FPGA, multiply is cheap. The multipliers are already there, whether you use them or not. The argument that they use up a lot of silicon area applies to the era of the 6502 rather more strongly than to the era of the 250k gate FPGA. (But division remains difficult.)
If you supply bitcount, parity is cheap. The converse isn't true! Whether bitcount is worth bothering with is another question, but again, on FPGA we don't pay for silicon (within reason), we pay for lines of code and we pay if clock speed is affected.
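The bitcount/parity point can be illustrated with a short Python sketch (function names are made up for the example). Parity is just the low bit of the bit count, whereas a parity-only circuit, modeled here as XOR folding, yields a single bit from which the count cannot be recovered.

```python
def bitcount(x: int) -> int:
    """Number of 1-bits in an 8-bit value."""
    return bin(x & 0xFF).count("1")


def parity_from_bitcount(x: int) -> int:
    """Parity comes for free once a bit count exists: take its low bit."""
    return bitcount(x) & 1


def parity_by_folding(x: int) -> int:
    """Parity alone, by XOR-folding the byte down to one bit."""
    x &= 0xFF
    x ^= x >> 4
    x ^= x >> 2
    x ^= x >> 1
    return x & 1


# Both parity routes agree for every byte value.
for value in range(256):
    assert parity_from_bitcount(value) == parity_by_folding(value)
```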
Final point: beware of wish-list features which trim one or two bytes or clock cycles. These thoughts probably come from experience with 1MHz or 2MHz systems with 64k memory. If a project is delivering a CPU which runs at 25MHz with a 32bit address space, the baseline performance should already be a lot higher and the memory pressure a lot lower. Time might be better spent on increasing clock speed, or making a smarter SDRAM interface which will speed up all programs, instead of adding or tweaking a few special-case instructions. (Of course, some instructions are worth the effort, and others are not.)
Cheers
Ed
Re: Instructions that I missed
I wrote:
Also ADD and SUB (without taking C as input) would've been great
Re: Instructions that I missed
RichTW wrote:
I wrote:
Also ADD and SUB (without taking C as input) would've been great
Re: Instructions that I missed
That's what I figured, but the original 6502 had plenty of spare opcode space. Why were ASL/ROL and LSR/ROR afforded the luxury of having versions which ignored the carry or used the carry, while ADC and SBC were not? Arguably, addition and subtraction are more basic and common operations than multiplication/division by 2.
Re: Instructions that I missed
RichTW wrote:
That's what I figured, but the original 6502 had plenty of spare opcode space. Why were ASL/ROL and LSR/ROR afforded the luxury of having versions which ignored the carry or used the carry, while ADC and SBC were not? Arguably, addition and subtraction are more basic and common operations than multiplication/division by 2.
Re: Instructions that I missed
Arlet wrote:
It would take almost no logic, but it would take a large chunk of opcode space, which may be better used for something else.
Code: Select all
RDR (E)
André
Author of the GeckOS multitasking operating system, the usb65 stack, designer of the Micro-PET and many more 6502 content: http://6502.org/users/andre/
Re: Instructions that I missed
GARTHWILSON wrote:
Quote:
1: Loops never get tight enough.
2: The BNE is (nearly) symmetric as to the maximum distance of branching forward or backward and it gets inelegant if you have to branch more than 128 away from present address. An instruction as proposed here would lend itself to a backward bias, with branches -192 to +64, perhaps even -224 to +32, saving ugly branches plus JMP.
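The backward bias described in point 2 could be decoded along these lines. This is a Python sketch with assumed details: the offset byte is treated as unsigned and a fixed bias is subtracted, the bias of 192 is only one of the values floated above, and with 256 encodings the exact range comes out as -192 to +63. The function names are hypothetical.

```python
def standard_branch_target(pc_after: int, offset_byte: int) -> int:
    """Normal 6502 relative branch: signed 8-bit offset, -128..+127."""
    offset_byte &= 0xFF
    displacement = offset_byte - 256 if offset_byte & 0x80 else offset_byte
    return (pc_after + displacement) & 0xFFFF


def biased_branch_target(pc_after: int, offset_byte: int, bias: int = 192) -> int:
    """Hypothetical backward-biased branch: unsigned offset minus a fixed bias."""
    displacement = (offset_byte & 0xFF) - bias   # -bias .. 255 - bias
    return (pc_after + displacement) & 0xFFFF
```

Here pc_after stands for the address of the instruction following the branch, which is the base the 6502 uses for relative offsets.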
Quote:
The benchmarks truly are favorable to the 6502; but the one that really counts of course is your own application. I have a short list of new instructions I would like added, but I can imagine a list of reasons why they have not been added, ranging from instruction-decoding complexity versus speed, to silicon real-estate costs which apparently are one of the major motivators for WDC's licensees to choose the 65c02 over the competition for many jobs.
Is anyone from WDC on this forum? It would be interesting to hear their views on what this discussion is bringing up. Implementing an LPX instruction should be trivial for them.
Re: Instructions that I missed
teamtempest wrote:
Quote:
LPX (=DEX; BNE) might even be combined as LPX ++Y (=INY; DEX; BNE) allowing fast reversal. Simultaneously decrementing both X and Y seems less useful.
Quote:
Presumably the answer is the change in X. Okay, but what am I going to know about the state of Y (i.e., the value it holds) after the loop terminates? In the general case, presumably nothing.
Quote:
My second reaction is: oh, this adds to the complexity of trying to visualize just what the instruction really does. Every other instruction changes just one register at a time; this would be un-orthogonal to them. Oh, my poor head!
I do pity myself quite easily.
Re: Instructions that I missed
Dr Jefyll wrote:
Alienthe wrote:
So why was SWEET16 never implemented in hardware? Like the 6502 it was elegant, orthogonal and required no prefix codes. It seems to me like a lost opportunity.
I would propose some minor tweaks, say SWEET17 (she grew up, right?), by synchronising the A, X and Y registers with SWEET16 registers on entry and exit:
Code: Select all
R0 low byte: A
R0 high byte: zero on entry
R1: B register - concatenated with A when switching to SWEET32 mode (she grew up to a lady)
R4 low byte: X
R4 high byte: zero on entry
R6 low byte: Y
R6 high byte: zero on entry
Re: Instructions that I missed
Alienthe wrote:
I remember compiler authors also back then said their compilers created better code than hand-crafted assembly. My employers disagreed and I was brought in on many a project to squeeze that extra bit of oomph out of the code or shave off memory or bandwidth requirements. The 6502, in my view, was very appealing for coding and lent itself to ultra high density code.
Yes, for the 6502, I would find it hard to believe that a human couldn't do a lot better than a compiler. What the compiler authors were saying might be true of other processors whose instruction sets are nearly too complex for most programmers to handle well in assembly language.
Quote:
Is anyone from WDC on this forum? It would be interesting to hear their views on what this discussion is bringing up. Implementing an LPX instruction should be trivial for them.
There is one, but I don't think I've ever seen him post.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?