PostPosted: Sun May 03, 2020 5:16 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
I usually try not to project my own preferences too much in threads such as these: there are many possible ways to go and a lot of it comes down to preferences and adopted constraints.

I do have a preference against mode bits, and I don't have any problem with a lack of carry-free arithmetic operations, but you should make the tradeoffs which seem right to you.

I think it can be useful always to reflect that there will be alternate code sequences for anything that's missing: things should be added if they are worth adding, in terms of time to implement, to document, to explain, to test, versus runtime speed and code density.


PostPosted: Sun May 03, 2020 6:30 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Proxy wrote:
so overall i wouldn't really know what to put in the prefix-ed opcodes...
One very tidy exploit is to let the prefixed opcode do the same thing as the non-prefixed opcode.. the only difference being the register which is acted upon. Z80 is a good example, IIRC.

Lots of the old 8080 opcodes acted on register HL. On Z80, those very same opcodes have the same function -- ADD, or whatever -- but when a prefix is present the chip does the ADD using register IX (for example) instead of HL. The logic for this kind of thing is a lot simpler to implement, as compared to having the prefix take you into an utterly different set of instructions. IOW, it's a nice, simple way to add new instructions without clogging up the primary opcode map. :)

For example, you could have a prefix that substitutes Accumulator B for the Accumulator A.

As for determining parity, this is an example of something with a perhaps non-obvious alternative solution, namely a 256-byte lookup table. Another example mentioned somewhere recently is an instruction to complement the Accumulator.... but folks can manage quite nicely with just EOR #$FF instead.
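
A minimal sketch of the two workarounds mentioned here, modelled in Python rather than 6502 code. The names `PARITY`, `parity` and `complement` are made up for illustration; on a real 6502 the lookup would be an indexed load from a 256-byte table.

```python
# Build the 256-entry parity table once. Each entry is 1 when the index
# has an odd number of set bits, otherwise 0.
PARITY = bytes(bin(i).count("1") & 1 for i in range(256))


def parity(value: int) -> int:
    """Return 1 for odd parity, 0 for even parity, using one table lookup."""
    return PARITY[value & 0xFF]


def complement(value: int) -> int:
    """Model of EOR #$FF on an 8-bit accumulator."""
    return (value ^ 0xFF) & 0xFF
```

The table costs 256 bytes of memory but turns parity into a single indexed read, which is the kind of tradeoff to weigh against a dedicated instruction.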

Every prospective feature needs to be examined in this light, I'd say. Of all the things that threaten your success, complexity probably tops the list. :|

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Sun May 03, 2020 8:45 pm 

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
Dr Jefyll wrote:
One very tidy exploit is to let the prefixed opcode do the same thing as the non-prefixed opcode.. the only difference being the register which is acted upon. Z80 is a good example, IIRC.

Lots of the old 8080 opcodes acted on register HL. On Z80, those very same opcodes have the same function -- ADD, or whatever -- but when a prefix is present the chip does the ADD using register IX (for example) instead of HL. The logic for this kind of thing is a lot simpler to implement, as compared to having the prefix open an utterly different set of instructions. IOW, it's a nice, simple way to exploit the power of prefixes! :)

thing is, my decoding/execution logic is literally just a ROM i program every instruction into manually.
so it doesn't matter where i put Opcodes on the table, the complexity of the circuit will always stay the same. it's a blessing and a curse.
though i would still order opcodes just for the sake of order.

Dr Jefyll wrote:
Edit: for example, you could have a prefix that substitutes Accumulator B for the Accumulator A.

I see what you mean, and i can think of some examples
TAX, TXA, TAY, TYA with a prefix could become TBX, TXB, TBY, TYB.
or all branch instructions with a prefix have 16 bit relative addresses instead of 8 bit ones.
this wouldn't even need new mnemonics as the assembler could just automatically select the 16 bit one if the target address is out of range for an 8 bit one.

only problem is that all of those prefix instructions are going to be slightly slower and larger than their normal counterpart. so i'm at the same problem again, i need to decide what instructions to put into it that would be worth it.
for example, the long relative branches would be 4 byte long instructions (1x Prefix, 1x Opcode, 1x 16-bit Operand) and would be similar in function to a Branch followed by an Absolute Jump, which in total would be 5 bytes... so it seems worth it to add the long relative branches as even with a prefix they are faster/more compact than the regular 6502 alternative.
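
A rough sketch of how an assembler could pick between the two branch forms automatically. Everything here is an assumption for illustration: the prefix value `BRANCH_PREFIX`, the little-endian 16-bit offset, and the rule that offsets are measured from the address following the instruction (as on the stock 6502).

```python
BRANCH_PREFIX = 0xFF  # hypothetical prefix byte, chosen only for this sketch


def encode_branch(opcode: int, pc: int, target: int) -> bytes:
    """Encode a relative branch located at pc, short form if it reaches."""
    short_offset = target - (pc + 2)  # 2-byte form: opcode + 8-bit offset
    if -128 <= short_offset <= 127:
        return bytes([opcode, short_offset & 0xFF])
    # 4-byte form: prefix + opcode + 16-bit offset, wrapping in a 64K space
    long_offset = (target - (pc + 4)) & 0xFFFF
    return bytes([BRANCH_PREFIX, opcode, long_offset & 0xFF, long_offset >> 8])
```

With this scheme an out-of-range branch costs 4 bytes, against 5 bytes for the usual 6502 workaround of an inverted short branch over a JMP absolute.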

Dr Jefyll wrote:
As for determining parity, this is an example of something with a tidy but perhaps non-obvious alternative solution, namely a 256-byte lookup table. Another example mentioned somewhere recently is an instruction to complement the Accumulator.... but folks can manage quite nicely with just EOR #$FF instead. Every prospective feature needs to be examined in this light, I'd say. Of all the things that threaten your success, complexity probably tops the list. :|


I'm not 100% sure i understand what you mean...
are you saying that i should look at every instruction i want to add and make sure that it's actually efficient/worth adding by comparing how people do the same thing without the instruction?
basically what i did in this and my last post with the ADD/ADC example, Transfer comparisons, and the Long Relative branches?
because i can and probably will do that for most/all instructions.

though overall i'm not sure a prefix system is even the right choice. i can pretty much fit all current instructions into the existing Opcode table.
and i think a prefix system would fit more into a future project, a 16 bit version of this maybe. (if i ever get around to doing that and figuring out how to make everything align with words instead of bytes)


Last edited by Proxy on Sun May 03, 2020 9:43 pm, edited 1 time in total.

PostPosted: Sun May 03, 2020 9:43 pm 

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Correct - see what is actually gained by adding the instruction, versus working around its absence in the most efficient way. You could add assembler pseudo-instructions (by macro) to make the CPU syntactically look like you've added new instructions, without actually using up opcode space and microcode programming time. Also think about how often the programmer will actually need the operations you add.

So you can write ADD x, and the assembler will actually emit CLC : ADC x. It makes CLC into a prefix byte that disables the carry-in of ADC. If you make CLC into a 1-cycle instruction (it's 2 cycles on a normal 6502), then it's a pretty efficient mechanism. Similarly, you can write NOT A, and the assembler will emit EOR #$FF.
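
A minimal sketch of that macro idea as a source-level rewrite pass. The function name `expand` and the one-statement-per-line input format are assumptions; a real assembler would do this inside its macro processor.

```python
def expand(line: str) -> list[str]:
    """Rewrite pseudo-instructions into real 6502 instructions."""
    parts = line.split(None, 1)
    if not parts:
        return []
    mnemonic = parts[0].upper()
    operand = parts[1].strip() if len(parts) > 1 else ""
    if mnemonic == "ADD":
        # ADD x becomes CLC followed by ADC x
        return ["CLC", f"ADC {operand}"]
    if mnemonic == "NOT" and operand.upper() == "A":
        # NOT A becomes EOR #$FF
        return ["EOR #$FF"]
    # Anything else passes through untouched
    return [line.strip()]
```

For example, `expand("ADD $10")` gives `["CLC", "ADC $10"]`, so the programmer sees a carry-free add without any opcode being spent on it.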

Where I would add real prefix bytes is to govern the extension of 8-bit registers and operations to 16-bit, and 16-bit addresses to 24-bit. Those are operations that you'd expect to take more time and space anyway, so a prefix byte that adds slightly to those properties is acceptable as a means of controlling the combinatorial explosion that otherwise results. The mode bits that the '816 introduced are a more awkward way of doing the same thing, as noted above. An advantage to this approach is that you don't lose addressing modes when you move up to the wider address space, purely due to lack of opcode space. And yes, being able to do ALU ops on the index registers as well as the accumulator could sometimes be very handy; a prefix byte could enable that while still keeping the accumulator primary for arithmetic.


PostPosted: Sun May 03, 2020 9:50 pm 

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
Proxy wrote:
for example, the long relative branches would be 4 byte long instructions (1x Prefix, 1x Opcode, 1x 16-bit Operand) and would be similar in function to a Branch followed by an Absolute Jump, which in total would be 5 bytes... so it seems worth it to add the long relative branches as even with a prefix they are faster/more compact than the regular 6502 alternative.


Ah, and they would open the door to easily writing position-independent code...

I strongly recommend looking at the 6809 instruction set. A prefix of $10 converts short relative branches into long relative branches.


PostPosted: Sun May 03, 2020 10:05 pm 

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
Chromatix wrote:
Correct - see what is actually gained by adding the instruction, versus working around its absence in the most efficient way. You could add assembler pseudo-instructions (by macro) to make the CPU syntactically look like you've added new instructions, without actually using up opcode space and microcode programming time. Also think about how often the programmer will actually need the operations you add.

So you can write ADD x, and the assembler will actually emit CLC : ADC x. It makes CLC into a prefix byte that disables the carry-in of ADC. If you make CLC into a 1-cycle instruction (it's 2 cycles on a normal 6502), then it's a pretty efficient mechanism. Similarly, you can write NOT A, and the assembler will emit EOR #$FF.


welp, time to rethink all 12 Transfer instructions i got!

honestly what i find difficult as well is choosing the correct addressing modes for an instruction.
for example the MUL and DIV instructions, what addressing modes are likely to be used for these?
at the bare minimum i'd say Immediate, Absolute, and Zeropage, but beyond that i'm not sure... maybe an indexed addressing mode as well...?

and if you choose too many modes for some instruction it can easily fill up the opcode table with unnecessary clutter.

Chromatix wrote:
Where I would add real prefix bytes is to govern the extension of 8-bit registers and operations to 16-bit, and 16-bit addresses to 24-bit. Those are operations that you'd expect to take more time and space anyway, so a prefix byte that adds slightly to those properties is acceptable as a means of controlling the combinatorial explosion that otherwise results. The mode bits that the '816 introduced are a more awkward way of doing the same thing, as noted above. An advantage to this approach is that you don't lose addressing modes when you move up to the wider address space, purely due to lack of opcode space. And yes, being able to do ALU ops on the index registers as well as the accumulator could sometimes be very handy; a prefix byte could enable that while still keeping the accumulator primary for arithmetic.


yes i see this being useful, for example "0xFF" could be used as a 16 bit prefix.
so an instruction like "ADC #" (Add with Carry) with an opcode 0x96 could be extended to 0xFF96 which would be "AWC #" (Add Word with Carry)
that is how i would design a 16 bit CPU as well, have the upper byte select from 256 Opcode tables, and the lower byte selects what instruction specifically to execute.
that allows you to neatly sort all instructions you have and always have more space for expansion.
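
A small sketch of what the fetch step could look like with such a prefix. The 0xFF prefix and the 0x96 opcode come from the post above; the function name `fetch`, the use of table 0 for unprefixed opcodes, and the `MNEMONICS` dictionary are assumptions made for illustration.

```python
PREFIX_16BIT = 0xFF  # prefix byte suggested above

# (table, opcode) -> mnemonic; only the example from the post is filled in
MNEMONICS = {
    (0x00, 0x96): "ADC #",
    (PREFIX_16BIT, 0x96): "AWC #",
}


def fetch(stream: bytes, pc: int) -> tuple[int, int, int]:
    """Return (table, opcode, next_pc) for the instruction at pc."""
    first = stream[pc]
    if first == PREFIX_16BIT:
        # Prefixed form: the following byte selects within the 16-bit table
        return PREFIX_16BIT, stream[pc + 1], pc + 2
    # Unprefixed form: table 0 is taken to be the primary opcode map
    return 0x00, first, pc + 1
```

One consequence of this design is that 0xFF can no longer be an ordinary opcode in the primary table, and every prefixed instruction pays one extra fetch.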


PostPosted: Sun May 03, 2020 11:56 pm 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
How much and what kind of programming experience do you have with the 65's or other processors? It may be valuable to consider what you want to do with your design. There's an awful lot of criticism for the 65816's mode bits; but in my '816 Forth, I put it in native mode in the reset routine and never touch that bit again, and I leave A in 16-bit and the index registers in 8-bit almost full time, seldom touching the M & X bits. IOW, it's not an issue. The '816 also has a lot of the instructions mentioned here, like long relative branching, TYX, TXY, and loads more. It's already there. Whatever instructions may seem to be lacking are probably absent because they would not be used much and they're easily and efficiently simulated with other instructions, like BSR (branch to subroutine), or because they would require slowing the clock down (for a given silicon geometry), or would take too much expensive silicon real estate to justify the small benefit.

The '02 is a very poor target for a C compiler, at least until someone makes a super-intelligent one that can figure out the intent and change the entire approach to fit the 6502's way of doing things. The 65816 seems to be much better suited for C compilers, although the processors with lots of internal registers seem to be even better suited for C compilers. One thing a few of us would like is a really efficient way to do NEXT in ITC Forth.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Mon May 04, 2020 7:53 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
(Because MUL and DIV save so very many cycles compared to software solutions, I'd say it's easily enough to have them only operate in one way on one fixed set of registers. For example, XA gets A times X, or BA gets B times A. It would be a useful bit of preparatory research to survey many comparable microprocessors to see what choices were made. The 6809 might have been the first to offer a multiply instruction.)


PostPosted: Mon May 04, 2020 8:49 am 

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
BigEd wrote:
The 6809 might have been the first to offer a multiply instruction.


The TI 9900 was the first. It also had divide.

Both instructions were criticized for being slow.


PostPosted: Mon May 04, 2020 9:52 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
ah, thanks. There's slow, and then there's very slow - hopefully they were faster than doing it by hand.


PostPosted: Mon May 04, 2020 11:52 am 

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
GARTHWILSON wrote:
How much and what kind of programming experience do you have with the 65's or other processors? It may be valuable to consider what you want to do with your design.

with 65xx based processors, not much. In the past I made a few CPUs based on the 6502 and mostly programmed with those.
This and my SBC Project helped me greatly to better adapt to the weirdness of the 6502.
to be honest i find it hard to judge my own skill with writing for these CPUs, i wouldn't call myself an expert by any means, but i wouldn't count myself as a novice either (at least i'd like to think so).

and about this project, mainly it's just to learn about the CPU, but at the same time it's for a faster softcore version of the 6502 that I (and maybe others) can use for future FPGA based Projects.
having the ability to add convenient instructions is just a bonus that comes with designing it from scratch.

GARTHWILSON wrote:
There's an awful lot of criticism for the 65816's mode bits; but in my '816 Forth, I put it in native mode in the reset routine and never touch that bit again, and I leave A in 16-bit and the index registers in 8-bit almost full time, seldom touching the M & X bits. IOW, it's not an issue. The '816 also has a lot of the instructions mentioned here, like long relative branching, TYX, TXY, and loads more. It's already there.

that is how i thought about my idea of the "Enable Carry" flag as well, i would just leave it disabled for the whole program unless i need it in a specific function.
and i somewhat knew some of these were already in the 65816 so i thought it would be nice to have them here as well, because they are useful.

GARTHWILSON wrote:
Whatever instructions may seem to be lacking are probably absent because they would not be used much and they're easily and efficiently simulated with other instructions, like BSR (branch to subroutine), or because they would require slowing the clock down (for a given silicon geometry), or would take too much expensive silicon real estate to justify the small benefit.

how do you even simulate BSR? you would need to add a 16 bit signed value to the PC...
anyways, this is one of the reasons i ask about the instruction set on here before i actually implement it, and to see what instructions can be replaced by a 6502 macro/function, to see if it's actually worth implementing that instruction.
plus unlike any real CPU i don't have to worry about selling or even producing this CPU, so i can afford to care less about actual resource usage and complexity.

I feel like we now re-talked about the same point in 3 different replies.

GARTHWILSON wrote:
The '02 is a very poor target for a C compiler, at least until someone makes a super-intelligent one that can figure out the intent and change the entire approach to fit the 6502's way of doing things. The 65816 seems to be much better suited for C compilers, although the processors with lots of internal registers seem to be even better suited for C compilers. One thing a few of us would like is a really efficient way to do NEXT in ITC Forth.


well good thing i never planned to program C for the 6502...
I know you really like your 65816, but personally i'm currently more comfortable with my 8 bit data bus and a 16 bit Memory space.
one day when i get to 16 bits i'll be sure to look into some 65816 projects. :wink:


PostPosted: Mon May 04, 2020 11:57 am 

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Most CPUs capable of multiplication can produce a result that is as wide as the sum of the input operands' widths. In this way it is possible to perform multiprecision multiplication (ie. of values wider than the CPU can handle natively) using the speed advantage of the hardware multiplier, instead of resorting to a pure shift-and-add loop again. However, handling multiprecision division is much more difficult.
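
To illustrate why the double-width result matters, here is a sketch of schoolbook multiprecision multiplication built on an 8x8 to 16-bit primitive. `mul8` stands in for the hardware multiply, and operands are little-endian byte strings; both names and the byte order are choices made for this sketch.

```python
def mul8(a: int, b: int) -> int:
    """Model of a hardware 8x8 -> 16-bit multiply."""
    return (a & 0xFF) * (b & 0xFF)


def mul_multi(x: bytes, y: bytes) -> bytes:
    """Multiply two little-endian unsigned integers of any byte length."""
    result = [0] * (len(x) + len(y))
    for i, xb in enumerate(x):
        carry = 0
        for j, yb in enumerate(y):
            # Partial product plus what is already there plus carry
            # never exceeds 0xFFFF, so it fits in 16 bits.
            total = result[i + j] + mul8(xb, yb) + carry
            result[i + j] = total & 0xFF
            carry = total >> 8
        result[i + len(y)] = carry
    return bytes(result)
```

Each inner step needs the high byte of the partial product as carry, which is exactly what a multiply that only returned the low 8 bits could not provide.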

They differ as to whether more than one instruction is needed to produce both halves; RISC CPUs generally have two different instructions each filling one register, CISC usually gives two result registers filled by a single instruction. The 6502 is technically a CISC design, but is sometimes thought of as "kind of" RISC in some aspects. Crucially it has something of a shortage of registers, which might make the decision for you.

Something I don't think I've mentioned on the forum is that I've been mulling over a design for a memory-mapped accelerator for multiplication and division. In a 16-byte memory space, this can be made to handle one operand (the divisor) with 14-byte width, enough to be useful on quadruple-precision floats, and the other operand can be as wide as you like, fed in and processed one byte at a time. The tricky part is making it fast enough to be worthwhile on narrower operands while still functioning correctly on wide ones; a naive implementation using 74-series adders would have marginal timing at just 2MHz. I'll cover the details in its own thread when it's ready - I just thought you might want to be aware of the alternative approach to providing this function.

Quote:
how do you even simulate BSR? you would need to add a 16 bit signed value to the PC...

You JSR to a fixed-location subroutine which duplicates the return address on the stack. Then you BRA or BRL to the actual subroutine.


PostPosted: Mon May 04, 2020 4:16 pm 

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
Chromatix wrote:
Most CPUs capable of multiplication can produce a result that is as wide as the sum of the input operands' widths. In this way it is possible to perform multiprecision multiplication (ie. of values wider than the CPU can handle natively) using the speed advantage of the hardware multiplier, instead of resorting to a pure shift-and-add loop again. However, handling multiprecision division is much more difficult.

They differ as to whether more than one instruction is needed to produce both halves; RISC CPUs generally have two different instructions each filling one register, CISC usually gives two result registers filled by a single instruction. The 6502 is technically a CISC design, but is sometimes thought of as "kind of" RISC in some aspects. Crucially it has something of a shortage of registers, which might make the decision for you.

not entirely sure what the decision would be, since i got the B Register for exactly these kinds of instruction.
MUL is basically just (A * M), where M is just the data from whatever addressing mode was chosen. the result is a 16 bit number that gets stored in A and B, with A getting the lower half, and B the upper half.
DIV is similar (A / M), except the result of the division is stored in A and the remainder in B.
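
A minimal model of those two instructions as described, with results returned as an (A, B) pair. The function names are made up, and the behaviour for a zero divisor is not specified in the post, so the sketch simply raises an error there.

```python
def mul(a: int, m: int) -> tuple[int, int]:
    """MUL: A * M as a 16-bit product, low byte to A, high byte to B."""
    product = (a & 0xFF) * (m & 0xFF)
    return product & 0xFF, product >> 8


def div(a: int, m: int) -> tuple[int, int]:
    """DIV: A / M, quotient to A, remainder to B."""
    a &= 0xFF
    m &= 0xFF
    if m == 0:
        # Assumption: divide by zero is left undefined in the description
        raise ZeroDivisionError("DIV by zero is not defined in this sketch")
    return a // m, a % m
```

So `mul(0xFF, 0xFF)` yields A = 0x01 and B = 0xFE (product 0xFE01), and `div(17, 5)` yields A = 3 and B = 2.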

Chromatix wrote:
Something I don't think I've mentioned on the forum is that I've been mulling over a design for a memory-mapped accelerator for multiplication and division. In a 16-byte memory space, this can be made to handle one operand (the divisor) with 14-byte width, enough to be useful on quadruple-precision floats, and the other operand can be as wide as you like, fed in and processed one byte at a time. The tricky part is making it fast enough to be worthwhile on narrower operands while still functioning correctly on wide ones; a naive implementation using 74-series adders would have marginal timing at just 2MHz. I'll cover the details in its own thread when it's ready - I just thought you might want to be aware of the alternative approach to providing this function.

To me that sounds like the description of a Co-Processor.
one that is mapped into Memory like any other IO Device, unlike something like the 8087, which is basically just an optional extension of the CPU itself.
it sounds really interesting if that is what you mean.
It would allow you to just plug that Co-Processor into any existing 65xx based System, and as long as the software was aware of it it could make use of the accelerated functions.

Chromatix wrote:
Quote:
how do you even simulate BSR? you would need to add a 16 bit signed value to the PC...

You JSR to a fixed-location subroutine which duplicates the return address on the stack. Then you BRA or BRL to the actual subroutine.

but i thought the whole point of long relative jumps and calls was to get rid of any "fixed locations" in memory (with exceptions of IO).... so that the program could be moved around in memory without hindering/breaking its function.


PostPosted: Mon May 04, 2020 4:41 pm 

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Memory-mapped coprocessors are not really a novel concept, though most 32-bit CPUs have architectural support for coprocessors and so don't need them. There are some unusual processor designs which do everything this way, called Transport Triggered Architectures; their only operation is a "move", making them a One Instruction Computer. Want to add two numbers? Move them to these two addresses, then move the result out of a third.
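
The "move them to these two addresses, then move the result out of a third" idea can be sketched as a tiny memory-mapped adder. The class name, the register offsets and the 8-bit width are all hypothetical; this only shows the access pattern, not any real device.

```python
class AdderPort:
    """Toy memory-mapped adder: two write-only operands, one read-only sum."""

    OPERAND_A = 0  # hypothetical register offsets within the device
    OPERAND_B = 1
    RESULT = 2

    def __init__(self) -> None:
        self._operands = [0, 0]

    def write(self, offset: int, value: int) -> None:
        """A store to the device latches one operand."""
        if offset not in (self.OPERAND_A, self.OPERAND_B):
            raise ValueError("offset is not writable")
        self._operands[offset] = value & 0xFF

    def read(self, offset: int) -> int:
        """A load from the result register returns the 8-bit sum."""
        if offset != self.RESULT:
            raise ValueError("offset is not readable")
        return (self._operands[0] + self._operands[1]) & 0xFF
```

From the CPU's point of view the whole operation is two stores and a load, which is why such a device can be bolted onto an existing system without touching the instruction set.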

AMD used to make a couple of FPUs that were supposed to sit on the data bus and look like I/O devices. They had their own internal register sets, and some operations took several hundred cycles to complete (though this was still a major improvement over pure software). This would be something much simpler than that, but capable of handling much wider values - as I said, the target is quadruple-precision floats.

As for emulating BSR - yes, the need for a fixed-location subroutine is a downside. Such a routine could however be provided by the system ROM. On the '816, where position-independent code is more expected, you have the additional option of the PER instruction, which lets you push a PC-relative return address on the stack directly. So the emulation sequence would then be PER *+5 : BRL sub (RTS adds one to the address it pulls, so the pushed value has to be the address of the last byte of the BRL).
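
A sketch of the stack arithmetic behind the PER/BRL sequence, assuming `*` denotes the address of the PER opcode and that PER and BRL are three bytes each. RTS adds one to the address it pulls, so the value to push is the address of the last byte of the BRL. The helper names are invented for this example.

```python
def push16(stack: list[int], value: int) -> None:
    """Push a 16-bit value, high byte first, as the 65xx does."""
    stack.append((value >> 8) & 0xFF)
    stack.append(value & 0xFF)


def rts(stack: list[int]) -> int:
    """Pull a 16-bit address and add one, returning the new PC."""
    low = stack.pop()
    high = stack.pop()
    return (((high << 8) | low) + 1) & 0xFFFF


def emulate_bsr(per_addr: int, sub_addr: int) -> int:
    """Run PER/BRL at per_addr, then RTS from the subroutine at sub_addr."""
    stack: list[int] = []
    # PER occupies per_addr..per_addr+2, BRL occupies per_addr+3..per_addr+5
    push16(stack, (per_addr + 5) & 0xFFFF)
    pc = sub_addr  # BRL transfers control to the subroutine
    assert pc == sub_addr
    return rts(stack)  # the subroutine ends with RTS
```

Here `emulate_bsr(0x8000, 0x9000)` returns 0x8006, the address of the instruction following the BRL, and nothing in the sequence depends on where the code is loaded.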


PostPosted: Mon May 04, 2020 6:30 pm 

Joined: Fri Apr 06, 2018 4:20 pm
Posts: 94
For the sake of compatibility, rather than adding multiplication and division as instructions, couldn't you implement a 65816-like "Coprocessor" instruction and build them as peripherals? Or is that cheating? (once you add a coprocessor you can do all sorts of things, like widen the registers to avoid carries and store intermediate results, etc.)

