65C02 TSB and TRB
- leeeeee
- In Memoriam
- Posts: 347
- Joined: 30 Aug 2002
- Location: UK
65C02 TSB and TRB
Greetings all.
All the datasheets I have for the 65C02 instruction set say that TSB and TRB operate as follows ..
TSB does A AND M -> M, sets/clears Zb on the result
TRB does ~A AND M -> M, sets/clears Zb on the result
.. but having investigated this on a real 65C02 core (CCU3000 single chip micro) it seems to do this ..
TSB sets/clears Zb on the result of A AND M then does A OR M -> M
TRB sets/clears Zb on the result of A AND M then does ~A AND M -> M
Is this right? Or is there some different 'third way' that I'm as yet unaware of?
Cheers,
Lee.
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
Here it is from WDC's programming manual, which every 6502/65816 hobbyist or professional should have:
For TSB: "Logically OR together the value in the accumulator with the data at the effective address specified by the operand. Store the result at the memory location. ..."
and for TRB: "Logically AND together the _complement_ of the value in the accumulator with the data at the effective address specified by the operand. Store the result at the memory location. ..."
For both, it says Z is set or cleared by a second, separate test that works just like BIT: the accumulator is ANDed with the memory data, and Z reflects whether that result is zero. TSB and TRB do this identically. The test affects only Z, and nothing from it is kept except Z.
Garth
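Lee's observations and the WDC description agree. As a minimal sketch (not an emulator; cpu_t, tsb and trb are made-up names, and only A and Z are modeled), here are the two instructions in C: Z comes from the BIT-style test of A against the original memory byte, and only then is the new byte computed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Only the parts of the register state these instructions touch. */
typedef struct {
    uint8_t a;  /* accumulator: the bit mask */
    bool z;     /* zero flag */
} cpu_t;

/* TSB: Z from A AND M (old value), then store A OR M. Returns the byte written back. */
uint8_t tsb(cpu_t *cpu, uint8_t m)
{
    cpu->z = (cpu->a & m) == 0;       /* same test BIT performs */
    return (uint8_t)(m | cpu->a);     /* set every bit that is 1 in A */
}

/* TRB: same Z test, then store (NOT A) AND M. */
uint8_t trb(cpu_t *cpu, uint8_t m)
{
    cpu->z = (cpu->a & m) == 0;
    return (uint8_t)(m & (uint8_t)~cpu->a); /* clear every bit that is 1 in A */
}
```

For example, with A = %00100000 and M = %00100001 both instructions clear Z (bit 5 was already set); TSB writes back %00100001 and TRB writes back %00000001.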
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Hi, all.
I figured that it was more appropriate to ask my questions in this thread rather than subject the other thread to more off-topic pollution.
Sub-topic 1: The Rockwell 'c02 seems to use 32 opcodes to set, reset, and test-and-branch individual zero-page bits. Does anyone here use these? It seems to me that they would be very fast and compact for flags, but nearly useless for I/O, unless port(s) were mapped into zero-page, a la 6510. Does anyone have any commented code snippets to share, showing how these guys work? (links would suffice)
Sub-topic 2: The WDC 'c02 seems to use just four opcodes to test-and-set and test-and-reset up to 8 bits anywhere in RAM. I know that Garth and BDD have used these, but I would be interested in knowing if there is a good reason for them only changing the Z flag. Why not N and V as well, like their close cousin, BIT? I'm guessing that there is a reason, but is it to make the hardware less complicated, or the hypothetical software using it less complicated, or both, or neither? Does anyone have any commented code snippets to share, showing how these guys work? (links would suffice)
Sub-topic 3: The WDC and Rockwell versions don't exist together on any design, as far as I know, due to functional overlap. If a hypothetical processor design implemented the Rockwell-like set with full-RAM addressing, do you guys think that it would be preferable to the WDC-like set?
Thanks,
Mike
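For sub-topic 1, a rough C model of what the 32 Rockwell opcodes do: RMB, SMB, BBR and BBS each come in eight versions, one per bit number, and each works only on a zero-page byte. In this sketch the bit number is a function parameter, whereas on the real chip it is part of the opcode; the names are illustrative, and branch offsets are not modeled (the BBx functions only report whether the branch would be taken).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Only zero page ($00-$FF) is reachable by these instructions. */
typedef struct {
    uint8_t zp[256];
} zero_page_t;

/* SMBn zp: set bit n of the zero-page byte. */
void smb(zero_page_t *z, unsigned n, uint8_t addr)
{
    assert(n < 8);
    z->zp[addr] |= (uint8_t)(1u << n);
}

/* RMBn zp: reset (clear) bit n. */
void rmb(zero_page_t *z, unsigned n, uint8_t addr)
{
    assert(n < 8);
    z->zp[addr] &= (uint8_t)~(1u << n);
}

/* BBSn zp,rel: would the branch be taken (bit n set)? */
bool bbs(const zero_page_t *z, unsigned n, uint8_t addr)
{
    assert(n < 8);
    return ((z->zp[addr] >> n) & 1u) != 0;
}

/* BBRn zp,rel: would the branch be taken (bit n reset)? */
bool bbr(const zero_page_t *z, unsigned n, uint8_t addr)
{
    return !bbs(z, n, addr);
}
```

Because the bit number lives in the opcode, touching one flag ties up no register, but a single instruction can never touch more than one bit.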
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
Re:
barrym95838 wrote:
I figured that it was more appropriate to ask my questions in this thread rather than subject the other thread to more off-topic pollution.
Quote:
Sub-topic 1: The Rockwell 'c02 seems to use 32 opcodes to set, reset, and test-and-branch individual zero-page bits. Does anyone here use these? It seems to me that they would be very fast and compact for flags, but nearly useless for I/O, unless port(s) were mapped into zero-page, a la 6510. Does anyone have any commented code snippets to share, showing how these guys work? (links would suffice)
Quote:
Sub-topic 2: The WDC 'c02 seems to use just four opcodes to test-and-set and test-and-reset up to 8 bits anywhere in RAM. I know that Garth and BDD have used these, but I would be interested in knowing if there is a good reason for them only changing the Z flag. Why not N and V as well, like their close cousin, BIT? I'm guessing that there is a reason, but is it to make the hardware less complicated, or the hypothetical software using it less complicated, or both, or neither? Does anyone have any commented code snippets to share, showing how these guys work? (links would suffice)
Quote:
Sub-topic 3: The WDC and Rockwell versions don't exist together on any design, as far as I know, due to functional overlap. If a hypothetical processor design implemented the Rockwell-like set with full-RAM addressing, do you guys think that it would be preferable to the WDC-like set?
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: 65C02 TSB and TRB
Thanks, Garth! I knew that I could count on you. Your link to the TRB & TSB code was perfect, and has caused me to lean toward them over the Rockwell stuff, for a few reasons.
1) The assembly language is more '6502'-like, which is important to me.
2) The ability to set or reset more than one bit at a time seems useful.
3) The op-code footprint is smaller.
Regarding the flag behavior: I see that you are using these instructions as simple outputs in your SPI example, but it's possible that there could be a use for knowing something about the location that you're modifying. I'm thinking of semaphores and the like, but it also occurs to me that some I/O ports have a bi-directional quality that could benefit from this too.
The finer control offered by the Rockwell stuff might be able to implement algorithms for compression or encryption more economically (especially when combined with indexed addressing, which is automatically included in the 65m32), but I don't know enough about these algorithms to decide whether the benefits would outweigh the cost of the additional op-code space.
I certainly don't have room for both, so unless someone offers a compelling argument in favor of the Rockwell stuff (with example code), I'm going to set my sights on TSB/TRB.
Mike
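The semaphore idea works because Z describes the byte before the write. A minimal sketch in C with hypothetical names: acquiring is LDA #mask / TSB sem / branch on Z, and releasing is LDA #mask / TRB sem. It assumes the only contention is between code on the same processor (for example, mainline code and an interrupt handler); since a 6502-family CPU services an interrupt only between instructions, one TSB cannot be split by one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* LDA #mask / TSB sem / BEQ got_it
   Z is set when none of the mask bits were set beforehand, i.e. the resource was free. */
bool sem_try_acquire(uint8_t *sem, uint8_t mask)
{
    bool was_free = (*sem & mask) == 0;  /* Z flag after TSB */
    *sem |= mask;                        /* TSB sets the bits whether or not we got it */
    return was_free;
}

/* LDA #mask / TRB sem */
void sem_release(uint8_t *sem, uint8_t mask)
{
    *sem &= (uint8_t)~mask;
}
```

With a single-bit mask, setting the bit on a failed attempt changes nothing, because it was already set; that is why one TSB does the whole test-and-set.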
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
Re: 65C02 TSB and TRB
barrym95838 wrote:
Thanks, Garth! I knew that I could count on you. Your link to the TRB & TSB code was perfect, and has caused me to lean toward them over the Rockwell stuff, for a few reasons.
Quote:
but it also occurs to me that some I/O ports have a bi-directional quality that could benefit from this too
Quote:
The finer control offered by the Rockwell stuff might be able to implement algorithms for compression or encryption more economically (especially when combined with indexed addressing, which is automatically included in the 65m32), but I don't know enough about these algorithms to decide whether the benefits would outweigh the cost of the additional op-code space.
Quote:
I certainly don't have room for both, so unless someone offers a compelling argument in favor of the Rockwell stuff (with example code), I'm going to set my sights on TSB/TRB.
- White Flame
- Posts: 704
- Joined: 24 Jul 2012
Re: 65C02 TSB and TRB
barrym95838 wrote:
The finer control offered by the Rockwell stuff might be able to implement algorithms for compression or encryption more economically (especially when combined with indexed addressing, which is automatically included in the 65m32), but I don't know enough about these algorithms to decide whether the benefits would outweigh the cost of the additional op-code space.
Encryption algorithms are numeric processes that generally work with integer byte values, and don't involve much of any bit twiddling.
Compression often involves dealing with streams of variable-length numbers encoded in bytes. On the reading end, a zero test is useless for "get the next 5-bit number out of a byte stream", and we have the AND instruction anyway to pull masked values. Plus, decompression doesn't need to modify the values it's reading, so the modification part of TxB is superfluous for this use.
Writing to such a stream (assuming the current byte was initialized to zero) could be done with TSB, again as long as the addressing mode is flexible. Compression is slow anyway, writing has to deal with output values crossing byte boundaries regardless, and compression is often handled on a cross-development host instead of on the 6502, so that really doesn't need this particular micro-optimization. The amount of work saved is dwarfed by everything else going on.
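The writing case can be sketched in C: with the output buffer zeroed first, each piece of a code word is merged into its byte with a plain OR, which is what LDA #piece / TSB byte would do, and a value that straddles a byte boundary simply takes two passes. The MSB-first bit order and the names bitwriter_t and put_bits are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *buf;   /* output bytes, zeroed before writing starts */
    size_t bitpos;  /* number of bits written so far */
} bitwriter_t;

/* Append the low nbits of value, most significant bit first. */
void put_bits(bitwriter_t *w, uint32_t value, unsigned nbits)
{
    assert(nbits <= 32);
    while (nbits > 0) {
        size_t byte = w->bitpos / 8;
        unsigned room = 8u - (unsigned)(w->bitpos % 8); /* free bits left in this byte */
        unsigned take = nbits < room ? nbits : room;    /* how many go into this byte */
        /* The next `take` bits of value, counting down from the top of what remains. */
        uint8_t chunk = (uint8_t)((value >> (nbits - take)) & ((1u << take) - 1u));
        /* Line them up under the free bits and OR them in: LDA #mask / TSB byte. */
        uint8_t mask = (uint8_t)(chunk << (room - take));
        w->buf[byte] |= mask;
        w->bitpos += take;
        nbits -= take;
    }
}
```

Writing 22 (10110) and then 13 (01101) leaves $B3 $40 in the buffer; the second value is split 3 + 2 across the byte boundary, which is the extra work White Flame mentions.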
- BigDumbDinosaur
- Posts: 9426
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
Re: 65C02 TSB and TRB
I've never put RMB, SMB, BBS, and BBR to use, primarily for the same reasons offered by Garth, plus the 65C816 doesn't implement them. I do have a fair number of TRBs and TSBs sprinkled through the POC firmware, and these instructions will get more use as I write code to implement my 816NIX filesystem and develop a lightweight kernel to go with it. TRB and TSB can be useful in manipulating bitmaps (using self-modifying code, since neither instruction has an indexed addressing mode), and it so happens that I use several bitmaps in the 816NIX internal structure to allocate and release inodes and data blocks. A two-instruction sequence, such as:
Code:
lda #%00100000
tsb bmaddr
is somewhat faster and tidier than:
Code:
lda bmaddr
and #%00100000
sta bmaddr
That said, the lack of indexing on the TRB and TSB instructions can complicate things. I could write:
Code:
lda bmbase,x
and #%00100000
sta bmbase,x
and not have to modify the operand of a TSB instruction, which can be messy.
Either way, the accumulator is going to get clobbered. Decisions, decisions...
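The bitmap case boils down to turning an inode or block number into a byte address and a mask, then doing a test-and-set or test-and-clear. A C sketch of that arithmetic, assuming bit 0 of each byte maps to the lowest-numbered entry (the real 816NIX layout isn't stated here); bitmap_alloc and bitmap_free are hypothetical names. On the 65C02/65C816 the byte address is the sticking point: TSB/TRB need it as a fixed operand, whereas an indexed LDA/ORA/STA sequence can take the byte offset in .X.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mark entry n in use. Returns true if it was free (Z set after TSB). */
bool bitmap_alloc(uint8_t *bitmap, unsigned n)
{
    uint8_t *cell = &bitmap[n >> 3];          /* bmbase + n/8: the address TSB would need */
    uint8_t mask = (uint8_t)(1u << (n & 7u)); /* the value loaded into .A */
    bool was_free = (*cell & mask) == 0;
    *cell |= mask;                            /* TSB */
    return was_free;
}

/* Mark entry n free. Returns true if it had been in use (Z clear after TRB). */
bool bitmap_free(uint8_t *bitmap, unsigned n)
{
    uint8_t *cell = &bitmap[n >> 3];
    uint8_t mask = (uint8_t)(1u << (n & 7u));
    bool was_used = (*cell & mask) != 0;
    *cell &= (uint8_t)~mask;                  /* TRB */
    return was_used;
}
```

The Z result doubles as a consistency check: allocating an entry that is already marked, or freeing one that is already clear, shows up immediately.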
x86? We ain't got no x86. We don't NEED no stinking x86!
- Rob Finch
Re: 65C02 TSB and TRB
Code:
lda bmbase,x
and #%00100000
sta bmbase,x
Code:
lda #5 ; set the 5th bit
bms bmbase,x ; relative to the bmbase,x
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: 65C02 TSB and TRB
Nice, Rob!
It looks like you did a hybrid of the Rockwell and WDC set, by putting the bit number in the accumulator instead of the AND/OR mask. I was considering something similar, but I haven't made up my mind yet. How much time did you spend considering the relative benefits of your idea vs. the multi-bit WDC and multi-opcode Rockwell plans? I'm almost never sure until I mock up some translations of working code.
I believe that you previously mentioned breaking the 8-bit op-code barrier for the RTF65002, so you probably have a lot more room to expand than I do, unless I follow the advice of a couple of friends and steal a bit or two from my embedded constant. I have some stubborn tendencies, so I don't know if that will wind up being in the cards or not. I need to pull the trigger soon, though!
Mike
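The difference between a bit-number operand and a mask operand, sketched in C. Both helpers are hypothetical: bms_indexed follows Rob's example (bit number in .A, address base+X), and tsb_indexed is an indexed TSB that the real 65C02 does not have. Wrapping the bit number to 0-7 is an assumption about how such an instruction might treat larger values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-number form: .A says WHICH bit, so one instruction sets exactly one bit. */
void bms_indexed(uint8_t *mem, size_t base, uint8_t x, uint8_t a)
{
    mem[base + x] |= (uint8_t)(1u << (a & 7u));
}

/* Mask form: .A is a pattern, so several bits can be set at once. */
void tsb_indexed(uint8_t *mem, size_t base, uint8_t x, uint8_t a)
{
    mem[base + x] |= a;
}
```

The bit-number form saves converting a computed bit index into a mask, which on the 6502 means a shift loop or a small table of the eight masks; the mask form wins whenever several bits change together.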
- BigDumbDinosaur
- Posts: 9426
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
Re: 65C02 TSB and TRB
Rob Finch wrote:
Code:
lda bmbase,x
and #%00100000
sta bmbase,x
Code:
lda #5 ; set the 5th bit
bms bmbase,x ; relative to the bmbase,x
That's an analog of the RMB and SMB 65C02 instructions, which don't exist in the 65C816. The significant improvement, I think, is that conceivably the mask in .A could have multiple bits set.
My interests, however, are in using the actual WDC parts, rather than FPGA implementations.
The crux of the problem with TRB and TSB (as well as RMB, SMB and the BBx group) is that none have an indexed addressing mode. These instructions are very useful in manipulating bitwise flags (I use several such flags in the POC's firmware, especially in the TIA-232 drivers) but lacking indexing, use of any of these bit-twiddlers on a bitmap becomes problematic. I envision one solution that would utilize self-modifying code. However, doing so would negate much of the efficiency of the TRB or TSB instruction, as an entire address would have to be set rather than merely adjusting .X or .Y.
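To put a number on "an entire address would have to be set": pointing a TSB absolute instruction at a new bitmap byte means rewriting both bytes of its 16-bit operand, where an indexed instruction would need only a register load. A C sketch of the patch, assuming the 65C02 encoding of TSB absolute (opcode $0C followed by a little-endian address); patch_tsb_operand is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* 65C02 encoding of TSB absolute: $0C, low byte, high byte. */
enum { OP_TSB_ABS = 0x0C };

/* Point an in-RAM "TSB abs" at bmbase + x: two byte stores into the operand. */
void patch_tsb_operand(uint8_t insn[3], uint16_t bmbase, uint8_t x)
{
    uint16_t target = (uint16_t)(bmbase + x);
    assert(insn[0] == OP_TSB_ABS);
    insn[1] = (uint8_t)(target & 0xFFu); /* low byte first */
    insn[2] = (uint8_t)(target >> 8);    /* then high byte */
}
```

On the 6502 itself the bmbase + x step is also a 16-bit add (CLC, then ADC on the low byte and again on the high byte to pick up the carry), which is the overhead that eats into TSB's advantage.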
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: 65C02 TSB and TRB
BigDumbDinosaur wrote:
... A two instruction sequence, such as:
Code:
lda #%00100000
tsb bmaddr
is somewhat faster and tidier than:
Code:
lda bmaddr
and #%00100000
sta bmaddr
Code:
lda #%00100000
bit bmaddr
php
ora bmaddr
sta bmaddr
plp
Mike B.
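The BIT/PHP/ORA/STA/PLP sequence can be checked against TSB with a small C model (names are illustrative; only A, N, V and Z are tracked). Memory and Z come out the same, but the flags restored by PLP are the ones BIT left behind, so N and V also end up as copies of bits 7 and 6 of the original memory byte, and A holds the new memory value rather than the mask. TSB leaves all of those alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t a;
    bool n, v, z;
} cpu_t;

/* TSB m: only Z changes; A keeps the mask. Returns the byte stored. */
uint8_t tsb(cpu_t *c, uint8_t m)
{
    c->z = (c->a & m) == 0;
    return (uint8_t)(c->a | m);
}

/* LDA #mask (already in c->a) / BIT m / PHP / ORA m / STA m / PLP */
uint8_t tsb_emulated(cpu_t *c, uint8_t m)
{
    /* BIT m: Z from A AND M; N and V copied from bits 7 and 6 of M. */
    c->z = (c->a & m) == 0;
    c->n = (m & 0x80u) != 0;
    c->v = (m & 0x40u) != 0;
    /* PHP saves those flags; ORA changes A (and N/Z); PLP then restores the BIT flags. */
    c->a = (uint8_t)(c->a | m);
    return c->a; /* STA m */
}
```

For most uses the difference in N, V and A won't matter, but the two are not drop-in equivalents.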
- BigDumbDinosaur
- Posts: 9426
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
Re: 65C02 TSB and TRB
barrym95838 wrote:
BigDumbDinosaur wrote:
... A two instruction sequence, such as:
Code:
lda #%00100000
tsb bmaddr
is somewhat faster and tidier than:
Code:
lda bmaddr
and #%00100000
sta bmaddr
Code:
lda #%00100000
bit bmaddr
php
ora bmaddr
sta bmaddr
plp
Mike B.
Code:
lda #%00100000
tsb bmaddr
Code:
lda bmaddr
ora #%00100000
sta bmaddr
That the Boolean test occurs before TRB or TSB actually changes the addressed memory can be very useful. Consider, for example, this code fragment from POC V1.1's UART driver:
Code:
lda tiatstab,x ;transmitter status bit mask
trb tiatxst ;transmitter enabled?
beq .0000020 ;yes
;
lda #nxpcrtxe ;no
ldy #nx_cr ;point at command register &...
stasi .chan ;enable transmitter
;
.0000020 ...
Incidentally, STASI .CHAN is a stack-pointer-relative indirect indexed macro-instruction, generating the equivalent of STA (.CHAN,S),Y. The driver keeps ephemeral data on the stack, primarily channel indices, since the UART has two channels. That way one piece of code can service either channel. The .CHAN value is a stack pointer offset that is local to the routine in which this code is located.
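The UART fragment's logic, restated as a C sketch: one TRB both reports whether the channel's bit was set and clears it, so the "enable transmitter" path runs only when needed. Reading the comments, a set bit in tiatxst appears to mean "transmitter disabled"; that interpretation and the helper name tx_needs_enable are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* LDA tiatstab,x / TRB tiatxst / BEQ already_enabled
   Returns true when the transmitter still has to be enabled. */
bool tx_needs_enable(uint8_t *tiatxst, uint8_t chan_mask)
{
    bool was_disabled = (*tiatxst & chan_mask) != 0; /* Z clear after TRB */
    *tiatxst &= (uint8_t)~chan_mask;                 /* bit cleared: now marked enabled */
    return was_disabled;
}
```

Because the test sees the old value, the status update and the decision happen in one instruction, with no separate read to get out of step with the write.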