Page 1 of 2
Most efficient way to toggle a bit (flag) in memory
Posted: Mon Apr 20, 2020 3:40 pm
by Jeff_Birt
Hi all,
I'm cleaning up a project I started late last year to play 'DIGIs' (digital samples) on the C64 SID. After figuring out how to process the sample is Audacity for good results: 8-bit samples at 8khz. I then reduce the samples to 4-bit, doing some rounding up, and pack two per byte.
I also tried to do a simple Run Length Encoding, which reduced the sample file size by about 25%. And while I can decode it easy enough on the C64 it takes far too long to do to process in an 8khz interrupt loop. It would work at 4khz I suspect though.
Anyhow, I was optimizing my simple, non-RLE, interrupt routine this morning and was unhappy with how long it took to toggle a bit in zero page that I'm using as a flag to know which nibble should be played next.
Originally I was doing this to check, and then toggle the bit at the end. The #- indicates clock cycles with (#) as the total for that block. So originally it took 14 clock cycles ignoring the single-cycle difference in BNE if branch take or not.
Code: Select all
;every other NMI do *1 or *2
LDA flag ; 3- if flag==0 we just played upper nibble
BNE lower ; 2- (5) so skip ahead to load new byte
...other code
LDA flag ; 3- toggle hi/low nibble flag and exit NMI
EOR #$1 ; 3-
STA flag ; 3- (9)
I reduced this to 12 cycles with the following:
Code: Select all
LDA #$01 ; 2-
AND flag ; 3-
BNE lower ; 2- (7)
INC flag ; 5 (5)
So, not a huge savings but even two clock cycles in an 8khz interrupt is a lot of cycles per second. It still seems like it could be simplified though. I'm not sure what else could be done with this structure but maybe it would be possible to combine the toggle and test into one block?
It got me wondering what clever ways had been devised to handle a simple task like this over the years? I did a search on the forum but the terms 'bit toggle' brings up many, many threads of course.
Re: Most efficient way to toggle a bit (flag) in memory
Posted: Mon Apr 20, 2020 5:04 pm
by Martin A
IF you don't mind using a whole zero page byte for one bit then $FF and be flipped to $00 with INC zp and flipped back again with DEC zp.
BIT ZP will then set the sign flag according to the state of the top bit, which is flipping between 0 and 1 (as are all the other bits).
That's 5 cycles to flip the flag and 3 to test, and it doesn't upset any registers.
Re: Most efficient way to toggle a bit (flag) in memory
Posted: Mon Apr 20, 2020 5:27 pm
by Jeff_Birt
IF you don't mind using a whole zero page byte for one bit then $FF and be flipped to $00 with INC zp and flipped back again with DEC zp.
If I understand you correctly I would have to decide when to inc and when to dec which I don't want to do. Each time through the interrupt it just needs to toggle the flag. Using a whole byte of ZP for the flag is OK in this case.
Re: Most efficient way to toggle a bit (flag) in memory
Posted: Mon Apr 20, 2020 6:28 pm
by barrym95838
Code: Select all
LDA #$01 ; 2-
AND flag ; 3-
BNE lower ; 2- (7)
...
INC flag ; 5 (5)
I can save you a byte (or maybe 3), but without further context I can't save you any cycles:
Code: Select all
LDA flag ; 3
LSR A ; 2
INC flag ; 5
BCS lower ; 2+ (12+)
...
Re: Most efficient way to toggle a bit (flag) in memory
Posted: Mon Apr 20, 2020 6:37 pm
by Chromatix
If you had a CMOS 6502, you could use the TRB and TSB instructions. On Rockwell and WDC versions, you even have BBR/Sx and R/SMBx to assist. But on the C64, you don't have those niceties.
The most straightforward approach, which you seem to have missed, would be to use exclusive-or:
Code: Select all
LDA #1 ; 2 cyc - any non-zero constant will do here
EOR flag ; 3 cyc
STA flag ; 3 cyc
BNE FlagWasClear ; 2.5 cyc
FlagWasSet:
...
FlagWasClear:
...
That gives you a 10.5 cycle (on average) flag test and toggle, including the branch.
Using the INC/DEC method proposed by Martin:
Code: Select all
BIT flag ; 3 cyc
BPL FlagWasClear ; 2.5 cyc
FlagWasSet:
INC flag ; 5 cyc
...
FlagWasClear:
DEC flag ; 5 cyc
...
Including the branch, that is also 10.5 cycles on average. Both of the above versions are also 8 bytes.
For yet another approach, we start by initialising the flag byte to $AA or $55, depending on which case you want to run first. Then we can get an alternating series with the following:
Code: Select all
ASL flag ; 5 cyc
BCC FlagWasClear ; 2.5 cyc
INC flag ; 5 cyc - inject a set bit at the right-hand end of the byte to restore the pattern
FlagWasSet:
...
FlagWasClear:
...
This reduces the average cycle count to 10 cycles exactly, saving half a cycle, because the INC is only executed half the time. It also saves two bytes, for a total of 6.
Re: Most efficient way to toggle a bit (flag) in memory
Posted: Mon Apr 20, 2020 7:44 pm
by barrym95838
Code: Select all
ASL flag ; 5 cyc
BCC FlagWasClear ; 2.5 cyc
INC flag ; 5 cyc - inject a set bit at the right-hand end of the byte to restore the pattern
FlagWasSet:
...
FlagWasClear:
...
That's too cool for school!

Re: Most efficient way to toggle a bit (flag) in memory
Posted: Mon Apr 20, 2020 8:33 pm
by commodorejohn
Yup, that's a bingo.
Re: Most efficient way to toggle a bit (flag) in memory
Posted: Mon Apr 20, 2020 9:03 pm
by Jeff_Birt
The most straightforward approach, which you seem to have missed, would be to use exclusive-or:
Code: Select all
LDA #1 ; 2 cyc - any non-zero constant will do here
EOR flag ; 3 cyc
STA flag ; 3 cyc
BNE FlagWasClear ; 2.5 cyc
FlagWasSet:
...
FlagWasClear:
...
That gives you a 10.5 cycle (on average) flag test and toggle, including the branch.
That is what I was talking about with 'combiining into one block' rather than having the bit test and toggle in seperate places, not even sure why I had them seperated to begin with. This is also the most striaghtforward approach and is very clear to anyone else reading the code what is happening.
Using the INC/DEC method proposed by Martin:
Code: Select all
BIT flag ; 3 cyc
BPL FlagWasClear ; 2.5 cyc
FlagWasSet:
INC flag ; 5 cyc
...
FlagWasClear:
DEC flag ; 5 cyc
...
Including the branch, that is also 10.5 cycles on average. Both of the above versions are also 8 bytes.
That would work as well.
For yet another approach, we start by initialising the flag byte to $AA or $55, depending on which case you want to run first. Then we can get an alternating series with the following:
Code: Select all
ASL flag ; 5 cyc
BCC FlagWasClear ; 2.5 cyc
INC flag ; 5 cyc - inject a set bit at the right-hand end of the byte to restore the pattern
FlagWasSet:
...
FlagWasClear:
...
This reduces the average cycle count to 10 cycles exactly, saving half a cycle, because the INC is only executed half the time. It also saves two bytes, for a total of 6.
I think this wins the clever award! It might well leave someone else scractching thier head or even me scratching my head 6 months from now when wondering why I did that
I think all three approaches have their own merit. I think I'll implement #1 and then post the whole ISR to see if there are other savings to be had.
Thanks everyone!
Re: Most efficient way to toggle a bit (flag) in memory
Posted: Mon Apr 20, 2020 9:15 pm
by Chromatix
The same code, with different initialisation values, can be used to generate a set (or cleared) bit every 2 cycles, every 4 cycles, or every 8 cycles - indeed, in any repeating pattern of length 8. The execution time is 12 cycles for every set bit, and 8 cycles for every cleared bit. It has the additional advantage that the only register it clobbers is the status bits.
For an arbitrary interval between events, consider the following which is just as fast, and only goes back up to 8 bytes of code:
Code: Select all
DEC count ; 5 cyc
BNE NotYet ; 2-3 cyc
LDA #interval ; 2 cyc
STA count ; 3 cyc
DoEvent:
...
NotYet:
...
This does clobber the accumulator, but you can substitute X or Y if that's more convenient. And yes, you can set interval=2 to get the behaviour desired by the OP.
There's always more than one way to do it.

Re: Most efficient way to toggle a bit (flag) in memory
Posted: Mon Apr 20, 2020 11:04 pm
by Jeff_Birt
The same code, with different initialisation values, can be used to generate a set (or cleared) bit every 2 cycles, every 4 cycles, or every 8 cycles - indeed, in any repeating pattern of length 8. The execution time is 12 cycles for every set bit, and 8 cycles for every cleared bit. It has the additional advantage that the only register it clobbers is the status bits.
Another very interesting trick!
Re: Most efficient way to toggle a bit (flag) in memory
Posted: Mon Apr 20, 2020 11:53 pm
by Jeff_Birt
Here is the NMI for the DIGI player. One thing that just occurred to me is that I could make sure that the samples always end on a page boundary. That way I only need to compare the high byte of the sample ending address. That would save 9-10 cycles.
It might also be interesting to look at using a 4khz sample rate and interpolating between samples. Maybe something like '(current_nibble + last_nibble) > 1'
Code: Select all
;-------------------------------------------------------------------------------
; NMI handler routine, plays one 4bit sample per pass
NMI_HANDLER
; start with saving state
PHA ; 3- will restore when returning
TYA ; 2- from interrupt handler
PHA ; 3- (8)
; play 4-bit sample, first sample byte saved during Init
LDA sample ; 3- load sample byte
ORA #$10 ; 2- make sure we don't kill filter settings
AND #$1F ; 2- git rid of any dangling high bits
STA SID+$18 ; 4- save to SID volume regsiter
STA $D020 ; 4- change border color for something to look at
LDA $DD0D ; 4- (19)clear NMI
;every other NMI do *1 or *2
LDA #$01 ; 2- if flag!=0 we just played upper nibble
EOR flag ; 3-
STA flag ; 3-
BNE lower ; 2-3 (10-11) so skip ahead to load new byte
upper LDA sample ; 3- *1 shift upper nibble down
LSR a ; 2-
LSR a ; 2-
LSR a ; 2-
LSR a ; 2-
STA sample ; 3- store it back to play next pass
JMP exit ; 3- (17) all done for this pass
lower LDY #$0 ; 2- *2 get a new packed sample byte
LDA (ptr),y ; 5-
STA sample ; 3- save to temp location
INC ptr ; 5- inc point to next sample byte
BNE checkend ; 2-3 did we roll low byte over to zero?
INC ptr+1 ; 5- (22-23) if so inc the high byte of ptr too
checkend
LDA ptr ; 3- if not @ end of sample exit/return from NMI
CMP #<DATASTOP ; 4- low byte
BNE exit ; 2-3 (10)
LDA ptr+1 ; 3- high byte
CMP #>DATASTOP ; 4
BNE exit ; 2-3 (18-19)
; when done with sample, turn off NMI interrupt
LDA #$00 ; 2- turn off NMI
STA $DD0E ; 4- timer A stop-CRA, CIA #1 DC0E
LDA #$4F ; 2- disable all CIA-2 NMIs
STA $DD0D ; 4- ICR - interrupt control / status
LDA $DD0D ; 4- (16) sta/lda to ack any pending int
LDA #$37 ; 2- reset kernal banking
STA $01 ; 3- (5)
exit
PLA ; 3- restore state
TAY ; 2-
PLA ; 3-
RTI ; 6- (14) return from this pass of NMI
; 68 *1, 84-85,93-94 *2
Re: Most efficient way to toggle a bit (flag) in memory
Posted: Tue Apr 21, 2020 5:03 am
by Chromatix
A couple of good ways to save a few cycles, if you need them:
In general the fall-through branch is faster than taking it, so you should use the fall-through for the common case and take the branch for the exceptional case. That means moving code that is executed rarely out-of-line, and branching back to the main line. If you align the sample with the page end, as suggested, that means you should swap the code blocks beginning "checkend" and "exit".
In the ISR entry and exit, it's faster by 3 cycles each to preserve Y in a ZP byte than on the stack. If you have a spare byte, do that. If you have *two* spare bytes, put A there too to save another cycle at each end. In total that buys you 8 cycles. But you might not need to use Y at all, see below.
I assume this routine will be in RAM. In that case, save yet more cycles and some ZP space by switching the LDA (ptr),Y for an LDA absolute, and self-modify its operand. This saves 3 cycles directly; the instruction itself takes one fewer cycle, and you eliminate LDY #0. On the downside, the INC ptr (where ptr is now an absolute address in the NMI routine) in the fast path takes an extra cycle. But the biggest saving is 12 cycles in the ISR prologue/epilogue, as you no longer need TYA:PHA and PLA:TAY; this is the only place you needed the Y register.
Putting all of this together:
Code: Select all
NMI_Handler:
PHA
; load sample into SID
LDA sample
ORA #$10
AND #$1F
STA SID+$18
STA $D020 ; border colour
LDA $DD0D ; clear NMI
; alternate high/low nybble
ASL flag
BCC @lower
INC flag
@upper:
LDA sample
LSR A
LSR A
LSR A
LSR A
STA sample
; local exit code is now both smaller and faster than jumping to it
PLA
RTI
@lower:
LDA $FFFF ; self-mod pointer to sample - overwrite this operand during init
STA sample
INC @lower+1
BEQ @nextPage
; local exit code is now both smaller and faster than jumping to it
PLA
RTI
@nextPage:
; this is only executed every 512 samples
LDA @lower+2
ADC #1 ; carry is clear from flag test
CMP #stopPage
BEQ @stop
STA @lower+2
; local exit code is now both smaller and faster than jumping to it
PLA
RTI
@stop:
; switch off the timers
LDA #0
STA $DD0E
LDA #$4F
STA $DD0D
LDA $DD0D
; restore banking
LDA #$37
STA $01
PLA
RTI
Re: Most efficient way to toggle a bit (flag) in memory
Posted: Wed Apr 22, 2020 1:34 am
by Jeff_Birt
Thank you Chromatix. I did not get a chance to get back to this today. I set out to test how the C64 diagnostic harness worked on the C128 with a PCB a friend designed for the KB dongle and the C128 quit right in front of me. No clock coming out of the 8701 clock generation chip, but the 8701 itself was OK. The bloody crystal died sitting right there on the bench. Was using it, turned it off and back on about 15 minutes later and it was 'Dead Jim'.
Anyhow your changes make sense. I tend to shy away from self modifying code unless it has a distinct advantage and I think your use case shows a strong advantage. I'll give this a go in the next few days and then work on the interpolated 4khz version.
Re: Most efficient way to toggle a bit (flag) in memory
Posted: Thu Apr 23, 2020 7:15 am
by dp11
All the above I think leads to now removing the zp sample location which make the average case even quicker
Code: Select all
NMI_Handler:
PHA
; load sample into SID
@sample:
LDA $FFFF ; self-mod pointer to sample - overwrite this operand during init
; alternate high/low nybble
ASL flag
BCC @writeSID
INC flag
LSR A
LSR A
LSR A
LSR A
INC @sample+1
BEQ @nextpage
@writesample
ORA #$10
AND #$1F
STA SID+$18
STA $D020 ; border colour
LDA $DD0D ; clear NMI
; local exit code is now both smaller and faster than jumping to it
PLA
RTI
@nextPage:
; this is only executed every 512 samples
INC @sample+2
PHA
LDA @sample+2
CMP #stopPage+1
PLA
BCC writesample
@stop:
ORA #$10
AND #$1F
STA SID+$18
STA $D020 ; border colour
; switch off the timers
LDA #0
STA $DD0E
LDA #$4F
STA $DD0D
LDA $DD0D
; restore banking
LDA #$37
STA $01
PLA
RTI
The maybe typos but you should get the idea.
The ending looks as though It can be tidied up.
Re: Most efficient way to toggle a bit (flag) in memory
Posted: Thu Apr 23, 2020 7:28 am
by Chromatix
I see what you're trying to do there, and I considered making a similar adjustment myself, but this now means there's a different delay between /NMI and the sample appearing in the SID for alternate samples; one path has to go through the LSR chain and possibly the page-increment code, while the other goes directly. That sort of timing effect could produce an audible distortion.
And, for the sake of one ZP byte, the code is a great deal messier and harder to read. That's in part because the labels aren't really consistent with function any more, but there is also more duplication of code. I think if you were that desperate for a ZP byte, you should just move the sample storage to an absolute address, with very little code size/speed penalty.