Generic for loop

nimbus396
Posts: 5
Joined: 12 Feb 2020

Generic for loop

Post by nimbus396 »

I see there are a lot of threads about loops in these forums, but is there a better way than the following to write a 16-bit for loop where the loop count can change and the loop is just a call? I developed the following piece of code using Skilldrick's page to loop for a specific count. I am developing for the Atari 2600, and I would really just like to have one loop whose counter I can set.

Code: Select all


define scrLow $00
define scrHigh $01
define scrColor #1

lda #0
sta scrLow
lda #$02
sta scrHigh
ldx #0
ldy #1

fill:      lda scrColor       ; set the color
           sta (scrLow,x)     ; put it on the screen
           lda scrLow         ; load the low byte
           clc                ; clear the carry
           adc #1             ; add one to low byte
           sta scrLow         ; store low byte
           lda scrHigh        ; load the high byte
           adc #0             ; add carry to high byte
           sta scrHigh        ; store it
           cmp #$06           ; compare high byte
           bne fill           ; != so increment low byte
           lda scrLow         ; load low byte
           cmp #$00           ; compare low byte
           bne fill           ; != so increment low byte
     
end:       brk

floobydust
Posts: 1394
Joined: 05 Mar 2013

Re: Generic for loop

Post by floobydust »

Incrementing a 16-bit pointer is easy, no need for CLC, ADC #$01, STA xxxx, etc. You can just use INC xxxx and BNE as:

Code: Select all

;INCINDEX subroutine: increment 16 bit variable INDEXL/INDEXH
INCINDEX        INC     INDEXL          ;Increment index low byte
                BNE     SKP_IDX         ;If not zero, skip high byte
                INC     INDEXH          ;Increment index high byte
SKP_IDX
; more code goes here, like LDA INDEXH, followed by CMP #$06
Also, you don't need to do a CMP #0 after a LDA, as the zero flag is set upon loading the A reg if the value is zero.

Code: Select all

LDA     LENL            ;Get length low byte
BEQ     SKP_LENH        ;Test for LENL = zero, branch if zero
Using the above code examples will shorten both your code and execution time.
drogon
Posts: 1671
Joined: 14 Feb 2018
Location: Scotland

Re: Generic for loop

Post by drogon »

The bit where you add 1 to the 16-bit value can be improved to:

Code: Select all

    inc    scrLow
    bne    :+
    inc    scrHigh
:
rather than the lda, clc, adc, sta sequence.

This also preserves the value in A, although the compare will corrupt it.

Also, in the LDA ... CMP #$00 sequence the CMP isn't needed, as the LDA itself already sets the Z flag, so simply

Code: Select all

    lda    scrLow
    bne    fill
will work.

Cheers,

-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
nimbus396
Posts: 5
Joined: 12 Feb 2020

Re: Generic for loop

Post by nimbus396 »

Thank you, I will give that a try
nimbus396
Posts: 5
Joined: 12 Feb 2020

Re: Generic for loop

Post by nimbus396 »

Saved 5 bytes :). I agree that the compare against zero doesn't need to be there, but if the low byte is non-zero, the compare is still needed?

I see I have a lot more studying to do on Flag usage. Flag states seem to be able to remove cycles if used correctly.

Code: Select all

;************** Old Code using ADC
define scrLow $00
define scrHigh $01
define scrColor #1

lda #0
sta scrLow
lda #$02
sta scrHigh
ldx #0

fill:      lda scrColor       ; set the color
           sta (scrLow,x)     ; put it on the screen
           lda scrLow         ; load the low byte
           clc                ; clear the carry
           adc #1             ; add one to low byte
           sta scrLow         ; store low byte
           lda scrHigh        ; load the high byte
           adc #0             ; add carry to high byte
           sta scrHigh        ; store it
           cmp #$05           ; compare high byte
           bne fill           ; != so increment low byte
           lda scrLow         ; load low byte
           cmp #$24           ; compare low byte
           bne fill           ; != so increment low byte
     
end:       brk

Code: Select all

; ************** New code using INC

define scrLow $00
define scrHigh $01
define scrColor #3

lda #0
sta scrLow
lda #$02
sta scrHigh
ldx #0

fill:      lda scrColor       ; set the color
           sta (scrLow,x)     ; put it on the screen
           inc scrLow
           bne skipIdx
           inc scrHigh
skipIdx:   lda scrHigh
           cmp #$05           ; compare high byte
           bne fill           ; != so increment low byte
           lda scrLow         ; load low byte
           cmp #$24           ; compare low byte
           bne fill           ; != so increment low byte
     
end:       brk
cjs
Posts: 759
Joined: 01 Dec 2018
Location: Tokyo, Japan

Re: Generic for loop

Post by cjs »

nimbus396 wrote:
I see I have a lot more studying to do on Flag usage. Flag states seem to be able to remove cycles if used correctly.
Right. Specific opcodes set specific flags for a reason, and to write efficient assembly code you must know which opcodes set which flags. Once you get good you'll find yourself frequently using flags set not in the previous instruction but several instructions back. You'll also find that, when you return status via flags in your own routines, careful selection of the flags and their values for certain cases will make things more efficient. (E.g., you might choose to use the carry over other flags, and particular meanings for set or clear on return from a subroutine, because those flag values just "fall out" of the code you wrote without any flag tests or sets/clears.)
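As a rough illustration of a status that "falls out" in the carry, here is a sketch in easy6502 syntax that has not been run on real hardware. The routine name atEnd and the zero-page names are made up for this example. It uses the usual 16-bit unsigned compare: CMP on the low bytes leaves the borrow in the carry, and SBC on the high bytes finishes the job, so the routine needs no explicit SEC or CLC.

Code: Select all

define ptrLow  $00      ; hypothetical names, not from the thread
define ptrHigh $01
define endLow  $02
define endHigh $03

           lda #$00
           sta ptrLow
           lda #$02
           sta ptrHigh        ; ptr = $0200
           lda #$24
           sta endLow
           lda #$05
           sta endHigh        ; end = $0524 (first address NOT written)
           ldy #0

fill:      lda #3             ; colour (A is clobbered by atEnd)
           sta (ptrLow),y     ; store at ptr
           inc ptrLow
           bne check
           inc ptrHigh
check:     jsr atEnd
           bcc fill           ; carry clear: ptr < end, keep going
           brk

; atEnd: carry set if ptr >= end, carry clear if ptr < end.
; Clobbers A. The carry is simply what the subtract leaves behind.
atEnd:     lda ptrLow
           cmp endLow         ; carry = no borrow from the low byte
           lda ptrHigh
           sbc endHigh        ; carry = no borrow overall
           rts

The caller tests the result with a single BCC, and because the test is "greater than or equal" rather than "equal", the loop still stops if the pointer ever steps past the end value.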
Curt J. Sampson - github.com/0cjs
drogon
Posts: 1671
Joined: 14 Feb 2018
Location: Scotland

Re: Generic for loop

Post by drogon »

nimbus396 wrote:
Saved 5 bytes :). I agree that the compare against zero doesn't need to be there, but if the low byte is non-zero, the compare is still needed?

I see I have a lot more studying to do on Flag usage. Flag states seem to be able to remove cycles if used correctly.

Code: Select all

; ************** New code using INC

define scrLow $00
define scrHigh $01
define scrColor #3

lda #0
sta scrLow
lda #$02
sta scrHigh
ldx #0

fill:      lda scrColor       ; set the color
           sta (scrLow,x)     ; put it on the screen
           inc scrLow
           bne skipIdx
           inc scrHigh
skipIdx:   lda scrHigh
           cmp #$05           ; compare high byte
           bne fill           ; != so increment low byte
           lda scrLow         ; load low byte
           cmp #$24           ; compare low byte
           bne fill           ; != so increment low byte
     
end:       brk
One thing to note here... The lda scrColor. I don't know what assembler you're using, but this may load from location 3 rather than the immediate value 3.

Also, if you use Y for the scrHigh/Low cmp, you don't alter A, so you can save the lda scrColor every time through the loop. (Do it once, then the fill label goes on the sta line.) You also mention the Atari 2600 (which I know little about other than using them back in the day). If this has a 6502 then "as you were", but if it's something else with a 65C02, then the sta (scrLow,x) can simply be sta (scrLow), which leaves X free for something else, if needed.

Another thing to think about (on the '2600, where I understand every cycle is preciously counted!) is having essentially 2 copies of this code too - if you have the ROM space for it... The first version just compares scrHigh, and when this matches you drop into the 2nd copy, which just compares scrLow - because we know that scrHigh is already matching at that point, so no need to keep checking. This is a speed optimisation for the value of scrLow - in this case you'll save $24 compares of scrHigh which will always match - so the trade-off is speed for extra ROM space.
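A minimal sketch of both suggestions together, in easy6502 syntax and not run on hardware. The labels fillHi, skipHi and fillLo are invented for this example, and it assumes the end address $0524 from the code above, with a low byte that is not zero (with a zero low byte the second loop would write a whole extra page).

Code: Select all

define scrLow  $00
define scrHigh $01

           lda #0
           sta scrLow
           lda #$02
           sta scrHigh        ; start at $0200
           ldx #0
           lda #3             ; colour, loaded once: A is never touched again

; first copy: only the high byte is checked
fillHi:    sta (scrLow,x)     ; put it on the screen
           inc scrLow
           bne skipHi
           inc scrHigh
skipHi:    ldy scrHigh        ; compare through Y so A keeps the colour
           cpy #$05
           bne fillHi         ; not on the last page yet

; second copy: scrHigh is known to be $05, only the low byte is checked
fillLo:    sta (scrLow,x)
           inc scrLow
           ldy scrLow
           cpy #$24
           bne fillLo         ; stops after writing $0523

           brk

Compared with the single loop, this drops the lda of the colour from every pass and the high-byte compare from the last $24 passes, at the cost of a second copy of the store and increment.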

-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
nimbus396
Posts: 5
Joined: 12 Feb 2020

Re: Generic for loop

Post by nimbus396 »

Not really an assembler. I play around with ideas using https://skilldrick.github.io/easy6502/#intro. It is an online emulator written in JS with a step debugger and monitor. However, I can see now that doing the coding on the hardware would probably be better; in the emulator, the indirect addressing mode (<address>) is only available for JMPs, but (<address>,x) and (<address>),y are available. So, coding on the hardware will allow for optimizations. The finished Skilldrick code is below. Any suggestions on good books for optimizing algorithms with the 6502?

Code: Select all

define scrLow $00       ; Screen memory
define scrHigh $01

define lpLow $02        ; Loop Counter
define lpHigh $03

; Setup the screen start

lda #0                   ; Store 0 to screen low
sta scrLow              ; byte
lda #$02                ; Store $02 to screen high 
sta scrHigh             ; byte : screen start (512)

; Set the loop counter max value

lda #$24                ; Store $24 to loop counter low
sta lpLow               ; byte
lda #$05                ; Store $05 to loop counter high
sta lpHigh              ; byte : loop counter (1316) 
ldx #0

fill:      lda #3             ; set the color
           sta (scrLow,x)     ; put it on the screen
           inc scrLow         ; increment low byte
           bne skipIdx        ; skip if not zero
           inc scrHigh        ; increment high byte
skipIdx:   lda scrHigh        ; load high byte
           cmp $03            ; compare high byte
           bne fill           ; != so increment low byte
           lda scrLow         ; load low byte
           cmp $02            ; compare low byte
           bne fill           ; != so increment low byte
     
end:       brk
leepivonka
Posts: 168
Joined: 15 Apr 2016

Re: Generic for loop

Post by leepivonka »

Here is a version (untested) that is callable & takes advantage of the quirks of the 6502's registers & addressing modes.

Code: Select all

define Ptr $00  ; 2 bytes
define Temp1 $02  ; 2 bytes

;---------------------------------------------
; Main

        ; fill a section of screen memory
        lda #$40    ; Ptr=$240
        sta Ptr+0
        ldy #$2
        sty Ptr+1
        ldx #3      ; Len=$380
        ldy #$80
        lda #3      ; color value
        jsr Fill

        ; fill another section of screen memory
        lda #0      ; Ptr=$200
        sta Ptr+0
        ldy #2
        sty Ptr+1
        ldx #4      ; Len=$400
        ldy #0
        lda #1      ; color value
        jsr Fill

        brk

; -------------------------------------
Fill: ; Fill memory starting at Ptr for XY bytes with A.
    ; Values in Ptr & Temp1 & X & Y are destroyed.

        sty Temp1+0 ; save length lo byte
        ldy #0      ; init block index

        ; fill full 256byte blocks
        cpx #0      ; if no full blocks
        beq @22     ;    skip to the partial block code
@12:    sta (Ptr),y ; store
        iny         ; step to next block index
        bne @12     ; loop until block is done
        inc Ptr+1   ; step to next block
        dex
        bne @12     ; loop until all full blocks are done
        beq @22     ; branch always

        ; fill last partial 256byte block
@21:    sta (Ptr),y ; store
        iny         ; step to next block index
@22:    cpy Temp1+0 ; loop until block is done
        bne @21

        rts
GARTHWILSON
Forum Moderator
Posts: 8775
Joined: 30 Aug 2002
Location: Southern California

Re: Generic for loop

Post by GARTHWILSON »

nimbus396, welcome.

Although the example you give is to fill a screen, what you wrote at the top of your head post seems to indicate you would like a general-purpose 16-bit FOR...NEXT loop. There's one in the set of my pages about program structures in assembly language with macros, at http://wilsonminesco.com/StructureMacros/#for16 which is about 80% of the way down the page. It allows you to do for example,

Code: Select all

        FOR  var1, 1, TO, 5000     ; Loop 5,000 times.
           <actions>
           <actions>
           <actions>
        NEXT  var1

(In the macro invocation line, the C32 assembler I use requires the commas between the input parameters, as I'm sure most assemblers do.) The macro source code is about 70% of the way down the page at http://wilsonminesco.com/StructureMacros/STRUCMAC.ASM . The code it will lay down is:

Code: Select all

          LDA   #<starting counter value low byte>
          STA   <variable name>
          LDA   #<starting counter value high byte>
          STA   <variable name> + 1

              <do your stuff here>

          INC   <variable name>
          BNE   nx1
          INC   <variable name> + 1

 nx1:     LDA  <variable name>
          CMP  #<FOR limit + 1, low byte>
          BNE  <top of loop>

          LDA  <variable name> + 1
          CMP  #<FOR limit + 1, high byte>
          BNE  <top of loop>
                                     ; If, after being incremented, the specified variable
                                     ; matches the limit + 1 (checking both bytes), drop through.

Note that X and Y are not used at all. A is used once at the beginning for the FOR setup and then used each time NEXT runs; but you can use it for whatever you want inside the loop as long as it's ok for it to get modified by NEXT.

Where it says BNE <top of loop>, the target address is of the first instruction after the setup's STA <variable name> + 1. Where it says "<variable name> + 1," that's the high byte of the variable you're using as a counter, whereas where it says "FOR limit + 1," that's a constant (as you might have guessed from the "#" immediate indicator in front of it). So the variable is loaded with LDA (could be ZP or absolute, your preference) and compared with CMP#. NEXT uses two pieces of information left by the FOR macro: the desired limit, and the address of the top of the loop to branch up to when the limit has not been reached yet.

All of the program flow-control structure macros I give in that section of the website are nestable except this 16-bit FOR...NEXT and the CASE structure. (They can be inside other program structures, and you can have other program structures inside them; but you cannot have one 16-bit FOR...NEXT loop inside another 16-bit FOR...NEXT loop or one CASE structure inside another CASE structure.)

If you ever decide to move up to the 65816, since it has 16-bit index registers, doing something 5,000 times as shown above becomes as efficient as the 6502 handles numbers under 256.
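For comparison, a rough 65816 sketch of such a loop, not assembled or run here. It counts down rather than up, so it is not what the FOR...NEXT macro lays down. The NOP stands in for <actions>, and how you tell the assembler that the index registers are 16 bits wide depends on the assembler.

Code: Select all

        CLC                ; clear carry, then...
        XCE                ; ...swap it into the emulation bit: native mode
        REP  #$10          ; clear the X flag: X and Y are now 16 bits wide
        ; (the assembler must also be told the index width here;
        ;  the directive for that is assembler-specific)
        LDX  #5000         ; 16-bit loop count
loop:   NOP                ; stand-in for <actions>
        DEX                ; sets Z when X reaches zero
        BNE  loop          ; 5,000 passes, no counter variable in memory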
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Re: Generic for loop

Post by BigEd »

Indeed, welcome! For Atari 2600 development you might like Steven Hugg's in-browser IDE which includes an emulator:
https://8bitworkshop.com/v3.5.0/?platfo ... %2Fhello.a
commodorejohn
Posts: 299
Joined: 21 Jan 2016
Location: Placerville, CA

Re: Generic for loop

Post by commodorejohn »

nimbus396 wrote:
I see I have a lot more studying to do on Flag usage. Flag states seem to be able to remove cycles if used correctly.
Absolutely. A lot of tweaking/optimizing in assembler is in finding ways to avoid doing anything you don't have to; since the 6502 already does some basic comparisons and sets flags accordingly on the results of most instructions, it's often possible to know a variety of things without ever having to use a CMP: was the result zero? Negative? If the last instruction was an INC/INX/INY and the Z flag is set, it means the value rolled over, so if it's part of a larger variable you know to carry over the increment to the next most significant byte...things like that. Furthermore, if you can arrange circumstances correctly (without making the rest of your code too obscure,) you can get this to be the case more often.
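A small easy6502-style sketch of that idea, not run on hardware: counting down lets DEX set the Z flag on the final pass, so no CPX is needed. The base address $01ff is chosen only so that X = 16..1 lands on the first 16 screen locations.

Code: Select all

           lda #3             ; colour
           ldx #16            ; number of passes
loop:      sta $01ff,x        ; writes $020f down to $0200
           dex                ; sets Z when X reaches zero
           bne loop           ; the flag from DEX is the loop test
           brk

The same loop counting up from zero would need a CPX #16 before the branch.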
nimbus396
Posts: 5
Joined: 12 Feb 2020

Re: Generic for loop

Post by nimbus396 »

I appreciate all the support :)
BillO
Posts: 1038
Joined: 12 Dec 2008
Location: Canada

Re: Generic for loop

Post by BillO »

Maybe there is something I don't get here, but at the risk of sounding dumb (when has that ever stopped me?)


Looking at this:

Code: Select all

fill:      lda #3             ; set the color
           sta (scrLow,x)     ; put it on the screen
           inc scrLow         ; increment low byte
           bne skipIdx        ; skip if not zero
           inc scrHigh        ; increment high byte
skipIdx:   lda scrHigh        ; load high byte
           cmp $03            ; compare high byte
           bne fill           ; != so increment low byte
           lda scrLow         ; load low byte
           cmp $02            ; compare low byte
           bne fill           ; != so increment low byte
     
end:       brk
x is never changed so why are you doing sta(scrLow,x)?

Also, if scrHigh is not incremented, then why do the load and compare on it every time?

Would this not work more efficiently?

Code: Select all

fill:      lda #3             ; set the color
           sta (scrLow)       ; put it on the screen
           inc scrLow         ; increment low byte
           bne fill           ; loop if not zero
           inc scrHigh        ; increment high byte
           lda scrHigh        ; load high byte
           cmp $03            ; compare high byte
           bne fill           ; != so increment low byte
           lda scrLow         ; load low byte
           cmp $02            ; compare low byte
           bne fill           ; != so increment low byte
     
end:       brk
Last edited by BillO on Wed Feb 12, 2020 11:06 pm, edited 1 time in total.
Bill
sark02
Posts: 241
Joined: 10 Nov 2015

Re: Generic for loop

Post by sark02 »

I thought I'd jump in with an attempt. I tested this on the easy6502 you mentioned, and it appears to work.

The main feature of the code below is that the fill loop is:

Code: Select all

l1:	
    sta ($20),y
    iny
    bne l1
    inc $21
    dex
    bpl l1
XY bytes are written, even if Y != 0. In this case, the function adjusts the ($21,$20) start location backwards (256-Y) bytes, and advances Y that same amount so as to enter the loop with everything setup. The goal is to keep the performance critical loop minimal.

The adjustment code looks long and awkward; there must be a more elegant way to do it, but my priority was getting to the loop.

Code: Select all

; Fill XY bytes of memory with A, starting
; at ($21,$20)
; clobbers $22
fill:
	cpy #0
	beq l2
	pha
	tya
	eor #$ff
	sta $22
	inc $22
	lda $22
	tay
	sec
	lda $20
	sbc $22
	sta $20
	lda $21
	sbc #0
	sta $21
	pla
l1:
	sta ($20),y
	iny
	bne l1
	inc $21
l2:
	dex
	bpl l1
	rts
It's not a generic loop by any means.