Delay N clocks

blargg
Posts: 42
Joined: 30 Dec 2003

Post by blargg »

Most of my assembly coding is part of reverse-engineering video game consoles, and a common need is to delay N clocks, where N is a constant or run-time value. I figured I'd share the 6502 routines I use, since I found them fun to write. This is the key routine:

Code:

; Delays A+20 clocks (excluding the JSR, including the RTS)
; Preserved: X, Y
; Timing assumes no branch crosses a page boundary
delay_a_clocks:
        lsr a
        bcs @b0c        ; 2/3
@b0c:   lsr a
        bcs @b1s        ; 2/3
        lsr a
        bcs @b2s        ; 2/3
@b2c:   bne @ge8        ; 3
                        ; -1
@ret:   rts
@ge8:   sec             ; 2
        sbc #1          ; 2
        beq @ret        ; 3
                        ; -1
        nop             ; 2
@eights:
        bne :+          ; *3
:       sbc #1          ; *2
        bne @eights     ; *3
                        ; -1
        rts
        
@b1s:   lsr a           ; 2
        bcc @b2c        ; 3/2
        nop             ; 2
@b2s:   bcs @b2c        ; 3
It's somewhat obfuscated because I wanted to reduce the overhead (the +20) as much as possible. In addition to the above, there are routines that delay A*256+10 clocks and A*65536+10 clocks, allowing 16-bit and 24-bit run-time delays by simply calling the three routines with the low, mid, and high bytes of the delay. There will be an overhead of some constant number of clocks, but that's usually not a problem.
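
The A+20 figure can be checked without an emulator. The following is a rough Python model of the routine's control flow, not part of the original source. It assumes the standard timings (2 clocks for LSR, SEC, SBC #imm and NOP, 6 for RTS, and 3/2 clocks for a taken/untaken branch with no page crossing) and simply walks the same paths the assembly takes.

```python
def delay_a_clocks_cycles(a):
    """Clocks spent in delay_a_clocks for input A (JSR excluded, RTS included).

    Assumes 2-clock LSR/SEC/SBC #/NOP, 6-clock RTS, and branches that cost
    3 clocks when taken and 2 when not (no page crossings).
    """
    RTS = 6

    def branch(taken):
        return 3 if taken else 2

    clocks = 0

    # lsr a / bcs @b0c : the branch target is the next instruction anyway
    carry, a = a & 1, a >> 1
    clocks += 2 + branch(carry)

    # lsr a / bcs @b1s
    carry, a = a & 1, a >> 1
    clocks += 2 + branch(carry)
    bit1 = carry

    # third lsr a (either inline or at @b1s), then the bit-2 branches
    carry, a = a & 1, a >> 1
    clocks += 2
    if bit1:
        clocks += branch(not carry)          # bcc @b2c
        if carry:
            clocks += 2 + branch(True)       # nop, then @b2s: bcs @b2c
    else:
        clocks += branch(carry)              # bcs @b2s
        if carry:
            clocks += branch(True)           # @b2s: bcs @b2c

    # @b2c: Z still reflects the last LSR, so this tests A (now input >> 3)
    clocks += branch(a != 0)                 # bne @ge8
    if a == 0:
        return clocks + RTS                  # @ret

    a -= 1
    clocks += 2 + 2 + branch(a == 0)         # sec, sbc #1, beq @ret
    if a == 0:
        return clocks + RTS

    clocks += 2                              # nop
    while a != 0:                            # @eights
        a -= 1
        # bne :+ (always taken), sbc #1, bne @eights
        clocks += branch(True) + 2 + branch(a != 0)
    return clocks + RTS


print(all(delay_a_clocks_cycles(a) == a + 20 for a in range(256)))  # prints True
```

The three low bits each cost 1, 2 and 4 extra clocks through the branch maze, and every remaining unit of A >> 3 costs 8, which is how the routine reaches single-clock resolution.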

For constant delays, I use a macro (ca65 assembler) that selects between several strategies depending on the delay, which can be any expression evaluating to zero or to a value from 2 to 16777216 (a 1-clock delay is impossible, since the shortest 6502 instruction takes 2 clocks). For delays of less than 28 clocks, it uses either an inline delay made up of short instructions, or a JSR to a run of NOPs followed by an RTS. For 28 and larger, it uses a call to delay_a_clocks, plus optional calls to variants of the "delay A*256 clocks" and "delay A*65536 clocks" routines when the mid and high bytes are non-zero. The X and Y registers are preserved by all routines/macros, but A is not, since it's easy enough to save and restore it.

At one point I had a version with the full 24-bit delay stored inline after the JSR, which shortened the delay calls a bit, but this made the delay code much more complex, so I went back to the simpler scheme I use now.
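
To illustrate the strategy selection, here is a simplified Python sketch. It is not the actual ca65 macro. It models only two of the strategies, inline short instructions and a single call to delay_a_clocks, and the function names and the use of `jmp *+3` as the 3-clock filler are assumptions made for this example. The call overhead works out to 2 (lda #imm) + 6 (jsr) + 20 = 28 clocks, which matches the 28-clock threshold mentioned above.

```python
CALL_OVERHEAD = 2 + 6 + 20   # lda #imm + jsr + delay_a_clocks' own +20


def plan_constant_delay(n):
    """Return a list of instructions that delays exactly n clocks (sketch only)."""
    if n == 0:
        return []
    if n < 2:
        raise ValueError("the shortest 6502 instruction takes 2 clocks")
    if n < CALL_OVERHEAD:
        # Inline: one 3-clock instruction absorbs an odd clock, NOPs do the rest.
        odd = ["jmp *+3"] if n % 2 else []
        nops = (n - 3 * len(odd)) // 2
        return odd + ["nop"] * nops
    if n - CALL_OVERHEAD <= 255:
        return [f"lda #{n - CALL_OVERHEAD}", "jsr delay_a_clocks"]
    raise NotImplementedError("needs the A*256 / A*65536 routines, not modeled here")


def plan_clocks(plan):
    """Add up the clocks a plan from plan_constant_delay takes."""
    total, a = 0, 0
    for ins in plan:
        if ins == "nop":
            total += 2
        elif ins == "jmp *+3":
            total += 3
        elif ins.startswith("lda #"):
            a = int(ins[5:])
            total += 2
        elif ins == "jsr delay_a_clocks":
            total += 6 + 20 + a
    return total


print(plan_constant_delay(7))    # ['jmp *+3', 'nop', 'nop']
print(plan_constant_delay(100))  # ['lda #72', 'jsr delay_a_clocks']
```

The real macro also trades size for speed by using a JSR into a shared run of NOPs for some sub-28 delays, which this sketch leaves out.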

Here's the full commented source code I use: 6502_delay.asm
dclxvi
Posts: 362
Joined: 11 Mar 2004

Post by dclxvi »

I've added two routines I had sitting around to the wiki at:

http://6502org.wikidot.com/software-delay

One routine delays 25+A cycles (includes the JSR and RTS); the other can be inlined (i.e. you don't need a JSR or RTS) and delays 15+A cycles (it would take 27+A cycles if a JSR and RTS were used). They're a little smaller too.
repose
Posts: 26
Joined: 20 Feb 2012
Location: America

Post by repose »

This delays 14+A cycles for A in the range 1..8, or 13 cycles with A=0.

First, a reference for the three opcodes used:

Code:

$C5 cmp zp   (2 bytes, 3 cycles)
$C9 cmp #    (2 bytes, 2 cycles)
$EA nop      (1 byte, 2 cycles)

The code:

Code:

;A=1..8
*=$1000
clc
adc #$ff-8 ;A=A+$f7, i.e. $f8..$ff
eor #$ff ;A=8-A, so the result will be 7…0 in A
sta corr+1 ;self-modifying code, the bpl branch offset = A
corr bpl *+2 ;always taken, lands A bytes past the next instruction (13 cycles so far)
cmp #$c9 ;input A=8 -> A=0 -> bpl *+2, enters here
cmp #$c9 ;
cmp #$c9 ;
cmp $ea ;3 cycles, 9 in total (13+9=22 max delay)
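A table of the code fragments executed for each branch offset (the addresses are nominal: $1000 here stands for the first byte after the bpl, not the *=$1000 origin of the listing):

Code: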

Start Address
$1000    $1001    $1002    $1003    $1004    $1005    $1006    $1007    $1008
-------- -------- -------- -------- -------- -------- -------- -------- --------
cmp #$c9 cmp #$c9 cmp #$c9 cmp #$c9 cmp #$c9 cmp #$c5 cmp $ea  nop
cmp #$c9 cmp #$c9 cmp #$c9 cmp #$c5 cmp $ea  nop      
cmp #$c9 cmp #$c5 cmp $ea  nop
cmp $ea  nop
-------- -------- -------- -------- -------- -------- -------- -------- --------
9        8        7        6        5        4        3        2        0
Cycles
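
The table can be reproduced mechanically. The sketch below is an illustration added here, with names chosen for this example. It decodes the eight bytes that follow the bpl from every entry offset, using only the three opcodes from the reference above, and adds up the cycles.

```python
# The eight bytes that follow the bpl: cmp #$c9 three times, then cmp $ea.
SLED = bytes([0xC9, 0xC9, 0xC9, 0xC9, 0xC9, 0xC9, 0xC5, 0xEA])

# opcode -> (text template, length in bytes, cycles)
OPCODES = {
    0xC9: ("cmp #${:02x}", 2, 2),
    0xC5: ("cmp ${:02x}", 2, 3),
    0xEA: ("nop", 1, 2),
}


def run_from(offset, sled=SLED):
    """Decode the sled from `offset` to its end; return (instructions, cycles)."""
    listing, cycles, pc = [], 0, offset
    while pc < len(sled):
        template, length, cost = OPCODES[sled[pc]]
        operand = sled[pc + 1] if length == 2 else 0
        listing.append(template.format(operand))
        cycles += cost
        pc += length
    return listing, cycles


def total_delay(a):
    """Cycles for input A in 0..8: 13 up to and including the bpl, plus the sled."""
    return 13 + run_from(8 - a)[1]


for offset in range(len(SLED) + 1):
    listing, cycles = run_from(offset)
    print(offset, cycles, " / ".join(listing))
```

Every odd offset lands on an operand byte, which then executes as an opcode of its own. No entry point costs exactly 1 cycle, which is why A=0 gives 13 cycles rather than fitting the 14+A pattern.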

Expanding the concept

Code:

Range  Size (bytes)
1..2   12
1..4   14
1..6   16
1..8   18
Simply extend the tail from cmp $ea to cmp #$c9 cmp $ea and so on; each step adds two bytes.

There's a variation for 1..7 that's 15 bytes because you can use eor #7 directly, in place of the clc / adc / eor #$ff sequence.
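
The sizes above can be checked by counting bytes. This is a small sketch with helper names made up for the example. The 11+A cycle figure it derives for the eor #7 variation is a calculation from standard instruction timings and is not stated in the post.

```python
CYCLES = {0xC9: 2, 0xC5: 3, 0xEA: 2}    # cmp #, cmp zp, nop
LENGTH = {0xC9: 2, 0xC5: 2, 0xEA: 1}


def sled_for(max_a):
    """Bytes after the bpl for an even range 1..max_a: cmp #$c9 pairs, then cmp $ea."""
    return bytes([0xC9, 0xC9] * (max_a // 2 - 1) + [0xC5, 0xEA])


def cycles_from(sled, offset):
    """Cycles executed when entering the sled at `offset`."""
    cycles, pc = 0, offset
    while pc < len(sled):
        cycles += CYCLES[sled[pc]]
        pc += LENGTH[sled[pc]]
    return cycles


ADC_PROLOGUE = 1 + 2 + 2 + 3 + 2    # clc, adc #, eor #$ff, sta abs, bpl
EOR_PROLOGUE = 2 + 3 + 2            # eor #7, sta abs, bpl

sizes = {max_a: ADC_PROLOGUE + len(sled_for(max_a)) for max_a in (2, 4, 6, 8)}
eor7_size = EOR_PROLOGUE + len(sled_for(8))

# eor #7 variant: 2 (eor) + 4 (sta abs) + 3 (bpl taken) = 9 cycles before the sled
eor7_delay = {a: 9 + cycles_from(sled_for(8), a ^ 7) for a in range(1, 8)}

print(sizes)       # {2: 12, 4: 14, 6: 16, 8: 18}
print(eor7_size)   # 15
print(eor7_delay)  # each entry is 11 + A
```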
repose
Posts: 26
Joined: 20 Feb 2012
Location: America

Post by repose »

Apparently I did some more work on this. See
http://csdb.dk/forums/?roomid=11&topicid=65658

I made a long post in there that's a bit hard to understand, but I think I've worked out basically every approach to this problem.