
All times are UTC




 Post subject: Self modifying code
PostPosted: Sun Mar 02, 2008 4:06 am 

Joined: Sat Sep 22, 2007 1:31 am
Posts: 24
I've been writing a ton of self-modifying code for optimizations on my latest project, probably the largest amount I've written in a single project. The combination of LUTs and self-modifying code makes for some really fast code. I did a search and didn't find a topic specifically on this. So... what are some of the crazy examples you guys have done with self-modifying code?

Here's one from the recent project...

for a sample frequency divider:


Code:
 lda #$7f
  sta sm_lp_cntr+1
  lda #$02
  pha
  cly
  clx
__main_lp:
  clc
sm_src:
  lda $0000,x           ;source/destination set up (written to) before routine call
sm_dst:
  sta $0000,y           ;destination address always starts off with LSB of 00

sm_freq_flt_cntr:       ;8bit floating point counter
  lda #$00
sm_freq_flt_add:        ;8bit float incrementor
  adc #$00              ;no need for CLC since CPY clears it
  sta sm_freq_flt_cntr+1
  bcc .skip
  inx
.skip
sm_freq_chr_add:        ;the whole # part of the incrementor
  inx                   ;either INX or NOP
  iny
sm_lp_cntr:
  cpy #$00              ;initialized as #$7f
  bcc sm_src
  txa
  clc
  adc sm_src+1
  sta sm_src+1
  bcc .skip
  inc sm_src+2
.skip
  clx
  pla
  dec a
  beq .out
  cmp #$01
  bcs .upper_7f
.last_eight
  pha
  lda #$08
  sta sm_lp_cntr+1
  cly
  inc sm_dst+1
  bcc .skip2
  inc sm_dst+2
.skip2
  bra __main_lp
.upper_7f
  pha
  stz sm_lp_cntr+1     ;don't clear Y
  bra __main_lp
.out


The routine uses a fractional ("float point") incrementor for the frequency scaler: an 8-bit fraction plus a whole part of 0 or 1, so the step ranges from 0.01 to 1.99. The destination buffer is 262 bytes (matching 262 scanlines :wink: ). The step value is taken from a lookup table with 100 Hz increments (frequency_request/100). Playback frequency ranges from 0 kHz to 32 kHz on a fixed 15.7 kHz output.
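
For readers who want to check the stepping arithmetic without assembling anything, here is a minimal sketch in Python rather than 65x assembly. It models only the inner copy loop (read at X, store at Y, add the 8-bit fraction, bump X on carry, then add the whole part), not the buffer bookkeeping around it. The function and parameter names are made up for illustration.

```python
def resample(src, step_whole, step_frac, count):
    """Copy `count` bytes from `src`, advancing the read index by
    step_whole + step_frac/256 after each output byte.

    step_whole is 0 or 1 (the INX-or-NOP slot in the routine),
    step_frac is the 8-bit fractional increment (0..255).
    """
    out = []
    index = 0   # plays the role of X
    frac = 0    # plays the role of the sm_freq_flt_cntr operand
    for _ in range(count):
        out.append(src[index])
        frac += step_frac
        if frac >= 256:        # carry out of the 8-bit fraction
            frac -= 256
            index += 1         # the INX skipped by BCC when there is no carry
        index += step_whole    # the INX/NOP slot
    return out


source = list(range(16))
half_speed = resample(source, 0, 128, 4)     # step 0.5: each byte is repeated
one_and_half = resample(source, 1, 128, 4)   # step 1.5: every third byte is skipped
```

With a step of 0.5 every source byte is emitted twice, and with a step of 1.5 the read index alternates between advancing by one and by two.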


 Post subject:
PostPosted: Sun Mar 02, 2008 11:12 am 

Joined: Tue Nov 18, 2003 8:41 pm
Posts: 250
Why don't you explain what you're doing?

You can start with how you get a 32khz output frequency
with a 15.7khz sample rate ;)


 Post subject:
PostPosted: Sun Mar 02, 2008 5:32 pm 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
One possibility requires external hardware to work. If you use a square wave output, you can generate a 4.57kHz square wave, and take the 7th harmonic in an external filter and amplifier.


 Post subject:
PostPosted: Mon Mar 03, 2008 2:21 am 

Joined: Sat Sep 22, 2007 1:31 am
Posts: 24
bogax wrote:
Why don't you explain what you're doing?

You can start with how you get a 32khz output frequency
with a 15.7khz sample rate ;)


You can play back a sample at 32 kHz (31,470 Hz) on a 15.7 kHz output by skipping every other byte. It's like scaling a bitmap scanline on a fixed resolution display, but the ear really doesn't notice artifacts from non-interpolated sample/frequency scaling like the eye does for unfiltered graphic scaling.

I'm writing a 4 channel MOD player for a 65x variant system. Knowing my frequency playback range for an instrument (sample) is 0 to 32 kHz, I only need to range from 0.01 to 1.99 as the incrementor for reading from the wave data into a temp buffer. From the tests I've done it sounds great and it doesn't sound "unfiltered". I'm not sure if the Amiga's "Paula" worked in a similar fashion or not.

For mixing the 4 buffers and software volume handling, I use more self-modifying code and LUTs. Software volume handling is very fast. (sm in a label always stands for self-modifying.)

Code:
sm_chan_src:
  ldx $0000,y
sm_chan_vol:
  lda $0000,x


All the amplitude values are precalculated per volume level. Since each volume entry in the LUT is 256 bytes, you only need to update the MSB of the table address in the self-modifying operand of the LDA at sm_chan_vol. Volume changes are restricted to once every frame, or 262 sample bytes. Volume ranges from 0 to 64, where 64 equals no change to the sample's amplitude.
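
A minimal Python sketch of the table layout described above: one 256-byte table per volume level, so choosing a volume means choosing a table, which is what patching the high byte of the LDA operand does. The linear scaling `sample * volume // 64` on unsigned 8-bit samples is an assumption made for illustration, since the post doesn't say how the table entries are computed. All names are hypothetical.

```python
def build_volume_tables(levels=65):
    """One 256-entry table per volume level 0..64.

    Assumes unsigned 8-bit samples and linear scaling; level 64
    leaves the sample unchanged.
    """
    return [bytes((sample * volume) // 64 for sample in range(256))
            for volume in range(levels)]


def apply_volume(samples, table):
    """Model of the LDX sample / LDA table,X pair: one lookup per byte."""
    return bytes(table[sample] for sample in samples)


tables = build_volume_tables()
frame = bytes([0, 128, 255])
full = apply_volume(frame, tables[64])   # volume 64: unchanged
half = apply_volume(frame, tables[32])   # volume 32: halved
```

The whole table set costs 65 * 256 bytes of memory, and in exchange the per-sample work is a single indexed load with no multiply.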


 Post subject:
PostPosted: Mon Mar 03, 2008 3:25 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
My only self-modifying code has been in two places in ITC Forth's NEXT. The piece below is for the 65816, and only the portion executed when there's no interrupt. (The NEXT executed to start servicing an interrupt is actually shorter.) Here the variable "IP" (instruction pointer) starts one byte after "preIP" and the variable "W" (word pointer) starts one byte after "preW". This portion of code is copied to direct page after boot-up, and the "1234" on two lines will get initialized to different numbers before the first time this is run.
Code:
preIP: LDA  1234    ; Get cell pointed to by interpretive pointer. (Code &
       STA  W       ; IP together eliminates a level of indirection.) Put
                    ; that in the word pointer (which points to CFA).
       LDA  IP      ; Contents must be kept anyway.  Then increment the
       INA          ; instruction pointer so it will be ready for next
       INA          ; one, either to come to next or be saved to return
       STA  IP      ; to after a secondary call. Faster than two INC_DP's.

preW:  JMP  (1234)  ; Finally, jump to the code pointed to by the word
                    ; pointer.  (Code & W together here eliminates a JMP,
                    ; an advantage of having NEXT in direct-page RAM.)

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


 Post subject:
PostPosted: Mon Mar 03, 2008 4:25 am 

Joined: Tue Nov 18, 2003 8:41 pm
Posts: 250
tomaitheous wrote:
You can play back a sample at 32 kHz (31,470 Hz) on a 15.7 kHz output by skipping every other byte. It's like scaling a bitmap scanline on a fixed resolution display, but the ear really doesn't notice artifacts from non-interpolated sample/frequency scaling like the eye does for unfiltered graphic scaling.


I see, so you're actually downsampling from 31.47 kHz to 15.7 kHz
(and not trying to output 32kHz)


 Post subject:
PostPosted: Mon Mar 03, 2008 6:34 am 

Joined: Sat Sep 22, 2007 1:31 am
Posts: 24
bogax wrote:
I see, so you're actually downsampling from 31.47 kHz to 15.7 kHz
(and not trying to output 32kHz)


Correct: anything above 15.7 kHz is technically downsampled, since you lose more resolution the higher above 15.7 kHz you go.


 Post subject:
PostPosted: Wed Mar 05, 2008 2:04 am 

Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
I've used self-modifying code from time to time. One example is at:

http://6502org.wikidot.com/software-658 ... ymove#toc1

Another example (not from actual code, though) can be found in the code that starts with AND LDA #$35 in this post:

viewtopic.php?p=3331#3331

I've found it most useful when you need a short, adjustable delay (say 2 to 20 or so cycles). For example:

Code:
; Delay 2 to 10 cycles (before the JSR)
;
LOOP NOP      ; 2 cycles
     NOP      ; 2 cycles
     NOP      ; 2 cycles
     NOP      ; 2 cycles
     BMI L1   ; 2 cycles (modified to BPL for a 3 cycle delay)
L1   JSR SUB
     BIT IO_LOCATION
     BPL LOOP ; modified to BPL LOOP+0 to BPL LOOP+4 as needed
     RTS


There's a similar situation with fast video. START and END are pre-calculated (e.g. using a table lookup for the multiplication) as:

START = first_row * bytes_per_row + first_column
END = first_row * bytes_per_row + last_column

where bytes_per_row is a constant. Then the JMP LOOP is pre-modified to JMP LAST-number_of_rows*3 (assuming ROW_nn is an absolute address).

Code:
; Fill rectangle with A
;
     LDX START
     JMP MOD
LOOP STA ROW_99,X
     STA ROW_98,X
     STA ROW_97,X
; etc.
     STA ROW_02,X
     STA ROW_01,X
     STA ROW_00,X
LAST CPX END
     INX
     BCS DONE ; BCC can't reach STA ROW_99,X
MOD  JMP LOOP ; modified
DONE
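
The entry-point arithmetic for the rectangle fill can be sketched as follows, in Python rather than assembly. Each STA ROW_nn,X with an absolute address is 3 bytes, so stepping back number_of_rows*3 bytes from LAST lands on the first store that should run. The addresses below are hypothetical, and `memory` is just a bytearray standing in for RAM.

```python
STA_ABS_X_SIZE = 3   # opcode + 16-bit address of one "STA ROW_nn,X"
JMP_ABS = 0x4C       # opcode for JMP absolute


def entry_address(last_addr, number_of_rows):
    """Address of the first STA to execute so that exactly
    number_of_rows stores run before reaching LAST."""
    return last_addr - number_of_rows * STA_ABS_X_SIZE


def patch_jmp(memory, jmp_addr, target):
    """Overwrite the 16-bit operand of the JMP at jmp_addr (low byte first)."""
    memory[jmp_addr + 1] = target & 0xFF
    memory[jmp_addr + 2] = (target >> 8) & 0xFF


memory = bytearray(0x10000)
LOOP = 0x2000                          # hypothetical address of STA ROW_99,X
LAST = LOOP + 100 * STA_ABS_X_SIZE     # 100 unrolled stores, ROW_99 down to ROW_00
JMP_LOCATION = 0x2200                  # hypothetical address of the modified JMP
memory[JMP_LOCATION] = JMP_ABS

patch_jmp(memory, JMP_LOCATION, entry_address(LAST, 5))   # fill 5 rows
```

Asking for all 100 rows gives back LOOP itself, which is the unmodified form of the jump.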


Another example is a jump table with more than 128 addresses (e.g. an inner interpreter). On the 65C02, this can be done with:

Code:
   ASL
   TAX
   BCS L1
   JMP (TABLE,X)
L1 JMP (TABLE+256,X)


The following self-modifying version, however, works on both the NMOS 6502 and the 65C02, and (often more helpfully) doesn't overwrite X.

Code:
; TABLE must be page aligned
;
   ASL
   BCS L2
   STA L1+1
L1 JMP (TABLE)
L2 STA L3+1
L3 JMP (TABLE+256)


In a similar vein, the 65C02 can use:

Code:
GET LDA (PTR)
    INC PTR
    BNE L1
    INC PTR+1
L1  EOR #0    ; update N and Z flags
    RTS


On the NMOS 6502, rather than using (zp),Y and saving and restoring Y (I've also found it helpful at times to keep debugging code from using any of the zero page), I've used:

Code:
GET LDA $FFFF ; address is modified
    INC GET+1
    BNE L1
    INC GET+2
L1  EOR #0    ; update N and Z flags
    RTS


Likewise (65C02 version):

Code:
PUT STA (PTR)
    INC PTR
    BNE L1
    INC PTR+1
L1  RTS


and (self-modifying version):

Code:
PUT STA $FFFF ; address is modified
    INC PUT+1
    BNE L1
    INC PUT+2
L1  RTS
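
To make the operand-as-pointer idea concrete, here is a small Python model of the self-modifying GET routine above. The pointer lives in the two operand bytes of the LDA absolute instruction, so there is no separate zero-page variable. The memory layout and addresses are hypothetical; only the $AD opcode for LDA absolute is taken from the 6502 instruction set.

```python
LDA_ABS = 0xAD  # opcode for LDA absolute


def install_get(memory, get_addr, start):
    """Place 'LDA start' at get_addr; the operand doubles as the pointer."""
    memory[get_addr] = LDA_ABS
    memory[get_addr + 1] = start & 0xFF
    memory[get_addr + 2] = (start >> 8) & 0xFF


def get(memory, get_addr):
    """Read through the operand, then advance it as INC GET+1 / INC GET+2 do."""
    lo = memory[get_addr + 1]
    hi = memory[get_addr + 2]
    value = memory[(hi << 8) | lo]
    memory[get_addr + 1] = (lo + 1) & 0xFF       # INC GET+1
    if memory[get_addr + 1] == 0:                # low byte wrapped: BNE not taken
        memory[get_addr + 2] = (hi + 1) & 0xFF   # INC GET+2
    return value


memory = bytearray(0x10000)
memory[0x30FF] = 0x11
memory[0x3100] = 0x22
install_get(memory, 0x0200, 0x30FF)
first = get(memory, 0x0200)    # reads $30FF
second = get(memory, 0x0200)   # reads $3100, after the page crossing
```

The PUT routine works the same way with a store in place of the load.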


Using self-modifying code to save and restore registers, for cases where using the stack is inconvenient, usually isn't much of a gain, but it can be done:

Code:
   PHA
   JSR SUB1
   STX L1+1
   JSR SUB2
   PLA
   JSR SUB3
L1 LDX #0   ; modified by the STX instruction above
   RTS


 Post subject:
PostPosted: Wed Mar 05, 2008 6:36 am 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
I personally never use self-modifying code, but I have used a related technology, that of dynamic code generation. The idea is I have one or more copies of code in the program that I use as a template, copy it somewhere I know will be safe, tweak it with the desired settings, and then call it. Alternatively, you can compile code based on some input, then execute that.

Caching the code generated can amortize the compilation/template costs.

I was planning on using this to implement the Kestrel's blitter, but decided against it for simplicity's sake.


 Post subject:
PostPosted: Wed Mar 05, 2008 11:22 pm 

Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
I have a routine on a disk somewhere that does that sort of thing for multiplication (or maybe it was division), where the input would be 10 and the output would be something like:

Code:
ASL
STA TEMP
ASL
ASL
CLC
ADC TEMP


When the input was 3, the output would be:

Code:
STA TEMP
ASL
CLC
ADC TEMP
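
A generator of this kind can be sketched in Python. This is not the routine from the disk, just one way to do it: it walks the multiplier's bits from the most significant end (Horner style), emitting an ASL per bit and a CLC / ADC TEMP when the bit is set. For 3 it produces the same listing as above. For 10 it produces a different but equivalent ordering. The small `run` interpreter exists only to check the generated sequences in 8-bit arithmetic.

```python
def gen_multiply(n):
    """Return mnemonics that leave n * A (mod 256) in A, using one TEMP byte."""
    if n < 1:
        raise ValueError("multiplier must be at least 1")
    bits = bin(n)[3:]              # the bits below the most significant 1
    code = []
    if "1" in bits:
        code.append("STA TEMP")    # keep the original value for later adds
    for bit in bits:
        code.append("ASL")         # double the running product
        if bit == "1":
            code += ["CLC", "ADC TEMP"]   # add the original value back in
    return code


def run(code, a):
    """Interpret the tiny instruction subset above on an 8-bit accumulator."""
    temp = 0
    carry = 0
    for op in code:
        if op == "ASL":
            carry = a >> 7
            a = (a << 1) & 0xFF
        elif op == "STA TEMP":
            temp = a
        elif op == "CLC":
            carry = 0
        elif op == "ADC TEMP":
            total = a + temp + carry
            carry = total >> 8
            a = total & 0xFF
    return a


times_three = gen_multiply(3)
times_ten = gen_multiply(10)
```

Powers of two come out as a bare run of ASLs, since TEMP is never needed.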


and so on.

I agree with making a distinction between that and self-modifying code, even though they are similar. Another related technique that I would also put in a separate category is what could be called pre-modifying code, one example of which is adjusting absolute addresses due to lack of a linker. This sort of thing is only done once, and could be considered a separate step from actually running the application.


 Post subject:
PostPosted: Thu Mar 06, 2008 1:22 am 

Joined: Tue Sep 24, 2002 4:56 pm
Posts: 50
Location: Essex, UK
The original reason I gave up on self-modifying code was the idea of then trying to unpick where code has gone wrong - if it's not what I originally wrote, I not only have to figure out why it failed, but also how it modified itself before it failed :(

Also (obviously?), ROM-based code _can't_ be self-modifying, although Acorn came up with a couple of mechanisms for BBC Micro and Master "Sideways ROM" code to relocate itself when being run from the RAM in the Acorn 6502 Second Processor, most of which centred around altering the absolute addresses internal to the ROM code as it copies itself (or is copied automatically, on some later machines) during initialisation of the Second Processor.

(If anyone wants me to look up specific details of these mechanisms, let me know - I'm sure I've still got the books, though not immediately to hand.)

--Martin

_________________
Martin Penny

.sig in beta - full release to follow.


 Post subject:
PostPosted: Thu Mar 06, 2008 5:05 am 

Joined: Sat Sep 22, 2007 1:31 am
Posts: 24
mdpenny wrote:
The original reason I gave up on self-modifying code was the idea of then trying to unpick where code has gone wrong - if it's not what I originally wrote, I not only have to figure out why it failed, but also how it modified itself before it failed :(


To be honest I see no difference between self-modifying and other bugs, from say something like a mistype for an operand or forgetting to load/save a register at some point.

I have other uses for it too. I do a lot of hacking on original software that runs in all-RAM mode, and self-modifying code is a real necessity. There's no way I can know which locations in scratch pad/work RAM are unused by the original routines. This is especially a concern for the zero page area. With self-modifying code, I gain indirect access without using ZP, not to mention being able to define variables in between code.

You can push ZP data to the stack and use "PHP,SEI....PLP" to prevent mishaps with interrupt code, but the additional overhead translates into more size. Self-modifying code cuts down code size for some really tight spots/locations and avoids delaying time-sensitive interrupts. I find this is especially true when you're adding your own hook and it needs to detect/preserve the original routine under certain circumstances. But I'm rambling on...

kc5tja/dclxvi: I've used that technique a few times, mostly for block transfer opcodes (Txx for 6280) as a flexible DMA routine. One game in particular that I was looking at would build out small routines into memory. I'm not sure of the exact reason the programmers took that route, as they had plenty of memory for mostly redundant code. The game also modifies the PC on the stack before every RTI. Very unusual compared to almost all other softs on the system.

The processor I'm working with probably benefits slightly more from self-modifying code than a normal 65SC02. Although it runs at a decent speed of 7.16 MHz, it has one extra cycle added to instructions that have a memory operand, whether it's ZP or absolute, and also to branches taken. Where LDA #$xx is 2 cycles and ASL A is 2 cycles, LDA ZP is 4 cycles, LDA $AABB is 5 cycles, and BNE is 2/4 cycles.


 Post subject:
PostPosted: Thu Mar 06, 2008 7:46 am 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
tomaitheous wrote:
kc5tja/dclxvi: I've used that technique a few times, mostly for block transfer opcodes (Txx for 6280) as a flexible DMA routine. One game in particular that I was looking at would build out small routines into memory. I'm not sure of the exact reason the programmers took that route, as they had plenty of memory for mostly redundant code. The game also modifies the PC on the stack before every RTI. Very unusual compared to almost all other softs on the system.


I'm not sure, but the cost of subroutines on the 6502/65816 processors is excessive. 12 cycles are consumed for every JSR/RTS pair; 14 for JSL/RTL. Considering that all "good" programming practices call for modular, structured, and even object oriented (which has recently been proven to be isomorphic with functional) programming, the expense incurred in invoking subroutines can be quite substantial.

Therefore, one approach to make things faster is to cache dynamically-generated sequences of code that tend to run inline, so that the overhead of JSR/JSL and RTS/RTL are eliminated.


 Post subject:
PostPosted: Fri Mar 07, 2008 3:18 am 

Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
tomaitheous wrote:
To be honest I see no difference between self-modifying and other bugs, from say something like a mistype for an operand or forgetting to load/save a register at some point.


I agree. One thing I've done when writing self-modifying code is only make one change at a time from non-self-modifying to self-modifying, then test that change. Then, if it goes off into the weeds and pretty much clobbers everything in RAM (hey, it happens :)), I know where to look, and it's usually not too difficult to spot my error. On a related note, another principle I try to adhere to is don't get too clever too quickly. In other words, if it's really wild, start with something simpler and make small changes. Using those principles, self-modifying code doesn't seem (to me) to be significantly more difficult to debug than other code.


 Post subject:
PostPosted: Fri Mar 07, 2008 10:45 am 

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1043
Location: near Heidelberg, Germany
I know it's off topic, but can you give a reference on that proof?

kc5tja wrote:
..., and even object oriented (which has recently been proven to be isomorphic with functional) programming, ...


thanks
André

