Self Modifying Code

Dr Jefyll · Post by **Dr Jefyll** » Wed Oct 31, 2018 6:51 pm

MichaelM wrote:

[...]SMC in the same sense as possible in FORTH where the instruction stream can be physically changed?

Maybe I should quickly clarify. There's nothing about Forth that requires Self Modifying Code. In no way is it fundamental, or even commonplace. In the context of 6502 FIG Forth, my previous post explains SMC's one and only use case -- a shortcut at the assembly-language level to boost efficiency... which works very effectively, but is optional -- a choice the programmer made.

JimBoyd · Post by **JimBoyd** » Wed Oct 31, 2018 7:46 pm

MichaelM wrote:

I've been following this discussion with some interest. I can't say that I have a strong opinion one way or another. Generally, I will fall on the side of avoiding SMC. Many of the reasons given here reinforce the idea that SMC should be avoided.

When I said I'd like to talk about creative uses of self modifying code, I meant useful as in improving the size/speed of the overall code/system.
SMC doesn't really need to be avoided as long as the programmer is careful. In the case of SCAN and SKIP, the scan/skip loop is exclusive to SCAN and SKIP. No other code alters it. SCAN sets it to scan for a given character and SKIP sets it to skip all leading occurrences of a given character. No code modification takes place while the loop is running. It doesn't incur much of a speed penalty ( since the extra code is outside the loop ), and it's also almost two dozen bytes smaller than if SCAN and SKIP each had their own loop, assuming that SKIP jumps into the end of SCAN right at the first DEX, instruction. SMC saves 40 bytes versus the case where SCAN and SKIP are completely independent of each other.

JimBoyd · Post by **JimBoyd** » Wed Oct 31, 2018 7:54 pm

It is possible to use SMC in the case of SCAN and SKIP without labels, but some might find it a little unsettling.

Code: Select all

// SCAN
HEX
CODE SCAN  ( A1 L1 C -- A2 L2 )
   0= # LDA,
   AHEAD,   ( AH1 )
   -3 ALLOT   // JUST NEED CONTROL FLOW DATA
   BAD STA,
   3 # LDA,  SETUP JSR,
   AHEAD,   ( AH1 AH2 )
   BEGIN,   ( AH1 AH2 BEG )
      N 4 + INC,
      0= IF,  N 5 + INC,  THEN,
      N 2+ LDA,
      0= IF,  N 3 + DEC,  THEN,
      N 2+ DEC,
   CS-SWAP THEN,   // SECOND AHEAD JUMPS TO HERE
      N 2+ LDA,  N 3 + ORA,   ( AH1 BEG )
   0= NOT WHILE,              ( AH1 IFW BEG )
      N 4 + )Y LDA,  N EOR,  .A ASL,
   // THE ADDRESS OF THE FIRST AHEAD ( THE ADDRESS OF STA,)  GETS FIXED UP BY THE FOLLOWING LINE
   CS-ROT THEN,               ( IFW BEG )
   0= UNTIL,   // STORE DESIRED BRANCH HERE
   THEN,                      (  )
   DEX,  DEX,
   N 4 + LDA,  0 ,X STA,
   N 5 + LDA,  1 ,X STA,
   N 2+ LDA,  PHA,
   N 3 + LDA,
   PUSH JMP,  END-CODE

// SKIP
CODE SKIP  ( A1 L1 C -- A2 L2 )
   0= NOT # LDA,
   ' SCAN @ 3 + @ STA,
   ' SCAN @ 5 + JMP,  END-CODE

The stack comments with AH1 AH2 BEG etc. show what is on the control flow stack ( data stack ) after the control flow word executes.
As I said, some people might not be thrilled by the creative ( mis- ) use of the control flow structures.

JimBoyd · Post by **JimBoyd** » Wed Oct 31, 2018 8:10 pm

Dr Jefyll wrote:

Interesting that you wrote an assembler using Bill Ragsdale's syntax. The code in Bill's version is interesting to say the least! I remember there was one particular word -- UPMODE -- which seemed to demonstrate rather a lot (arguably too much) cleverness!

I first got started in Forth with the 64Forth cartridge for the Commodore 64. The C64 was the only computer I had at the time. 64Forth used the Ragsdale assembler. Later I got Blazin' Forth. It also used the Ragsdale assembler, or a variation thereof. I guess I just got comfortable with the syntax. My Forth also had a variation of the Ragsdale assembler. Over time I added improvements such as changing this:

Code: Select all

HEX
: NOT   20 + ;

To this:

Code: Select all

HEX
: NOT   20 XOR ;

I also added range checking for the words that compile a branch and more flexible control structures.
Someday I would like to build my own single board computer based on the 65C02 or the 65816. When that happens, I would like to port my Forth to the SBC. I took a good look at Ragsdale's word M/CPU so I could figure out how to add more opcodes to the assembler. I decided it would be easier to write my own version of M/CPU. That way, I would know how to add new opcodes to support other members of the 6500 family.
I agree that UPMODE displayed too much cleverness. My assembler works just fine without it and my index table is ten bytes smaller than his.

whartung · Post by **whartung** » Thu Nov 01, 2018 12:00 am

MichaelM wrote:

At a lunch time discussion with a colleague, we came to the conclusion that the function code was pre-compiled, i.e. static. The operating data of the function was somehow dynamically constructed and at run-time linked to the relevant instruction stream. Can I assume that your definition of SMC is such that since the "functional" behavior is static, then the resulting dynamically constructed function is not SMC in the same sense as possible in FORTH where the instruction stream can be physically changed?

My definition of SMC is where you have a routine in memory that is physically changed during the course of the program's execution.

Simply, if I were to "disassemble" a piece of code at a specific address at the beginning of a program, and later disassembled that same address range again, and the two were different, then that's SMC (modulo the whole relocating of code by a loader or linker or anything like that).

JimBoyd's example has a common piece of code that gets changed based on who invokes it.

So (and I have not looked at your code in any detail), rather than having something like:

Code: Select all

DOIT:
    LDA FLAG
    BEQ THAT        ; FLAG = 0: take the DOTHAT path
    JSR DOTHIS
    JMP DONE
THAT:
    JSR DOTHAT
DONE:
    RTS

vs

Code: Select all

_DOIT:
    JSR _DOTHIS
    RTS

DOTHAT:
    LDA #<_DOTHAT
    STA _DOIT+1
    LDA #>_DOTHAT
    STA _DOIT+2
    JMP _DOIT

DOTHIS:
    LDA #<_DOTHIS
    STA _DOIT+1
    LDA #>_DOTHIS
    STA _DOIT+2
    JMP _DOIT

So, you can see the first routine relies on a FLAG to discern whether to DOTHIS or DOTHAT, whereas the other physically changes the _DOIT routine in memory.

THAT is SMC to me. My earliest attempts at 6502 used SMC for a block move routine, changing the LDA and STA addresses (I wasn't too well versed in the indirect and indexed addressing modes at the time).

This is quite different from passing pointers to compiled code around to be invoked as is done with dynamic languages (or just function pointers in, say, C).
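For comparison, here is a minimal Forth sketch of that pointer-passing style. The names DO-THIS, DO-THAT, ACTION, DO-IT, USE-THIS, and USE-THAT are hypothetical and chosen only for illustration. Only the data cell ACTION changes at run time, while the compiled body of DO-IT stays the same, so by the disassembly test above this would not count as SMC.

```forth
\ Hypothetical names throughout; only standard Core words are used.
: DO-THIS  ( -- n )  1 ;
: DO-THAT  ( -- n )  2 ;

VARIABLE ACTION                  \ holds an execution token (data, not code)
' DO-THIS ACTION !               \ default behaviour

: DO-IT  ( -- n )  ACTION @ EXECUTE ;    \ the body of DO-IT never changes

: USE-THIS  ( -- )  ['] DO-THIS ACTION ! ;
: USE-THAT  ( -- )  ['] DO-THAT ACTION ! ;
```

After USE-THAT, DO-IT runs DO-THAT; after USE-THIS, it runs DO-THIS again. This version would also stay ROMable as long as ACTION lives in RAM.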

The two main complaints against SMC are ROMability and clarity.

The Fig-Forth thing is actually done in zero page, so this aspect alone doesn't affect ROMability (I don't know if the stock FIG is ROMmable as is or not, but if it isn't, it's not because of this). FIG actually puts the JMP code into RAM, lying in wait for the address later, during cold boot.

If Fig didn't use the technique of populating the address of an indirect jump, it could have simply played games with stack and invoked RTS.

Clarity is the biggest thing. Most folks don't do it just because of that, no matter how well documented it is. It's most routinely done in copy protection logic (which, by definition, is designed to hinder clarity).

As a rule, I tend to not like code that doesn't do what the source code says it does. It's not a universal truth, it's just a guideline I tend to favor.

In Java, it's quite possible to see something like:

Code: Select all

public class X {
    public void thing() {
        System.out.println("Hi there!");
    }

    public static void main(String args[]) {
        X x = new X();
        x.thing();
    }
}

And when you run that code, instead of getting "Hi there", you could get something else.

Shenanigans are involved, of course, but it's possible.

Debugging code like that is like Bugs Bunny playing "Those Endearing Young Charms" on a "piana".

https://www.youtube.com/watch?v=gUsJXwE73QU

JimBoyd · Post by **JimBoyd** » Thu Nov 01, 2018 8:36 pm

I'm surprised nobody noticed something about SKIP in my example of SMC.
This:

Code: Select all

// SKIP
CODE SKIP  ( A1 L1 C -- A2 L2 )
   0= NOT # LDA,   // LOAD OPCODE F0 ( BEQ )
   ' SCAN @ 3 + @ STA,
   ' SCAN @ 5 + JMP,  END-CODE

can be streamlined to this:

Code: Select all

CODE SKIP  ( A1 L1 C -- A2 L2 )
   0= NOT # LDA,   // LOAD OPCODE F0 ( BEQ )
   ' SCAN @ 2+ JMP,  END-CODE

whartung wrote:

As a rule, I tend to not like code that doesn't do what the source code says it does. It's not a universal truth, it's just a guideline I tend to favor.

In my example, SCAN does exactly what the source code says it does. Storing the type of branch in the code simply sets it back to doing what the source code says because SKIP changes it slightly and it's not too difficult to see what SKIP does.
Imagine if you will a hypothetical extension to the 6510 processor ( the one in the C64 ). An extra flag is added, the branch invert flag. When this flag is cleared all branches work normally. When this flag is set all branches operate in the opposite sense. With such a processor, SCAN would clear the branch invert flag. SKIP would set it then jump into SCAN to the instruction after the clear branch invert instruction. This version of SCAN and SKIP would not be SMC. My version of SCAN and SKIP uses SMC to accomplish the same code reuse on the 6510 that would be possible on the hypothetical extended 6510.
I'm not thinking about SMC that twists the code beyond all recognition, rather to allow better code reuse as with SCAN and SKIP or to work around limitations such as what NEXT does to work around the 6502 ( and 6510 ) not having a double indirect jump.
I realize that my version, with SMC, would not work in ROM. Here is a version I might use in a ROM based Forth for the 65C02 or even a cartridge for the C64 to keep the size down. It depends on how badly I would need to save memory ( to fit the system in ROM ).

Code: Select all

// SCAN
HEX
CODE SCAN  ( AD1 N1 C -- AD2 N2 )
   DEY,
   N 6 + STY,  0 # LDY,   // SETUP NEEDS Y TO BE ZERO
   3 # LDA,  SETUP JSR,
   AHEAD,
   BEGIN,
      N 4 + INC,
      0= IF,  N 5 + INC,  THEN,
      N 2+ LDA,
      0= IF,  N 3 + DEC,  THEN,
      N 2+ DEC,
   CS-SWAP THEN,
      N 2+ LDA,  N 3 + ORA,
   0= NOT WHILE,
      N 4 + )Y LDA,  N EOR,  .A ASL,
   0= NOT IF,  0FF # LDA,  THEN,   // BE SURE RESULT IN ACCUMULATOR IS 0 OR FF
   N 6 + EOR,   // SCAN INVERTS THE RESULT, SKIP DOES NOT
   0= NOT UNTIL,   // SO THE TEST IS REVERSED
   THEN,
   DEX,  DEX,
   N 4 + LDA,  0 ,X STA,
   N 5 + LDA,  1 ,X STA,
   N 2+ LDA,  PHA,
   N 3 + LDA,
   PUSH JMP,  END-CODE
CODE SKIP  ( A1 L1 C -- A2 L2 )
   // SKIP HAS NO PARAMETER FIELD. ITS CFA POINTS ONE BYTE INTO SCAN
   // AVOIDING THE DEY, INSTRUCTION
   -2 ALLOT
   ' SCAN @ 1+ , END-CODE

Notice the extra clock cycles added to the loop.
I thought of another version to keep the size down that doesn't use SMC and shouldn't incur as many extra cycles in the loop as this one does, but the SMC version reads easier. This other version wasn't pretty. Trust me.

[Edit: I forgot to remove the screen number ( and comment line ) in this example the first time. Oops. Yes, my Forth system has source in blocks. ]
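The flag-based alternative to SMC used in the ROM version above can also be modelled in high-level Forth. The following is only a sketch under stated assumptions, not the Fleet Forth code: the names SKIPPING?, (SCAN/SKIP), FLAG-SCAN, and FLAG-SKIP are hypothetical, the polarity of the flag is chosen for readability rather than to match the assembly, and it relies on /STRING from the String word set plus TRUE and FALSE from Core Ext.

```forth
\ Hypothetical names; a high-level model of "one shared loop plus a flag".
VARIABLE SKIPPING?                    \ FALSE: scan for c, TRUE: skip over c

: (SCAN/SKIP)  ( addr len c -- addr' len' )
   >R                                 \ keep the character on the return stack
   BEGIN
      DUP                             \ any characters left?
   WHILE
      OVER C@ R@ =                    \ TRUE if the current character equals c
      SKIPPING? @ XOR                 \ invert the test when skipping
      0=                              \ continue while the stop condition is false
   WHILE
      1 /STRING                       \ advance by one character
   REPEAT THEN
   R> DROP ;

: FLAG-SCAN  ( addr len c -- addr' len' )  FALSE SKIPPING? !  (SCAN/SKIP) ;
: FLAG-SKIP  ( addr len c -- addr' len' )  TRUE  SKIPPING? !  (SCAN/SKIP) ;
```

As in the assembly version, the price of avoiding SMC is the extra fetch and XOR inside the loop on every character.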

JimBoyd · Post by **JimBoyd** » Fri Nov 02, 2018 9:21 pm

Dr Jefyll wrote:

I apologize if I've drifted off-topic. This post has more to do with working around Ragsdale's structured conditionals than with self-modifying code per se.

It seems to me that a discussion of self modifying code in Forth or Forth assembly code would include discussing the tools used to produce the self modifying code and maybe ways to adapt or extend the tools to meet the challenge.

Cheers,
Jim

JimBoyd · Post by **JimBoyd** » Fri Nov 02, 2018 9:38 pm

BTW, there is a discussion of self modifying code in general, rather than Forth specific, in the Programming sub-forum here.

JimBoyd · Post by **JimBoyd** » Tue Nov 06, 2018 11:14 pm

GARTHWILSON wrote:

How 'bout a separate SMC stack? It wouldn't need much space.

Not a separate SMC stack, a general purpose auxiliary stack, the aux stack.
There could be words to move single values from the data stack to the aux stack and back.
Analogous to >R and R> , they could be called >A and A> .
There could also be words to move two cells between the data and aux stacks 2>A and 2A>.
Words to move control flow data from the control flow stack ( which on most implementations is probably just the data stack ) to the aux stack could be CS>A and A>CS.
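A minimal sketch of such an auxiliary stack in high-level Forth might look like the following. The capacity of 16 cells and the names AUX-CELLS, AUX-STACK, and AUX-SP are assumptions made for illustration, and there is no overflow or underflow checking. CS>A and A>CS are left out because the size of a control flow item is implementation defined.

```forth
\ Hypothetical layout: a small upward-growing stack in the dictionary.
16 CONSTANT AUX-CELLS                      \ assumed capacity
CREATE AUX-STACK  AUX-CELLS CELLS ALLOT
VARIABLE AUX-SP   AUX-STACK AUX-SP !       \ points at the next free cell

: >A   ( x -- )       AUX-SP @ !  1 CELLS AUX-SP +! ;
: A>   ( -- x )       -1 CELLS AUX-SP +!  AUX-SP @ @ ;
: 2>A  ( x1 x2 -- )   SWAP >A >A ;         \ x2 ends up on top, as with 2>R
: 2A>  ( -- x1 x2 )   A> A> SWAP ;
```

Because the aux stack is separate from both the data stack and the return stack, a value pushed with >A on one line of source can be retrieved with A> many lines later without disturbing control flow data in between.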

JimBoyd · Post by **JimBoyd** » Wed Jan 16, 2019 9:51 pm

I don't think SMC is as useful with high level code. The only example I could think of involves a deferred word's 'vector', for lack of a better word, setting the deferred word to another vector.
Case in point: Fleet Forth's (ABORT") , the word compiled by ABORT" , executes the word WHERE . WHERE shows the location of the error ( or tries to ). If loading a block causes an error, WHERE will try to load that block to show the error, causing yet another error ( recursively!)
One solution was to try something like the following:

Code: Select all

DEFER (WHERE)
: SHOW.WHERE
   ['] NOOP IS (WHERE)
   // SHOW THE LOCATION OF THE ERROR
   //
   //
;
: WHERE
   (WHERE)
   ['] SHOW.WHERE IS (WHERE) ;

(WHERE) is set to a no-op by SHOW.WHERE . If an error occurs when SHOW.WHERE is running, WHERE gets executed, but does nothing more than reset (WHERE) back to SHOW.WHERE so it's ready for the next error.
It was actually easier to take care of this with a flag variable like so:

Code: Select all

VARIABLE WHERE?  TRUE WHERE? !
: WHERE
   WHERE? @
   IF
      WHERE? OFF
      // SHOW THE LOCATION OF THE ERROR
      //
      //
   THEN
   WHERE? ON ;

So I'm not sure how useful self modifying high level Forth is. Does anyone have another example of high level Forth self modifying code?

Cheers,
Jim

JimBoyd · Post by **JimBoyd** » Fri Mar 29, 2019 9:02 pm

What about manipulating the return stack to control program flow?
I was just reading Dynamically Structured Codes by M. L. Gassanenko. Does this count as self modifying code?

Dr Jefyll · Post by **Dr Jefyll** » Sat Mar 30, 2019 3:20 am

JimBoyd wrote:

What about manipulating the return stack to control program flow?

I can supply an example, and (unsurprisingly) it's rather odd. In FIG Forth, there's one particular word which contains the remarkable sequence R> DROP

The purpose is to unwind the Return Stack and thus exit a BEGIN AGAIN loop that's in progress one level higher.

As we know, the stuff between BEGIN and AGAIN would ordinarily keep happening forever. But in this case, somewhere between BEGIN and AGAIN a condition is tested, and eventually the mystery word is allowed to execute, causing top-of-R to be dropped. Because it's a colon definition, the mystery word concludes with SEMIS which of course invokes the un-nest sequence. And instead of un-nesting to the mystery word's caller (ie, the word with the BEGIN AGAIN loop), we un-nest to that word's caller.

You might wonder whether the BEGIN AGAIN shouldn't just be replaced with a BEGIN WHILE REPEAT but this particular situation doesn't seem amenable to properly structured conditionals, and the R> DROP is used as a GOTO of sorts! A similar need may arise in other situations; I don't suppose the FIG example is unique.

That concludes the summary. To be explicit I'll need to identify the situation, and that entails explaining some cleverness in a different department. The word containing R> DROP has a name one character long, and that character is an ASCII Null -- $00. Since Null is unprintable, the FIG Glossary lists the word as X, and interested parties can find it under that pseudonym.

As part of Forth's startup, QUIT calls INTERPRET and INTERPRET is the word with the BEGIN AGAIN loop I mentioned. The loop uses -FIND to get fragments of input text one by one and either compile or execute them. When no more text is available, a fetch from the buffer yields only a null, which is the end-of-buffer marker. -FIND dutifully does a search attempting to find a word in the dictionary named null, and the search is successful! Unprintable/X/Null is executed, and it's a case of, "Scotty, beam me up!"

INTERPRET ceases to be real, and suddenly we're back in QUIT.

-- Jeff

( QUIT is in Screen 54 of the FIG Forth source. INTERPRET is in Screen 52, and X is in Screen 45.)
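The R> DROP technique can be tried in isolation with a small sketch like the one below. The names COUNTER, BAIL-IF-DONE, LOOPER, and DEMO are hypothetical and this is not the FIG Forth source. It assumes a traditional threaded Forth in which a colon definition's return address occupies exactly one cell on the return stack; the standard does not guarantee that, and an optimizing native-code compiler may behave differently.

```forth
\ Hypothetical names; assumes one-cell return addresses on the return stack.
VARIABLE COUNTER

: BAIL-IF-DONE  ( -- )
   COUNTER @ 3 < IF  EXIT  THEN    \ normal return: the caller's loop continues
   R> DROP ;                       \ discard the return address into LOOPER,
                                   \ so ; returns to LOOPER's caller instead

: LOOPER  ( -- )
   0 COUNTER !
   BEGIN
      1 COUNTER +!
      BAIL-IF-DONE
   AGAIN ;                         \ no exit of its own

: DEMO  ( -- n )  LOOPER  COUNTER @ ;
```

On a system that meets the assumption, DEMO returns once COUNTER reaches 3, even though LOOPER itself contains no exit from its BEGIN AGAIN loop.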

JimBoyd · Post by **JimBoyd** » Sun Mar 31, 2019 8:00 pm

Fleet Forth does something similar. When WORD parses the text stream, whether a block, the text input buffer, or a string, it places a counted string at HERE and appends a blank. When the text stream is exhausted, WORD places a count of zero and no characters, but it still appends a blank. The blank name is found and it is immediate. In Fleet Forth, the blank name has a code field that points to EXIT , but no body, making it an alias for EXIT . Why waste memory on a colon definition when an alias for EXIT will work?
What about the technique used by M. L. Gassanenko? What do you think of it?

JimBoyd · Post by **JimBoyd** » Fri Apr 05, 2019 9:27 pm

I don't want to get hung up on assembler control flow workarounds, but since SCAN and SKIP were already mentioned, here is what I think is the best solution since I added an Auxiliary stack to Fleet Forth.

Code: Select all

SCR# 38 
// SCAN
HEX
CODE SCAN  ( AD1 N1 C -- AD2 N2 )
   0= # LDA,   // LOAD D0 ( BNE )
   HERE 1+ >A
   BAD STA,
   3 # LDA,  SETUP JSR,
   AHEAD,
   BEGIN,
      N 4 + INC,
      0= IF,  N 5 + INC,  THEN,
      N 2+ LDA,
      0= IF,  N 3 + DEC,  THEN,
      N 2+ DEC,
   CS-SWAP THEN,
      N 2+ LDA,  N 3 + ORA,

SCR# 39 
// SCAN SKIP
   0= NOT WHILE,
      N 4 + )Y LDA,  N EOR,  .A ASL,
   HERE A> !
   0= UNTIL,
   THEN,
   DEX,  DEX,
   N 4 + LDA,  0 ,X STA,
   N 5 + LDA,  1 ,X STA,
   N 2+ LDA,  PHA,
   N 3 + LDA,
   PUSH JMP,  END-CODE
CODE SKIP  ( A1 L1 C -- A2 L2 )
   0= NOT # LDA,   // LOAD F0 (BEQ)
   ' SCAN @ 2+ JMP,  END-CODE

Given the way my INTERPRET is defined ( with EXECUTE called by INTERPRET rather than by some word called by INTERPRET ), if I squeeze the code enough to fit both >A and A> on the same source screen, I wouldn't even need to use the Auxiliary stack. It just wouldn't be portable.

JimBoyd · Post by **JimBoyd** » Sun Jun 23, 2019 7:08 pm

Dr Jefyll wrote:

JimBoyd wrote:

What about manipulating the return stack to control program flow?

I can supply an example, and (unsurprisingly) it's rather odd. In FIG Forth, there's one particular word which contains the remarkable sequence R> DROP

The purpose is to unwind the Return Stack and thus exit a BEGIN AGAIN loop that's in progress one level higher.

Is it as odd as this example from M. L. Gassanenko's paper Dynamically Structured Codes ?

Code: Select all

    : ENTER >R ; \ ( tcf-addr -- ) call the threaded code fragment at tcf-addr
    : SUCC COMPILE R@ COMPILE ENTER ; IMMEDIATE
    : FAIL COMPILE R> COMPILE DROP COMPILE EXIT ; IMMEDIATE
    : 1-10 ( --> i --- i --> ) \ generate numbers from 1 to 10
        0 BEGIN       1+ DUP 11 <
           WHILE      SUCC \ call the continuation, of type ( i -- i )
           REPEAT
           DROP
           FAIL ; \ exit the code fragment that contains the continuation
    : //2 ( i --> i --- i --> i ) \ filter even numbers
           DUP 2 MOD 0=
           IF     SUCC \ call the continuation, of type ( i -- i );
                       \ (in the case of //2 we could just exit)
           THEN
           FAIL ; \ exit the code fragment that contains the continuation
    : .even1-10 ( -- ) 1-10 //2 DUP . ;

.even1-10 yields:

Code: Select all

    .even1-10 2 4 6 8 10    ok
