6502.org • View topic - Compressing during the compilation process

Compressing during the compilation process

Bregalad

Post subject: Re: Compressing during the compilation process

Posted: Sat Mar 22, 2014 9:28 am

Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland

Quote:

F'rinstance, why does every routine with more than three arguments have to copy data to zero page? How about making the callee, not the caller, do the copying? Speed's not an issue at this point, I understand.
[...]

If you really wanted to get tricky you could write just one "copy-and-call" function along these lines.

I already do exactly this in a few places, however it only works for arguments which are constant in the caller.

Brad R

Post subject: Re: Compressing during the compilation process

Posted: Sat Mar 22, 2014 6:15 pm

Joined: Tue Jan 07, 2014 8:40 am
Posts: 91

It seems weird that I should be the person not advocating the use of Forth here, but if you have 90% of your application coded, it's the wrong time to switch languages.

That said, I agree with the others who have suggested you implement some kind of virtual machine, rather than trying to uncompress blocks of code on the fly. Depending on the interpreter scheme you use (e.g. byte tokens vs. threaded) you can get impressive space savings. I don't know much about Sweet16, and I don't know anything about your application, so I can't suggest what might be an appropriate VM for you to use.

Forth programmers at a certain level of skill start thinking in terms of developing an application-specific language. In your case, an application-specific virtual machine. Doing this effectively requires a thorough knowledge of the application (which only you have), and also a skill at "factoring" a problem into smaller, reusable units. (I have some brief commentary about factoring here.)

Remember also the 90/10 rule. 90% of the time is spent in 10% of the code. That code needs to be optimized; most of the rest of your code does not. As others have commented: in Forth you tend to write your entire application in Forth, and then look to see where the time is being consumed. Often rewriting a small number of routines in assembly language is all that's needed. This rule applies to any language, not just Forth. In your case, you'll need to look at your application and identify the non-critical part of the code, with the goal of converting that to your VM.

A final note -- although Forth is stack-oriented, threaded languages in general need not be. Somewhere around here I have a manual for a dead language called CONVERS, which used Forth-like threading and syntax, but used registers for parameter passing. This may or may not work for you; it depends on what functions you need in your VM.

_________________
Because there are never enough Forth implementations: http://www.camelforth.com

Bregalad

Post subject: Re: Compressing during the compilation process

Posted: Sat Mar 22, 2014 8:28 pm

Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland

I already spent 90% of the time on 90% of the code, optimizing everything to be as small as possible using various 6502 tricks.

I just never even thought of a VM.

Brad R

Post subject: Re: Compressing during the compilation process

Posted: Sat Mar 22, 2014 9:00 pm

Joined: Tue Jan 07, 2014 8:40 am
Posts: 91

Sorry, I meant the 90/10 rule for execution time: 90% of the CPU cycles are spent in 10% of the code.

There's another saying that 90% of the time is spent writing the first 90% of the code, and writing the last 10% of the code takes another 90% of the time.

_________________
Because there are never enough Forth implementations: http://www.camelforth.com

Bregalad

Post subject: Re: Compressing during the compilation process

Posted: Sat Mar 22, 2014 10:33 pm

Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland

Yeah, I was thinking you were mentioning that saying. I'm currently in the 90% of the time writing the last 10%.

And this is so true, I started my project in 2005, and the game engine was complete by 2007, but the game still isn't done now ^^

EDIT :

If I ever go as far as using a VM, I'd probably engineer my own one instead of using a pre-existing one.

As far as I can tell, there are the following "problems" with SWEET16 (not that they are actual problems, but they could be sources of inefficiency):

- No way to call 6502 code directly
- No way to call SWEET16 code directly from 6502, you'll have to either switch mode and call, or call and switch mode
- Use a separate stack instead of 6502 stack
- When switching to/from 6502 code, there is no correspondence between both sets of registers (I mean, it'd be easy to save/restore A, X and Y to/from SWEET16 registers instead, and that'd be a heck lot better !)
- No way to load an 8-bit constant in a register (this means you'll waste a $00 in ROM for the high byte)
- No shift operations. I use them so often, it's a major source of inefficiency in native 6502 when you have to do ASL/ROL or LSR/ROR in order to shift a 16-bit value, how could they leave them out ?
- No AND/OR/XOR operations. In several cases you could simulate them with ADD or SUB, but not always.
- 15 registers sounds a bit like overkill (although it's hard to tell before using them)
- Don't like the fact it tries to simulate the status flags of a processor. Why no higher level branches, like "branch if this is greater than this", or things like that

I don't know which would be better: altering SWEET16 to fix those issues or making something completely new. I sort of prefer the register machine (and usage of the real 6502 stack) to the stack machine, although I can see how the stack machine can be very efficient, as it uses only as much memory as necessary.

EDIT (2) :

To give an example of the routines that eat up significant space, I post the routine for my game over screen (it is inlined somewhere in the main function: since it was only called once, I decided to inline it). It basically has to clear the screen, print several things on it, and give the player the option to "continue" or "exit" (in which case the game is mostly reset).

What takes up so much space is the fact that many simple but different operations and routine calls have to be done, and they can't easily be factored out into things like loops, tables or whatever.

I really don't know how a VM could help make a similar functionality smaller, but if there's ideas I'm all open.
(This is just an example of a routine that can grow very large, there's other similar and not-so similar stuff too)

Code:

.macro "GameOverScr"

   jsr LoadMainScrPal
   clc
   jsr ClrNamTbl

   ldx #$20      ;Begin 8 columns from the left border
   ldy #$68
   lda #$08
   jsr WriteBigPixelTiles
   .dw GameOverTxtData1   ;Location of the data

   ldx #$21
   ldy #$08
   lda #$08
   jsr WriteBigPixelTiles
   .dw GameOverTxtData2

   ldy #$14
   ldx #$22
   lda #$08
   jsr WriteTitleText

   ldy #$16
   ldx #$22
   lda #$68
   jsr WriteTitleText

   jsr ClrSprRam      ;Clear all sprites

   ldy #$22
   sty $2006      ;Write the KO lucia "sprite" to BG
   sty $2006
   ldx #$00
-   stx $2007
   inx
   cpx #$04
   bne -
   sty $2006
   lda #$42
   sta $2006
-   stx $2007
   inx
   cpx #$08
   bne -

   lda #$03
   jsr StartSong         ;Game over song
   jsr ResetScreen
   lda #$00         ;Bit 7 of Temp is used for continue/exit selection (continue by default)
   sta Temp
_game_ovr_scr_loop
   jsr WaitNMI
   lda #$aa
   sta StringBuffer      ;Attribute for wounded lucia icon (constant)
   sta StringBuffer+1
   lda #$05
   bit Temp
   bpl +
   lda #$50
+   sta StringBuffer+2      ;Attributes for continue/exit selection (variable)
   sta StringBuffer+3
   lda #$23
   ldx #$e0
   ldy #$04
   jsr PrintVRAMString
   .dw StringBuffer

   jsr ReadLockJoys
   lda JoyLocked
   and #$2c
   beq +            ;If select or up or down is pressed
   lda Temp
   eor #$80         ;Invert the selection
   sta Temp
   lda #$12
   jsr StartSong         ;Make a sound effect
   jmp _game_ovr_scr_loop

+   lda JoyLocked
   and #$10         ;Loop until start is pressed
   beq _game_ovr_scr_loop

   lda Temp
   bpl +
   jmp Reboot
+
.endm

Bregalad

Post subject: Re: Compressing during the compilation process

Posted: Thu Mar 27, 2014 10:18 am

Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland

Apparently, all the sources I've read clearly state that stack machines provide higher code density than anything else. Since code density is all I care about in this case, I'd probably have to build my own stack machine (despite the fact I don't like them, but who cares).

The main problem is that it makes it a bit tricky to intermix with normal 6502 code for inter-function call. However, I have an idea that could make it work in most cases. When a 6502 function is called from the stack machine mode, I could start with A = top of stack, X = 2nd byte and Y = 3rd byte, that way, even if not all 3 regs are used for parameter passing, they can be used. I could do a stack machine with an external carry flag (that would of course correspond to the native carry flag), as I use C for input and output passing extremely often.

I could have 2 versions of call, one which pushes the result of the call on the stack, and another which doesn't. (arguments are pushed by the caller, before the return address)

Another advantage is that, by using the 6502 stack, the virtual machine itself can use instructions pha and pla very often, making it small (so that's good). TSX and $100,X addressing will have to be used whenever anything which is not at the top of the stack is addressed unfortunately, so we have to make this rarely happen.

So I could come up with an instruction set which looks like the following :

Code:

*** Stack operations ***
PUSH #8-bit constant (2 bytes)
PUSHW #16-bit constant (3 bytes)
DUP #4-bit constant (1-byte) : duplicate the nth byte of the stack at the top of the stack
DUPW #4-bit constant (1-byte) : same as above, but 2 consecutive bytes at a time
DROP #4-bit constant (1-byte) : drop n bytes from the top of the stack
AND, OR, XOR, ADD, SUB (1 byte) do this operation on the top 2 bytes of the stack, replacing operands with single-byte result
ASL, LSR #3-bit constant (1 byte) : Shifts the 8-bit top of stack by 1-7 positions
INC, DEC, INCW, DECW (1 byte)
ADDW, SUBW (1 byte) do a 16-bit + 16-bit = 16-bit result addition
ASLW, LSRW #4-bit constant (1 byte) : Shifts the 16-bit top of stack by 1-15 positions

*** Memory operations ***
LOAD (1-byte) : Load byte at address pointed by top of stack, push it
LOADINC (1-byte) : Same as above, and pointer is post-incremented by 1
STORE (1-byte) : Store byte at top of stack to pointer which is below it
STOREINC (1-byte) : Same as above, and pointer is post-incremented by 1
STOREPOP (1-byte) : Same as above, and pop the byte after storing it

*** Branches ***
CALL #adr (3 bytes) : Call another stack machine code (after pushing the "program counter")
CALL6502PROC #adr (3 bytes) : Call a normal 6502 procedure (the "program counter", then a fake address is pushed on the stack, so that when it does rts it jumps back into the VM)
CALL6502FUNC #adr (3 bytes) : Same as above, but a result (A) is pushed on the stack
RETPROC : Go back to calling code (automatically selects between stack machine or 6502)
RETFUNC : Same as above, but the return address is one byte further down the stack, and the top of stack "result" is pushed again for the caller to use
BCC #adr (2 bytes) : Branch if C=0
BCS #adr (2 bytes) : Branch if C=1
BZ #adr (2 bytes) : Branch if top-of-stack = 0
BNZ #adr (2 bytes) : Branch if top-of-stack is not 0
BPL #adr (2 bytes) : Branch if bit 7 of top-of-stack is clear
BMI #adr (2 bytes) : Branch if bit 7 of top-of-stack is set

Seems like it'd be nice enough. About 100 of the 256 possible byte values are "used" for opcodes. However, I'm afraid decoding all this mess will result in a bloated VM which will eat up all the bytes saved by replacing native 6502 code with VM code.
So I'd have to carefully think, and adapt the VM to the code written for it - if an opcode ends up used too rarely I can save bytes by removing its functionality, on the other hand, if I see some functionality is clearly missing I could add it.
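To get a feel for how much decoding such an instruction set needs, here is a minimal Python model of a few of the opcodes listed above (PUSH, ADD, SUB, DUP, DROP). It is only a sketch: the opcode values, the HALT instruction, and the convention that DUP 0 means "the top byte" are assumptions made for illustration, since the post assigns no encodings.

```python
# Hypothetical opcode values; the proposed instruction set does not assign any.
PUSH, ADD, SUB, HALT = 0x00, 0x01, 0x02, 0x03
DUP_BASE, DROP_BASE = 0x10, 0x20   # low nibble carries the 4-bit constant


def run(code):
    """Interpret a byte string and return the final stack (top of stack is last)."""
    stack, pc = [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == PUSH:                       # 2 bytes: opcode + 8-bit constant
            stack.append(code[pc])
            pc += 1
        elif op == ADD:                      # 1 byte: both operands replaced by the result
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & 0xFF)
        elif op == SUB:
            b, a = stack.pop(), stack.pop()
            stack.append((a - b) & 0xFF)
        elif op & 0xF0 == DUP_BASE:          # 1 byte: copy the nth byte (0 = top) to the top
            stack.append(stack[-1 - (op & 0x0F)])
        elif op & 0xF0 == DROP_BASE:         # 1 byte: drop n bytes from the top
            n = op & 0x0F
            if n:
                del stack[-n:]
        elif op == HALT:                     # assumed instruction, only to end the model
            return stack
        else:
            raise ValueError(f"unknown opcode {op:#04x}")


# 5, 3, copy of the 5, then add the top two bytes: 7 bytes of bytecode in total.
program = bytes([PUSH, 5, PUSH, 3, DUP_BASE | 1, ADD, HALT])
print(run(program), len(program))
```

Even in this toy form, each opcode family needs its own decode branch, which is the cost the real 6502 interpreter would have to pay in ROM.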

Dr Jefyll

Post subject: Re: Compressing during the compilation process

Posted: Thu Mar 27, 2014 1:33 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3354
Location: Ontario, Canada

Bregalad wrote:

I'd probably have to build my own stack machine (despite the fact I don't like them, but who cares).

The main problem is that it makes it a bit tricky to intermix with normal 6502 code for inter-function call.

White Flame's Acheron VM seems to solve your issues, and it's not a stack machine -- it has 16 honest-to-goodness registers. Virtual registers, but still.

I admit I've only been skimming this topic, but was there a reason mentioned why AcheronVM is not suitable?

Like Brad, I feel weird as a Forth-head to be advocating an alternative. But one of the benefits of Forth is its efficient threading, and Acheron VM also has extremely efficient threading. As for Forth's stack orientation, that's hardly a point I myself feel "religious" about. If the task at hand will mostly fit into the available registers then IMO stack orientation is actually inefficient by comparison (in terms of programmer productivity). If you have 16 reg's, chances are everything you need is already there. IOW, you don't have to explicitly bracket every operation with a Fetch and a Store (as in Forth). The VM does that for you, under the hood. So it's easier to code for (especially if you don't like stacks).

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html

Brad R

Post subject: Re: Compressing during the compilation process

Posted: Thu Mar 27, 2014 2:02 pm

Joined: Tue Jan 07, 2014 8:40 am
Posts: 91

Here's an example from your code which is well suited to a VM implementation, but not to a "naive" stack implementation:

Code:

   ldy #$14
   ldx #$22
   lda #$08
   jsr WriteTitleText

Now if this happens a lot, and you're always passing constant (not computed) parameters, you could assign a token to WriteTitleText and have it take inline parameters:

Code:

.db TKNWriteTitleText, $14, $22, $08

A simple Forth implementation, assuming token threading, would look like this:

Code:

   .db TKNcliteral,$14
   .db TKNcliteral,$22
   .db TKNcliteral,$08
   .db TKNWriteTitleText

where in this case I have assumed the operation "cliteral" which pushes the following byte on the stack (the analogous operation in most Forths is "literal" which pushes a word; the "c" prefix is typically to indicate a byte operation). As you can see, that's not a lot better than 6502 machine code.
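The byte counts behind that comparison can be tallied with a few lines of Python. This is just arithmetic on the three encodings shown above; it assumes one-byte tokens and ignores the size of the interpreter itself.

```python
# Sizes in bytes of the 6502 instruction forms used in the native version.
IMMEDIATE_LOAD = 2   # lda # / ldx # / ldy #: opcode + operand
JSR = 3              # opcode + 16-bit address

native = 3 * IMMEDIATE_LOAD + JSR    # ldy, ldx, lda, then jsr WriteTitleText
inline_token = 1 + 3                 # one token + three inline parameter bytes
forth_tokens = 3 * (1 + 1) + 1       # three (cliteral, byte) pairs + the call token

print(native, inline_token, forth_tokens)   # bytes per call site
```

So the inline-parameter token saves 5 bytes per call site, while the naive stack version saves only 2.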

So, a stack doesn't necessarily increase your code density. I suspect the sources you've read are speaking of the density of instruction encoding at the machine instruction level. Since a stack machine's instructions don't require any bits for register selection, 8 bits can encode 256 different instructions -- versus the 15+16 instructions, if I recall correctly, that Sweet16 encodes in a byte.

But if, as in this example, you can write an application-specific virtual machine, you can get even better code density. A quick metric: how many reusable machine-language subroutines (like WriteTitleText) do you have in your application?

_________________
Because there are never enough Forth implementations: http://www.camelforth.com

Bregalad

Post subject: Re: Compressing during the compilation process

Posted: Thu Mar 27, 2014 3:15 pm

Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland

Quote:

White Flame's Acheron VM seems to solve your issues [...] but was there a reason mentioned why AcheronVM is not suitable?

1) It's bloated: 1.6kb for the virtual machine itself. The code that takes too much space is 10kb, of which only ~6kb doesn't fall under the "time critical" part, so in order to achieve significant size savings (> 20%), this 6kb of non-time-critical code would have to be reduced to less than 2kb, which sounds impossible.
2) It was made only for CA65, which is very nice, I don't deny it, but my project doesn't use CA65 and I don't feel like doing a total rewrite (because CA65 is highly incompatible with my assembler)

Quote:

But if, as in this example, you can write an application-specific virtual machine, you can get even better code density. A quick metric: how many reusable machine-language subroutines (like WriteTitleText) do you have in your application?

Well it doesn't get more application specific than this.
And I have about 90-100 different subroutines.

barrym95838

Post subject: Re: Compressing during the compilation process

Posted: Thu Mar 27, 2014 3:17 pm

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1929
Location: Sacramento, CA, USA

Does the NES allow you to wedge into the BRK vector? If so, you could write a short token handler that would turn Brad's example into:

Code:

        ...
        brk
        .db TKNWriteTitleText, $14, $22, $08
        ...

Each token would have a jump-table entry, and the dispatch routine could pre-load the appropriate number of in-line post-bytes into registers based on the token value.
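As a rough illustration of that dispatch idea, here is a Python model of the table lookup and register pre-loading (not 6502 code). The token numbers and the per-token register lists are hypothetical; only the routine names and the Y, X, A order of the WriteTitleText parameters come from the thread.

```python
# Hypothetical token table: token -> (routine name, registers to pre-load, in order).
TOKENS = {
    0x01: ("WriteTitleText", "YXA"),   # three inline post-bytes
    0x02: ("ClrSprRam", ""),           # no parameters
    0x03: ("StartSong", "A"),          # one inline post-byte
}


def dispatch(code, pos):
    """Decode the token at code[pos]; return (routine, registers, next position)."""
    name, reg_order = TOKENS[code[pos]]
    regs = {}
    for i, reg in enumerate(reg_order, start=1):
        regs[reg] = code[pos + i]      # copy each post-byte into its register
    return name, regs, pos + 1 + len(reg_order)


# The token stream that would follow two BRK instructions.
stream = bytes([0x01, 0x14, 0x22, 0x08, 0x03, 0x03])
print(dispatch(stream, 0))
print(dispatch(stream, 4))
```

On real hardware the handler would also have to locate the token from the return address that BRK pushes, and fix that address up to skip the post-bytes before returning.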

Mike

White Flame

Post subject: Re: Compressing during the compilation process

Posted: Thu Mar 27, 2014 5:34 pm

Joined: Tue Jul 24, 2012 2:27 am
Posts: 672

The actual byte-based dispatch of Acheron is very small, only a few dozen bytes and a vector dispatch table (link to post describing it).

The 1.6KB includes larger functions like wide multiply and divide, many addressing modes for complex loads & stores, and combinations of named register access and "prior" register access. All of these can be stripped out of the instruction set (it is dynamically defined), leaving a very tiny core that you can replace with custom instructions. However, it's easy enough to grab any raw instruction pointer & token dispatch function (here's a list) and bang in your own hard-coded custom set from scratch instead of relying on any external architecture. It doesn't sound like Sweet16 is powerful enough for what you want to do, and it is hard to modify, so creating custom operations seems to make the most sense.

I am a bit skeptical about the claims of stack machine density. There are a few different issues involved: Source code density, binary density, and conceptual density of instruction complexity. Forth certainly has great source code density, but as we see in some of the examples above, the raw threaded binary density isn't impressive in isolated micro situations. But regardless of the VM used, pointer and function parameter operations can almost always be shrunk down to a smaller representation compared to raw 6502 usage.

The major problem I see with stack-based programming is that while something like "reg1 = reg2 + reg3" can be encoded in a single instruction on a register machine, a stack-based machine requires individual instructions for manifesting every individual parameter, then one for the actual operation (get reg1, get reg2, add). This is large and contains many more instructions and much more dispatch than the more complex instructions in a register machine. It is when instructions can be chained together in a stack machine such that intermediate parameters end up in the "right place" without intervention that the density increases. However, many stack-based programs are chock full of DUP/SWAP/DROP/ROLL/etc bloating up the code stream in order to manage the location of intermediate values. Function call instructions themselves tend to be shorter in threaded forth because at that level there's no differences between built-ins and user functions, but again there's often more overhead in setting up parameters. (Now, I'm off to don my asbestos suit.)

Stack-based VMs are often used because they're simpler to implement and analyze, and place fewer limits on the complexity of processing than register machines, where you have to worry about the size and usage policy of your register bank. When the latter is in play, it is also a potential source of code density gain for stack machines.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor

GARTHWILSON

Post subject: Re: Compressing during the compilation process

Posted: Thu Mar 27, 2014 7:42 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8433
Location: Southern California

Bregalad, commenting on your post with the [code] section above:
You can of course pass parameters on the page-1 hardware stack (as has been done a lot) but there are certain problems with it that are solved by having a data stack in ZP (indexed by X) that's separate from the return stack in page 1. Take for example the matter of using the page-1 hardware stack for passing parameters to a subroutine, but then that routine passes parameters to another, and now there's a subroutine-return address on top putting the target data farther down the stack than expected, so you get incorrect results. The separate data stack in ZP avoids this, because the subroutine-return addresses don't go on it. ZP addressing is also more efficient of course, and there are more addressing modes for ZP. Outside of that, your list of suggested routines looks very much like a token-threaded Forth where all the most commonly used instructions are a single-byte token rather than a code-field address.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?

Bregalad

Post subject: Re: Compressing during the compilation process

Posted: Thu Mar 27, 2014 9:31 pm

Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland

Quote:

Does the NES allow you to wedge into the BRK vector?

Of course, as with all original 6502 systems, the BRK generates an IRQ (although the IRQ is not often used on the NES; NMIs are typically used instead).

Quote:

I am a bit skeptical about the claims of stack machine density

You are probably correct to be sceptical. I guess it's like (normal) compression: you can't tell whether LZS or Huffman will work better before trying both and drawing your conclusions. That's why I wrote a tool that compresses using all the (simple) algorithms that came to my mind, and writes the results for each one of them ^^

So if I really wanted to know I'd have to implement both a stack virtual machine and register virtual machine, convert the (non-critical) 6502 code to both of them, and count bytes. That's the only way to have a definitive answer as to which one is best.

Also I think most of the claims about register machines being less dense are because they use 2 or 3 operand instructions, as in
ADD R1, R2, R3 (ARM or MIPS style)
or
ADD R1, R2 (AVR or Thumb style)

But what SWEET16 does (and what I could very well copy) is that it has an accumulator-register machine, with only 1 operand, so that instructions can be coded on a single byte, like :
ADD R1 (add R1 to R0 which happens to be A).

This also means a lot of additional register copying, so even if most opcodes can be coded on 1 byte instead of 2, it means more instructions. However I still believe it's a win.
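To make the trade-off concrete, here is a small Python model of a one-operand accumulator machine computing R1 = R2 + R3. The opcode nibbles (LD = $2n, ST = $3n, ADD = $An) follow the SWEET16 layout as far as I recall it; treat them as an assumption rather than a reference.

```python
# One-byte instructions: high nibble = operation, low nibble = register number.
LD, ST, ADD = 0x20, 0x30, 0xA0   # assumed SWEET16-style encodings


def run(code, regs):
    """Execute one-operand instructions; regs[0] is the accumulator (R0)."""
    regs = list(regs)
    for byte in code:
        op, n = byte & 0xF0, byte & 0x0F
        if op == LD:                  # R0 <- Rn
            regs[0] = regs[n]
        elif op == ST:                # Rn <- R0
            regs[n] = regs[0]
        elif op == ADD:               # R0 <- R0 + Rn, 16-bit wrap-around
            regs[0] = (regs[0] + regs[n]) & 0xFFFF
        else:
            raise ValueError(f"unknown opcode {byte:#04x}")
    return regs


# R1 = R2 + R3 takes three one-byte instructions: LD R2, ADD R3, ST R1.
program = bytes([LD | 2, ADD | 3, ST | 1])
start = [0, 0, 100, 23] + [0] * 12
print(run(program, start)[1], len(program))
```

The copy in and out of R0 costs two extra bytes here, but when a value is already in the accumulator from the previous operation, a follow-up ADD is a single byte.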

A big problem of the register machine is that, after a call, you don't know what to expect in your regs. Even if it was a call to 6502 code, it could itself have called other VM code and overwritten the regs. Using a sliding register window like AcheronVM solves this, but it is a source of bloat in RAM usage and in the VM itself.

Quote:

It doesn't sound like Sweet16 is powerful enough for what you want to do, and it is hard to modify, so creating custom operations seems to make the most sense.

It looks Turing-complete to me, but the lack of shifts and logical operations is a huge miss. I think chains of ASLs and LSRs make up an enormous amount of my source code ^^

Quote:

You can of course pass parameters on the page-1 hardware stack (as has been done a lot) but there are certain problems with it that are solved by having a data stack in ZP (indexed by X) that's separate from the return stack in page 1. Take for example the matter of using the page-1 hardware stack for passing parameters to a subroutine, but then that routine passes parameters to another, and now there's a subroutine-return address on top putting the target data farther down the stack than expected, so you get incorrect results.

You don't get incorrect results. The arguments passed are always before the return address on the stack, and when returning a value, a different return instruction is used, so that the value on the top of the stack is popped and memorized, then the return address is popped and stored, and the memorized value is pushed again. A bit of overhead, sure, but a ZP secondary stack is overhead too.
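That return sequence can be modelled in a few lines of Python, with a list standing in for the page-1 stack. It is only a sketch: the function names are made up, and leaving the arguments on the stack for the caller to drop is an assumption, since who removes them is not settled above.

```python
def call(stack, return_addr):
    """Arguments are already on the stack; push the 16-bit return address on top."""
    stack.append(return_addr >> 8)      # high byte first, the order JSR uses
    stack.append(return_addr & 0xFF)


def retfunc(stack):
    """Return with a result: pop result, pop return address, push result back."""
    result = stack.pop()                # value the callee left on top
    lo = stack.pop()
    hi = stack.pop()
    stack.append(result)                # result now sits where the address was
    return (hi << 8) | lo               # address at which execution resumes


stack = [0x14, 0x22]                    # two arguments pushed by the caller
call(stack, 0x8123)
stack.append(0x2A)                      # callee computes a one-byte result
resume_at = retfunc(stack)
print(hex(resume_at), stack)
```

The result ends up directly above the arguments, with no return address in between, which is why the caller does not see shifted data.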

Addressing page 1 stack is not less optimal than page 0 in terms of bytes (remember, wherever I care about speed I won't even use the VM in the 1st place, what I want is make the VM as tiny as possible). Addressing bottom elements is worse (you need an initial TSX, and then 3 bytes each time), but accessing the top element is best as it's single byte (PHA/PLA), and that's what's being done most of the time. Only several instructions access non-top-of-stack data. This has also the advantage of letting both X and Y free for misc. use within the virtual machine when data deep into the stack doesn't have to be accessed.

However, yes, multiple call/returns versions have to be handled, a possible source of bloat for the VM itself. Perhaps I could just alternate values for the S register, and have the best of both worlds ?

I think "stack machine" is a horrible name for a machine with 2 stacks. It should be called "stacks machine", then ? One more reason to not like them, you can get cheated on the # of stacks the machine uses.

GARTHWILSON

Post subject: Re: Compressing during the compilation process

Posted: Thu Mar 27, 2014 10:19 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8433
Location: Southern California

You are asserting a lot of things that show your inexperience with the stacks way of doing it. There's nothing wrong with being inexperienced with stack machines (we all start there); but your perspectives will change a lot after you get that experience.

I should mention another attraction to stacks: Interrupts, even re-entrant, can be handled with little or no overhead, because anything that needs to be saved already is, by virtue of the stack. Those things will not get accidentally overwritten.

Fewer variables are needed, as any that are used only temporarily are replaced with pseudo-variables on the data stack which simply cease to exist when they are no longer needed. The only overhead for the ZP stack is INX and DEX, and even that is eliminated in cases where a function leaves the stack depth unchanged, for example to take an address on the stack and replace it with the contents of the memory pointed to by that address, completely done on the 65816 this way:

Code:

       LDA  (0,X)
       STA  0,X

Edit, after I got my stacks treatise up: These things are discussed in the treatise on 6502 stacks (plural, not just the page-1 hardware stack), starting in chapter 4, "Virtual stacks and various ways to implement them."

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?

teamtempest

Post subject: Re: Compressing during the compilation process

Posted: Thu Mar 27, 2014 11:16 pm

Joined: Sun Nov 08, 2009 1:56 am
Posts: 388
Location: Minnesota

Quote:

I think chains of ASLs and LSRs makes an enormous amount of my source code

So...if the things you want to shift are all in one page...you can make a subroutine which is slower than inline but re-useable (if a VM has them, then they can't be that time critical):

Code:

    [...]    ; four others here
lshiftb4:
    asl baseaddr,x
lshiftb3:
    asl baseaddr,x
lshiftb2:
    asl baseaddr,x
lshiftb1:
    asl baseaddr,x
    rts

    [...]    ; four others here
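rshiftw4:
    lsr baseaddr+1,x    ; high byte first (assumes low byte at baseaddr)
    ror baseaddr,x      ; carry from the high byte rotates into the low byte
rshiftw3:
    lsr baseaddr+1,x
    ror baseaddr,x
rshiftw2:
    lsr baseaddr+1,x
    ror baseaddr,x
rshiftw1:
    lsr baseaddr+1,x
    ror baseaddr,x
    rts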

Preload the X-register with the offset of the memory address you want to shift, then call the entry point that corresponds to the number of shifts you want. Five byte overhead for each call, so may not make sense for anything less than five byte-size shifts or two word-size shifts. Could be a net win if you're doing lots of word-size shifts.

He said, not having seen your code.
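For what it's worth, the break-even point can be checked with a quick calculation. This sketch assumes the 5-byte call overhead is ldx #offset (2 bytes) plus jsr (3 bytes), and that the inline code being replaced costs either 1 byte per shift (accumulator shifts) or 4 bytes per shift (a zero-page word shift, two 2-byte instructions); the shared routine's own size is not counted.

```python
CALL_OVERHEAD = 2 + 3    # ldx #offset + jsr entry_point


def saving_per_call(shifts, inline_bytes_per_shift):
    """Bytes saved at one call site by calling the shared routine (negative = loss)."""
    return shifts * inline_bytes_per_shift - CALL_OVERHEAD


for shifts in range(1, 6):
    print(shifts,
          saving_per_call(shifts, 1),   # byte-size shift on the accumulator
          saving_per_call(shifts, 4))   # word-size shift in zero page
```

Under those assumptions a single word shift loses a byte, two word shifts save three, and five accumulator shifts only break even.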
