6502.org Forum

All times are UTC




Posted: Sat Dec 09, 2023 3:13 am

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
NormalLuser wrote:
Though I now have to find a good way to deal with the 10 mandatory CRC bytes that SD cards barf out every 512 bytes. I currently count every 2 bytes read so I can use an 8-bit counter and then do an unrolled read of 10 bytes when it rolls over to zero. That's bad enough.
I don't want to count every bit.... Ouch... I'll have to think on how to handle that....
For this I was wondering about connecting PB6 up to the SD clock signal, and putting Timer 2 into the mode where it counts pulses on PB6 - then after the right number of pulses (512*8 I guess) you'd get an interrupt, and you could make the interrupt handler discard the next 80 bits, and reset Timer 2 to count the next 512 bytes, before returning. This way your mainline code doesn't need to count the bits at all, and you can treat it as a true bitstream.
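A rough, untested sketch of that Timer 2 setup follows. The VIA base address ($6000, as on a stock Ben Eater build), the t2_init/t2_arm/t2_irq labels and the wiring of the SD clock to PB6 are all assumptions, and the count may be off by one depending on exactly when the 6522 raises the interrupt, so check it against the datasheet:

```asm
; Assumptions (not from the thread): VIA at $6000, SD clock also wired to PB6,
; IRQ vector pointing at t2_irq. Needs 65C02-capable assembler settings only
; if you add 65C02 opcodes; everything below is plain 6502.
VIA     = $6000
DDRB    = VIA+$02
T2CL    = VIA+$08       ; write: Timer 2 low latch
T2CH    = VIA+$09       ; write: Timer 2 high counter, loads the count and clears the T2 flag
ACR     = VIA+$0B
IER     = VIA+$0E
PULSES  = 512*8         ; SD clock pulses per data block (may need +/-1)

t2_init:
  LDA DDRB
  AND #%10111111        ; PB6 must be an input to count pulses
  STA DDRB
  LDA ACR
  ORA #%00100000        ; ACR bit 5 = 1: Timer 2 counts pulses on PB6
  STA ACR
  LDA #%10100000        ; IER: bit 7 = "set", bit 5 = Timer 2
  STA IER
t2_arm:
  LDA #<PULSES
  STA T2CL
  LDA #>PULSES
  STA T2CH              ; starts the count
  RTS

t2_irq:
  PHA
  ; ... clock out and discard the 80 gap bits here ...
  JSR t2_arm            ; re-arm for the next 512 bytes (also clears the T2 flag)
  PLA
  RTI
```

The mainline would call t2_init once after starting the multi-block read. Timer 2 is one-shot in this mode, so the pulses generated while discarding the gap are harmless as long as the handler re-arms it afterwards.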

NormalLuser wrote:
The below code takes something between 37 and 43 cycles a pixel/byte to fill the screen and I must encode the skips on the edge of the screen.
I'm sure there must be a faster way to do this:
Yes there are some decent optimisations to be had.

First, the way you're incrementing Screen is a bit clumsy - consider these changes:
Code:
  ;I don't like the screen inc code....
  ;seems like room for improvement?
  LDA Screen              ; remove this
  CMP #$FF                ; remove this, we'll just increment it and see if it hits zero
  BEQ .P1IncTop           ; remove this
  INC Screen
  BRA .P1DONE             ; change to BNE
.P1IncTop:
  INC ScreenH
  LDA #$00                ; remove this
  STA Screen              ; remove this

Next, you're keeping the Y register zero, but incrementing your pointer - it is more efficient to leave the low byte of the pointer zero, and increment Y instead using INY. I won't show the code for that change but it'll save a fair few cycles.

Next, this:
Code:
  LDA ScreenH
  CMP #$40
  BEQ .P1RstTop
  BRA .P1DONE
.P1RstTop:
Here you're comparing against #$40, but as ScreenH is only ever incremented, it's sufficient to test just that one bit (bit 6); we don't need an exact comparison. This bit is easily tested using the BIT instruction, which sets the overflow flag from bit 6 of the memory location read. The way you're branching is a bit backwards too. So replace the above five lines with this:
Code:
    BIT ScreenH
    BVC .P1DONE   ; branch if bit 6 was not set yet

Now this:
Code:
  LDA #$20
  STA ScreenH
As ScreenH was $40 previously, you could just "LSR ScreenH" here.

And again here you're doing the branches backwards:
Code:
  BEQ .RLEDone
  BRA .RLETop
.RLEDone:
Why not just use a single "BNE .RLETop" on its own? Inverting the branch and hopping over a BRA like this is only necessary if the branch target is out of range.


Posted: Sat Dec 09, 2023 8:54 pm

Joined: Sun Sep 24, 2023 3:45 pm
Posts: 45
gfoot wrote:
For this I was wondering about connecting PB6 up to the SD clock signal, and putting Timer 2 into the mode where it counts pulses on PB6


That is an awesome idea!
Unfortunately I need that timer for square wave music. Because I'm clocked at 5 MHz I need to use both Timer 1 and Timer 2 together to keep the sound frequency in the audible range.

When I finally do digitized music, either via a FIFO on the VIA or by interleaving the VGA and CPU clock allowing me to just bitbang digital music out a VIA pin or 4 (since VGA halts the CPU 72% of the time currently), I will look at using the pulse counter to speed up the SD read further.
I'm already getting so much of a better SD read speed than I thought. Thanks so much for all the help on that!

And yes, that code is indeed so clumsy. Thanks for the help!
I look forward to cleaning it up with your suggestions.
Some of the badness is a result of the iterative process I had in making it..
As well as the fact that 'The avoidance of premature optimization led to less than ideal outcomes. '

IE, I hacked together random junk that 'worked' and am now picking up the pieces.
:D

Oh btw, I really like that bit with the 'bit'!

Code:
   
    BIT ScreenH
    BVC .P1DONE   ; branch if bit 6 was not set yet


I've never had a good reason to use BIT yet like this, and now I get to shave something like 8 CPU cycles a drawn pixel using it!
Thanks again!


Posted: Sun Dec 10, 2023 12:27 am

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
NormalLuser wrote:
Some of the badness is a result of the iterative process I had in making it..
As well as the fact that 'The avoidance of premature optimization led to less than ideal outcomes. '

IE, I hacked together random junk that 'worked' and am now picking up the pieces.

It's called "technical debt", as over time you tend to end up "owing" more time to fix it than it would have cost if you'd fixed it sooner. The general idea is that it's fine to borrow, but you have to pay it back sooner rather than later to avoid being overwhelmed.

That's not to say you should optimise everything like crazy, but writing simpler code is always worth it; that much is never premature.


Posted: Sun Dec 10, 2023 8:53 am

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 660
Location: Potsdam, DE
NormalLuser wrote:
That is an awesome idea!
Unfortunately I need that timer for square wave music. Because I'm clocked at 5 MHz I need to use both Timer 1 and Timer 2 together to keep the sound frequency in the audible range.


Without having looked at your design... if your audio is out of normal hearing, is it possible to stick a divider on the audio output? Something like a *393 would give you eight octave outputs... (I'm kinda assuming that you are selecting a frequency and need one timer as a pre-divide.)

Neil


Posted: Sun Dec 10, 2023 10:06 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Just a little surprised by "the 10 mandatory CRC bytes that SD cards barf out every 512 bytes" - CRCs are usually, and in this case, surely just two bytes? Or is it more that at the end of one sector you're counting overhead of accessing the next sector?

ref: http://www.rjhcoding.com/avrc-sd-interface-4.php


Posted: Sun Dec 10, 2023 11:22 am

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
Good point. I believe he is using CMD18 to read multiple blocks. According to http://elm-chan.org/docs/mmc/mmc_e.html each block is preceded by a data token ($FE) and followed by a two-byte CRC. In between it looks like the card may idle (reading as $FF), but the idle time might not be the same for different cards, unless that is specified somewhere. So you ought to read the 2-byte CRC and then keep reading until you get $FE (or just until the next zero bit?) rather than always reading 10 bytes.
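As a minimal sketch of that, assuming there is already a routine that returns the next SPI byte in A (sd_read_byte and its address are placeholders, as is the sd_next_block label):

```asm
sd_read_byte = $9000    ; hypothetical: your existing "read one SPI byte into A" routine

; Call after the 512th data byte of a block has been read.
; Returns once the next block's data token has been consumed.
sd_next_block:
  JSR sd_read_byte      ; CRC byte 1, discarded
  JSR sd_read_byte      ; CRC byte 2, discarded
.wait_token:
  JSR sd_read_byte
  CMP #$FE              ; start-of-block data token
  BNE .wait_token       ; $FF means the card is still idling
  RTS
```

There is no timeout or error-token check here, so a card that never sends $FE would hang the loop; a real version would want a retry limit.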


Posted: Sun Dec 10, 2023 6:20 pm

Joined: Sun Sep 24, 2023 3:45 pm
Posts: 45
barnacle wrote:
Without having looked at your design... if your audio is out of normal hearing, is it possible to stick a divider on the audio output? Something like a *393 would give you eight octave outputs... (I'm kinda assuming that you are selecting a frequency and need one timer as a pre-divide.)


Yeah, I totally thought about using a flip-flop or counter to divide that down, but at this stage I am working on a 'Stock' demo for the Ben Eater kits.
So right now I'm just using PB7 for audio since that is as simple as a resistor, capacitor and speaker.
Other than that, extra bypass capacitors and better ground/power distribution, the only hardware changes from stock are 1 jumper wire from VGA Vsync to the CPU's NMI pin, and 1 jumper from the 5 MHz VGA counter output to the CPU clock in.

For the moment I'm locked with this hardware setup.

There will be a 'phase 2' demo in the future where I do digital audio and increase performance by fiddling with the hardware a bit more.
As a FYI, I'm trying to push a (mostly) 'stock' Ben Eater setup just as far as I can to develop the software and workflow I need while also maybe contributing something for others getting into 6502/assembly.
My eventual long term goal is a 6502 based 'Arcade Board' PCB that will live inside a full size arcade cabinet running a game that I created.
Other than bitmapped graphics and a 6502 it may not have much else in common with my current build.
So any hardware suggestions will be appreciated and filed for longer term reference!


Posted: Sun Dec 10, 2023 6:33 pm

Joined: Sun Sep 24, 2023 3:45 pm
Posts: 45
gfoot wrote:
Good point. I believe he is using CMD18 to read multiple blocks. According to http://elm-chan.org/docs/mmc/mmc_e.html each block is preceded by a data token ($FE) and followed by a two-byte CRC. In between it looks like the card may idle (reading as $FF), but the idle time might not be the same for different cards, unless that is specified somewhere. So you ought to read the 2-byte CRC and then keep reading until you get $FE (or just until the next zero bit?) rather than always reading 10 bytes.


I'm only working with Class 10 cards and they always toss out 10 bytes every 512-byte block.
I think (and I am new at this) that Class 10 cards are forced to a 512-byte block and a 32-bit CRC. Everyone says that if you turn off CRC the card just ignores you and sends it anyway in multi-block transfer mode.
I'm not sure whether that 32-bit CRC is transmitted with extra buffer bytes, or in ASCII hex format, or what the deal is.
All I know is that if I toss 10 bytes every 512 my data transfers are perfect.


Posted: Mon Dec 11, 2023 12:32 am

Joined: Sun Sep 24, 2023 3:45 pm
Posts: 45
gfoot wrote:
First, the way you're incrementing Screen is a bit clumsy - consider these changes:


Oh I considered them...

Code:
.TriDone:
  LDX #$00
  ldy #$00
.RLETop:
  DEC RLECount
  LDA PlotColor
  sta (Screen),y
  INC Screen
  BNE .P1DONE
.P1IncTop:
  INC ScreenH
  LDA ScreenH
  BVC .P1DONE
.P1RstTop:
  LSR ScreenH
.P1DONE:
  CPX RLECount
  BNE .RLETop
.RLEDone:
  dec Block_Counter
  BEQ .BLOCK
  JMP .readloop


That code looks so much better already!
I have not done proper benchmarking but visually I can tell that the slow frames are much smoother!
Thanks!

gfoot wrote:
Next, you're keeping the Y register zero, but incrementing your pointer - it is more efficient to leave the low byte of the pointer zero, and increment Y instead using INY. I won't show the code for that change but it'll save a fair few cycles.


I agree, but what would be a good way to do it? Have a Y position zero page value that I store at the end of the routine and load at the beginning? I still need to roll-over that high address..
I suppose if the overhead of that load and store is less than the savings from the INY?

Anyway, thanks again!


Posted: Mon Dec 11, 2023 1:12 am

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
NormalLuser wrote:
That code looks so much better already!
I have not done proper benchmarking but visually I can tell that the slow frames are much smoother!
Thanks!

Great! A couple more points:
Code:
.TriDone:
  LDX #$00
  ldy #$00
.RLETop:
  DEC RLECount
  LDA PlotColor     ; you should be able to preload this before RLETop, as A isn't used elsewhere in the loop
  ...
.P1IncTop:
  INC ScreenH
  LDA ScreenH       ; this needs to be "BIT" not "LDA", LDA won't set/clear V, and it also clobbers A so that we can't preload it with PlotColor outside the loop
  BVC .P1DONE


Quote:
I agree, but what would be a good way to do it? Have a Y position zero page value that I store at the end of the routine and load at the beginning? I still need to roll-over that high address..
I suppose if the overhead of that load and store is less than the savings from the INY?

Yes, that's the idea. INY saves three cycles compared to INC <zp>, so as long as the loop executes more than three or four times it's worth paying the overhead of:

Code:
    LDY Screen
    STZ Screen
at the top and:
Code:
    STY Screen
at the bottom. But if your loop is often very short then it could be slower this way - however that in itself could be a sign that the loop isn't right for you.

Also note that with the 65C02 you can use STA (Screen) which is one cycle cheaper than STA (Screen),Y. This may actually be better overall, because then you can use Y for something else.

Moving DEC RLECount to where CPX RLECount is (and removing the CPX) will I think work fine and be cheaper. Even better, LDX RLECount at the start and DEX in the loop, for the same reason as above.

Another higher-level thought is that if you can synchronise the RLE lengths with the line endings then you won't need to check for Screen/ScreenH overflow between all pixels, only after an entire RLE string is completed. It will use more data where you're splitting large spans, but could make this loop much faster as the BIT/BVC wouldn't be needed. If having larger data is not viable, then you could instead compare the RLECount to $40-Screen before starting the loop, and make your loop cycle only up to the lower of these two values, using a spare register to count down through these instead of looping all the way until RLECount is zero. Now you can store long RLE spans, but still not have to deal with Screen/ScreenH wrapping inside the loop.
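Something along these lines is what I mean by clamping the loop, as an untested sketch. It splits the run on the 256-byte page boundary (the only place ScreenH needs attention) rather than on line ends, and it assumes the low byte of Screen is kept at zero with Y as the offset. The zero page addresses, Temp and the fill_run label are made up for the example:

```asm
Screen    = $00         ; pointer low byte, kept at 0 (hypothetical address)
ScreenH   = $01         ; pointer high byte, $20..$3F
PlotColor = $02
Temp      = $03

; In: X = run length (1..255), Y = offset within the current page
fill_run:
  STX Temp
  TYA
  CLC
  ADC Temp              ; A = Y + length, carry set if the run reaches the page end
  BCS .crosses
  LDA PlotColor
.plain:                 ; whole run fits in this page: no boundary test inside
  STA (Screen),Y
  INY
  DEX
  BNE .plain
  RTS
.crosses:
  TAX                   ; X = pixels left over for the next page (may be 0)
  LDA PlotColor
.to_end:
  STA (Screen),Y
  INY
  BNE .to_end           ; fill up to the end of the page
  INC ScreenH
  BIT ScreenH
  BVC .tail
  LSR ScreenH           ; $40 -> $20: wrap to the top of the screen
.tail:
  CPX #0
  BEQ .done
.rest:
  STA (Screen),Y
  INY
  DEX
  BNE .rest
.done:
  RTS
```

The one-off setup costs a few cycles per run, so like the INY change it only pays off when runs are reasonably long.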


Posted: Mon Dec 11, 2023 4:13 am

Joined: Sun Sep 24, 2023 3:45 pm
Posts: 45
Thanks for the tips!
gfoot wrote:
Yes, that's the idea. INY saves three cycles compared to INC <zp>, so as long as the loop executes more than three or four times it's worth paying the overhead of

OK, fix that up also...

Code:
.TriDone:
  LDX #$00
  ldy Screen
  stz Screen
  LDA PlotColor
.RLETop:
  DEC RLECount
  sta (Screen),y
  INY
  BNE .P1DONE
.P1IncTop:
  INC ScreenH
  BIT ScreenH
  BVC .P1DONE
.P1RstTop:
  LSR ScreenH
.P1DONE:
  CPX RLECount
  BNE .RLETop
.RLEDone
  STY Screen
  dec Block_Counter
  BEQ .BLOCK
  JMP .readloop
 



For objective benchmarking I took off the Vsync's and got 41 FPS on average. A 10% or 4 FPS increase from 37 I had before encoding Vsync's on 'good' frames.

Subjectively the 'inversion' frames that switch black and white now fill noticeably quicker and the spiral sun and band zoom are much more fluid.
NICE!!!

Better though?
I am thinking that if I work on my other routines I can just 'share' y register and not do any of the loads and stores for y and just iny.
If so I think that will be good for another frame or two!
I bet these enhancements plus a 7 kb read buffer will get me pretty darn close to a silky smooth frame locked 30 FPS. And I still have performance gains from bit level encoding to tap!
Thanks a bunch!
:D


Posted: Tue Dec 12, 2023 12:14 am

Joined: Sun Sep 24, 2023 3:45 pm
Posts: 45
Now I'm up to 46.4 FPS!
gfoot wrote:
Also note that with the 65C02 you can use STA (Screen) which is one cycle cheaper than STA (Screen),Y. This may actually be better overall, because then you can use Y for something else.


Hrm... I am using a 65C02.. and I did NOT know about the STA(), but I'm not sure I have a better use for Y?

gfoot wrote:
.....
Another higher-level thought is that if you can synchronise the RLE lengths with the line endings then you won't need to check for Screen/ScreenH overflow between all pixels, only after an entire RLE string is completed. ....


Yea' this was great to have on my mind when I realized that my encoder is ALWAYS in skip mode on the edge of the screen.
So I was also able to remove any checks for that in the Tri-pixel and RLE draw routine and just do a INY instead.
Neat!

I also worked through preserving the y register and now that my code is cleaned up a bit more I'm getting 46.4 FPS when run without vsync!
(Did I mention.. 46.4 fps?)

:D

Now I'm working through this bit level encoding idea and with the stupid 10 CRC bytes from the SD card every 512 I need to do one of two things, either go ahead and count the darn little bits, or change the way I encode entirely to make it easier to keep track of the bits out of the SD card.

Right now an easy way to pack bits with little change to my existing code would be to use the first bit to indicate whether it is a skip or a tripixel+RLE. If it is a skip I read 8 more bits and that is my RLE, for a total of 9 bits for a skip instead of 16. If it is a TriPixel I read 6 bits for my 3-pixel, 64-entry lookup table, then read 7 bits for my RLE. It is only 7 bits of RLE here because I can't draw off-screen, so an RLE draw line is never more than 100 pixels currently. That plus the original 1 bit to decide on TriPixel or Skip puts me at 14 bits vs. 16.
There are savings to be had....
But I can't really think of a good way to do this without counting bits.
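For reference, decoding that layout could look roughly like this. It is only a sketch: read_bit (next stream bit returned in carry, A and X preserved), the choice of 0 = skip and 1 = TriPixel, MSB-first field order, TriIndex and the zero page addresses are all assumptions for illustration, and it does nothing about the SD card gap yet:

```asm
read_bit = $9000        ; hypothetical: returns next bit in carry, preserves A and X
RLECount = $03          ; hypothetical zero page locations
TriIndex = $04

; Read X bits (1..8), most significant first, into A.
read_bits:
  LDA #0
.next:
  JSR read_bit
  ROL A                 ; shift the new bit in at the bottom
  DEX
  BNE .next
  RTS

; Returns C = 0 for a skip (RLECount = 8-bit length, 9 bits total),
;         C = 1 for a TriPixel (TriIndex = 6 bits, RLECount = 7 bits, 14 bits total).
decode_cmd:
  JSR read_bit          ; 1 type bit
  BCS .tripixel
  LDX #8
  JSR read_bits
  STA RLECount
  CLC
  RTS
.tripixel:
  LDX #6
  JSR read_bits
  STA TriIndex
  LDX #7
  JSR read_bits
  STA RLECount
  SEC
  RTS
```

Calling read_bit through JSR for every bit is slow; a real version would probably inline the bit read, but the field widths are the point here.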

The other way would be to change my encoding. I'd need some way to keep the update messages 'balanced'


I'm thinking:

With the current encoding I need to encode:

Black
White
Light Grey
Dark Grey
Skip
Vsync/Frame

I can probably get rid of the Vsync/Frame packet. Since my decoder already rolls over the screen on its own I may as well use that to indicate a frame, and read data into the buffer with whatever time there is until it is time to draw again.

But that would still leave me 5 things to encode.

So, I could use 3 bits to do the branch/lookup and then only 5 bits to skip/RLE up to 32 pixels at a time.
It would make my 'packets' a nice little byte. I'm not sure what the tradeoff on the encode would be but it is possible that in the real world it would help. On the empty/low change frames it would cost some extra packets, but on the busy frames it might do nothing other than save cycles?
Now that I'm 'peeling the onion' that is bit level encoding I'm thinking differently as to how I would do this.

Right now the entire decoder (minus SD card stuff) looks like this:
Code:
  ;A has the color(skipping that now) tripixel lookup, or Skip Run token #64
  CMP #64
  BEQ .SkipRun ;it is 64, want to skip these.
.TriPixel:
;ok, lame, but just hack the beep in here for now?
;only costs a few seconds in playtime.. but can be better.
; check for beep on skips instead maybe?
    CMP #255 ;Beep token
    BEQ .GotoBeep

    TAX ;color/index to x
    LDA Array1-65,x
    sta (Screen),y
    INY
    LDA Array2-65,x
    sta (Screen),y
    INY
    LDA Array3-65,x
    sta (Screen),y
    STA PlotColor
.TriDone:
  LDX RLECount
  LDA PlotColor

.RLETop:
  sta (Screen),y
  INY
  DEX
  BNE .RLETop

.RLEDone
  dec Block_Counter
  BEQ .BLOCK
  JMP .readloop

.SkipRun:
  clc 
  TYA
  adc RLECount
  TAY
  lda ScreenH
  adc #$00
  sta ScreenH
  CMP #$40
  BEQ .sRstTop
 
  dec Block_Counter
  BEQ .BLOCK
  JMP .readloop
.sRstTop:
  LDA #$20
  STA ScreenH
 
  dec Block_Counter
  BEQ .BLOCK
  JMP .readloop



But another thing I could maybe do for a quick change is that I could take my existing code and copy my 192 byte lookup table to zero page?
Is it worth it?
I'm not sure.. I got this from here: https://www.nesdev.org/wiki/6502_cycle_times

Mnemonic  Description        IMM  ZP  ZP,X  ABS  ABS,X  ABS,Y  (IND,X)  (IND),Y
LDA       LoaD Accumulator   2    3   4     4    4+     4+     6        5+
STA       STore Accumulator  -    3   4     4    5      5      6        6
(+ = add one cycle if a page boundary is crossed)


I think if I move the lookup to zeropage I save 2 cycles a lookup without any other changes?
Correct? Because I like free cycles!

Again, thanks for all the help!


Posted: Tue Dec 12, 2023 1:19 am

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
NormalLuser wrote:
Right now an easy way to pack bits with little change to my existing code would be to use the first bit to indicate whether it is a skip or a tripixel+RLE. If it is a skip I read 8 more bits and that is my RLE, for a total of 9 bits for a skip instead of 16. If it is a TriPixel I read 6 bits for my 3-pixel, 64-entry lookup table, then read 7 bits for my RLE. It is only 7 bits of RLE here because I can't draw off-screen, so an RLE draw line is never more than 100 pixels currently. That plus the original 1 bit to decide on TriPixel or Skip puts me at 14 bits vs. 16.
There are savings to be had....
But I can't really think of a good way to do this without counting bits.

If you can find a way to squeeze in a "read and discard the next 80 bits" command then you can make the encoder just insert this in the right places, i.e. any time the next command it wants to emit doesn't fit within the next 512 byte boundary. It may be tricky though as you may need to waste more than 80 bits in some cases, so you may also need to be able to specify a number of bits to skip.

This command is rare so needs to be added to the encoding cheaply. Can I presume that a tripixel command with zero length is invalid? Perhaps that's a good way to do it. So if the run length is zero then the decoder could ignore the tripixel command, and read and discard bits instead.

How many bits? At least 80 I guess but possibly up to 14 bits more. You could add the 6 bit tripixel lookup index to 80 to get a total number of bits to skip, perhaps.

It doesn't matter if the code to skip the bits is slow, it doesn't run often. The gain from doing it this way is that code that does run often (reading a bit in general) doesn't need to take any overhead counting bits or bytes. So every bit you read will then be a little bit faster.
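On the decoder side that could be as small as the sketch below. It assumes the tripixel fields have already been decoded into TriIndex and RLECount, and that read_bit returns the next stream bit in carry while preserving X; those names, the addresses and the check_pad label are placeholders:

```asm
read_bit = $9000        ; hypothetical: next stream bit in carry, preserves X
RLECount = $03          ; hypothetical zero page locations
TriIndex = $04

; Call after decoding a tripixel command.
; Returns C = 1 if it was a "discard bits" command (already handled),
;         C = 0 if it is a normal tripixel to draw.
check_pad:
  LDA RLECount
  BNE .draw             ; non-zero run length: normal command
  LDA TriIndex
  CLC
  ADC #80               ; 80..143 bits to throw away
  TAX
.discard:
  JSR read_bit
  DEX
  BNE .discard
  SEC
  RTS
.draw:
  CLC
  RTS
```

The extra LDA/BNE per tripixel is the only cost on the common path; everything else only runs once per 512-byte block.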

Quote:
I think if I move the lookup to zeropage I save 2 cycles a lookup without any other changes?
Correct? Because I like free cycles!

If it happens often then it's worth doing; if these lookups are rare then it's not really worth it. I think you were using Kowalski's simulator to profile the code which seems a good approach to check you're optimising the right things.

On that note you also only need to optimise scenes where the frame rate is low, and understand why that is - is it the reading, is it the decoding, is it the drawing, in fact is there just so much drawing to do on these frames that the cpu will never be able to do it faster? Is it worth using a different encoding just for problem frames? Or as I said before can you make the encoder simplify the deltas, accepting a slightly lossy render to keep the frame rate up even if a few pixels aren't updated until the next frame?


Posted: Wed Dec 13, 2023 12:03 am

Joined: Sun Sep 24, 2023 3:45 pm
Posts: 45
gfoot wrote:
NormalLuser wrote:
Right now an easy way to pack bits with little change to my existing code would be to use the first bit to indicate whether it is a skip or a tripixel+RLE. If it is a skip I read 8 more bits and that is my RLE, for a total of 9 bits for a skip instead of 16. If it is a TriPixel I read 6 bits for my 3-pixel, 64-entry lookup table, then read 7 bits for my RLE. It is only 7 bits of RLE here because I can't draw off-screen, so an RLE draw line is never more than 100 pixels currently. That plus the original 1 bit to decide on TriPixel or Skip puts me at 14 bits vs. 16.
There are savings to be had....
But I can't really think of a good way to do this without counting bits.

If you can find a way to squeeze in a "read and discard the next 80 bits" command then you can make the encoder just insert this in the right places, i.e. any time the next command it wants to emit doesn't fit within the next 512 byte boundary. It may be tricky though as you may need to waste more than 80 bits in some cases, so you may also need to be able to specify a number of bits to skip.


I'm thinking there is a gain if I just count the darn bits. It does not need to really be that much of a gain to be worth it. Already I am SO CLOSE to full 30FPS all the time.

gfoot wrote:
Quote:
I think if I move the lookup to zeropage I save 2 cycles a lookup without any other changes?
Correct? Because I like free cycles!

If it happens often then it's worth doing; if these lookups are rare then it's not really worth it. I think you were using Kowalski's simulator to profile the code which seems a good approach to check you're optimising the right things.


Strange thing happened on the way to the simulator.
Unless I'm missing something it looks like:

Code:
 LDA $00,x
;and
 LDA $9000,x


Both use 4 cycles as long as you don't cross a page boundary?
So no gain by moving the lookup table to zero page like I would have thought?

gfoot wrote:
On that note you also only need to optimise scenes where the frame rate is low, and understand why that is - is it the reading, is it the decoding, is it the drawing, in fact is there just so much drawing to do on these frames that the cpu will never be able to do it faster? Is it worth using a different encoding just for problem frames? Or as I said before can you make the encoder simplify the deltas, accepting a slightly lossy render to keep the frame rate up even if a few pixels aren't updated until the next frame?


Right now I am SO CLOSE. The throwing of the apple and the spiral sun are full speed now, and the yin-yang spiral at the end seems pretty close to full speed as well. Only the girls with the wings sliding/rotating into view have any real slowdown anymore.

I think I might be able to get what I'm looking for with just the 7k read buffer now.
I'm very interested to see how well I can make a very simple one work.


Posted: Wed Dec 13, 2023 4:00 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
NormalLuser wrote:
Both use 4 cycles as long as you don't cross a page boundary?
'Fraid so!

Here's the play-by-play of CPU operations for LDA z-pg,x:
    Cycle 1: fetch the opcode
    Cycle 2: fetch the ADL (the low 8 bits of the operand)
    Cycle 3: add X + ADL
    Cycle 4: put the addition result on the address bus and do the read

... and for LDA abs,x with no page crossing it is:
    Cycle 1: fetch the opcode
    Cycle 2: fetch the ADL
    Cycle 3: add X + ADL while simultaneously fetching the ADH (the high 8 bits of the operand)
    Cycle 4: put the addition result on the address bus and do the read

Notice the second case involves two simultaneous operations in cycle 3; that's why the total cycle count may seem surprising.
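To put numbers on it, here are the documented cycle counts side by side. Table and Buffer are made-up addresses for illustration, with Table assumed to start on a page boundary:

```asm
Table  = $9000          ; hypothetical page-aligned lookup table
Buffer = $0300          ; hypothetical RAM buffer

  LDA $10               ; zero page      3 cycles
  LDA Table             ; absolute       4 cycles
  LDA $10,X             ; zero page,X    4 cycles
  LDA Table,X           ; absolute,X     4 cycles, +1 only if Table+X crosses a page
  STA $10,X             ; zero page,X    4 cycles
  STA Buffer,X          ; absolute,X     5 cycles, always
```

So zero page only wins for the non-indexed forms and for indexed stores. An indexed load from a table that never crosses a page costs the same either way, which is why keeping the table inside a single page matters more than where that page is.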

For better detail than I've provided here, see Appendix A of the MOS MCS6500 Family Hardware Manual. Links to this manual can be found here.

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html

