 Post subject: Re: CPLD + 6502 Trainer
PostPosted: Sat Aug 13, 2022 10:38 pm 
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
plasmo wrote:
Oh, maybe this is the trick you are thinking of: the original graphic data needs to be reformatted to fit 3 lines per page so LDA ($b0),y does not cross page boundary.
Haha, nope, that's not it! What I had in mind are two possible ways to get around the excessive delay that results when conventional means are used to read bytes from memory and write them to the shift register.

And you've chosen to snoop the data bus. I gather that when the LDA ($b0),y reads from the $4000-$DFFF region, the data bus gets copied to the shift reg at the end of the same cycle that does the read. This eliminates the delay otherwise required for a subsequent STA to explicitly write to the shift reg. Nice!

The other trick I had in mind would be overkill for what you're presently doing, but it's a favorite of mine despite being somewhat gnarly... or maybe because it's somewhat gnarly! :lol: If anyone's interested, the so-called Cheap Video technique espoused by Don Lancaster back in the day allows a byte to be fetched and written to the shift reg once per clock cycle -- ie, eight times faster than even the snoop trick!! :shock: This makes it a great choice when the CPU is rather slow or the goals for video resolution are rather high. I've successfully implemented Cheap Video on three different machines, and I wrote about the technique here.

plasmo wrote:
the original graphic data needs to be reformatted to fit 3 lines per page so LDA ($b0),y does not cross page boundary. The reformatting is done to the original graphic data during the vertical retrace period.
I like how you use the vertical retrace period for this! But have you considered tricks that would manage page crossings rather than avoiding them? Maybe you just haven't gotten to this yet. Edit: or maybe the reformatting doesn't have enough of a downside for you to consider it a problem -- which is fine.

Anyway, the way my brain works I'd be thinking about eliminating the reformatting, and to do that it's necessary to deal with the effect of page crossings on timing. I'd wanna eliminate the single-cycle NOP, and "spend" that cycle in one of two ways:

- arrange for a Wait State that only occurs when there's no Page Crossing. I know that's not entirely straightforward, but I'm sure it's doable, and it would let you guarantee that it always takes 8 cycles to move each byte, thus ensuring consistent timing. Or, ...

- replace each LDA ($b0),y with a STA ($b0),y. STA using (ind),y mode always takes six cycles, regardless of whether or not there's a page crossing. So, instead of generating conditional Wait States, the challenge will be to keep the /WE pin of the RAM high so it reads instead of writing. Also, you'd need to somehow avoid bus contention between the RAM and the CPU. Do those guys connect directly together, or does your CPLD act as a middleman? If they connect directly together then this idea is a dud! :roll:
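
To make the cycle arithmetic behind these two options concrete, here is a minimal Python sketch. It assumes the usual 65C02 timings (LDA (zp),y takes 5 cycles plus 1 on a page crossing, STA (zp),y always takes 6) and assumes each byte is moved by that instruction followed by a 2-cycle INY. The function names are hypothetical, and the model counts cycles only; it says nothing about what is on the bus.

```python
INY_CYCLES = 2  # assumed second instruction of each per-byte pair


def crosses_page(pointer, y):
    """True if adding Y to the 16-bit pointer carries into the high byte."""
    return (pointer & 0xFF) + y > 0xFF


def lda_ind_y_cycles(pointer, y):
    """LDA (zp),y: 5 cycles, plus 1 when indexing crosses a page."""
    return 5 + (1 if crosses_page(pointer, y) else 0)


def sta_ind_y_cycles(pointer, y):
    """STA (zp),y: 6 cycles whether or not a page is crossed."""
    return 6


def cycles_per_byte_with_wait(pointer, y):
    """Option 1: one wait state is added only when there is NO crossing."""
    wait = 0 if crosses_page(pointer, y) else 1
    return lda_ind_y_cycles(pointer, y) + wait + INY_CYCLES


def cycles_per_byte_with_sta(pointer, y):
    """Option 2: STA (zp),y used in place of the LDA."""
    return sta_ind_y_cycles(pointer, y) + INY_CYCLES


if __name__ == "__main__":
    pointer = 0x40C0  # a scan line that crosses a page part-way through
    print({cycles_per_byte_with_wait(pointer, y) for y in range(80)})
    print({cycles_per_byte_with_sta(pointer, y) for y in range(80)})
```

Under those assumptions both sets contain only the value 8, which is the "consistent timing" goal described above.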

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


 Post subject: Re: CPLD + 6502 Trainer
PostPosted: Sun Aug 14, 2022 6:17 pm 
Joined: Fri Dec 21, 2018 1:05 am
Posts: 1076
Location: Albuquerque NM USA
Jeff,
You are correct, snooping the data bus is my way of eliminating the STA instruction.

I believe I read about the "Cheap Video" technique in your previous postings. That's a very cool idea and I'll do a dedicated project to explore that concept with a 25.175MHz 6502. The CPLD has control over the RAM as well as providing the necessary instructions to the 6502, so "Cheap Video" is entirely do-able. With that approach I'll be able to shift out 8-bit data at 25.175MHz instead of 1 bit, so I can do color instead of monochrome. The challenge is how to deal with the much larger graphic memory requirement.

I did eliminate the need to reformat the image, but my solution is by brute force. I accounted for all possible ways that an 80-byte block crosses a page boundary. There are only 4 cases, so I personalize each case with the correct combination of LDA instructions that do not cross a page boundary and LDA instructions that do.

To explain further, the 4 cases that result in a page boundary crossing are ZP pointer $b0,$b1 with address values of $xxc0, $xxd0, $xxe0, $xxf0. If $xxc0, the interrupt routine called has 63 LDA ($b0),y that do not cross a page boundary and 17 LDA ($b0),y that do; if $xxd0, the interrupt routine called has 47 LDA ($b0),y that do not cross a page boundary and 33 LDA ($b0),y that do; and so on. This brute force approach resulted in a bigger program, 1.7K compared to 0.5K for the program that reformats the image.
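
The "only 4 cases" claim can be checked by enumeration. The sketch below is a rough model, not the actual routine: it assumes the raw image starts at $4000 and consists of 480 lines of 80 bytes (which is what the $4000-$d5ff range mentioned later in the thread works out to), and that Y runs from 0 to 79 from the line's start address. All names are hypothetical.

```python
LINE_BYTES = 80
IMAGE_BASE = 0x4000   # assumed start of the raw image
LINE_COUNT = 480      # assumed: ($d5ff - $4000 + 1) / 80


def crossing_start_low_bytes(base=IMAGE_BASE, lines=LINE_COUNT):
    """Low bytes of the start address of every line that spans two pages."""
    lows = set()
    for n in range(lines):
        start = base + n * LINE_BYTES
        end = start + LINE_BYTES - 1
        if (start >> 8) != (end >> 8):
            lows.add(start & 0xFF)
    return sorted(lows)


def split_counts(start_low):
    """(bytes before the page boundary, bytes after it) for one line."""
    before = 0x100 - start_low
    return before, LINE_BYTES - before


if __name__ == "__main__":
    for low in crossing_start_low_bytes():
        print(f"$xx{low:02x}: {split_counts(low)}")
```

The enumeration yields exactly $c0, $d0, $e0 and $f0 as starting low bytes. Note that with Y starting at 0 the split for $xxc0 comes out as 64/16 rather than the 63/17 quoted above, so the real routine presumably uses a slightly different pointer or Y convention.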
Bill


 Post subject: Re: CPLD + 6502 Trainer
PostPosted: Wed Aug 17, 2022 4:40 am 
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
I'm glad you like the Cheap Video exploit, Bill. I'll be curious to see what you come up with! In case you haven't already noticed, let me point out that each scan line has a bunch of NOP-like faked opcodes then a faked RTS, and the first two cycles of the RTS will continue to increment PC just as the NOPs have been incrementing PC. So, I'm pointing out that you can include the first two cycles of the RTS as part of the long string of data-fetches from RAM. Those 2 cycles can be used scan-wise, rather than just being unavoidable overhead.

Now, back to the snoop approach.

plasmo wrote:
I did eliminated the need to reformat the image, but my solution is by brute force.
Okay, cool. For some circumstances that'll be the best tradeoff, for sure -- fix it in software!

Still, I can't help fleshing out other options to see how they compare. Depending on circumstances, the advantages of a hardware solution might be significant. Besides smaller, simpler code, the idea I floated about a Wait State that only occurs when there's no Page Crossing would also give you freedom from any restrictions re the start address of the scan. You wouldn't be limited to $xxc0, $xxd0, $xxe0, $xxf0 (for example) in the LS byte; any of 256 values would be alright. But at first I was drawing a blank on how the actual circuit would work. However, after sleeping on the problem I think I've found a tidy solution, which FWIW I may as well share with everyone (and no offense taken if you don't bother with it, Bill).

Attachment: snooptrick shifter.png

Instead of an 8-bit shift register, a 9-bit shift register is used (implemented within the CPLD). It is clocked once per CPU cycle, and there are two possible alignments for a byte being loaded from the data bus into the shift reg. The "Early Load" path loads the byte one place further upstream than the "Late Load" path.

Every LDA ($b0),y will touch the $4000-$DFFF region for either one cycle or two. On the first of those cycles (and using the Early Load path) we always load the shift-reg ... even though it's unknown at that time whether we have good data or garbage. We find out on the following cycle. If there was a page crossing then the CPU will be re-running the access, and we can detect that based on the $4000-$DFFF address or the fact that the SYNC pin is still low. If the access is being re-run then once again we load the shift-reg, this time using the Late Load path. But if the access is not being re-run then the LDA ($b0),y is complete and the CPU will be fetching the next opcode. We can leave the shift reg alone (don't load; simply let it shift), but we need to give the CPU a Wait State, and this is easily done. I think this covers it... :) :roll:

-- Jeff

 Post subject: Re: CPLD + 6502 Trainer
PostPosted: Thu Aug 18, 2022 3:43 am 
Joined: Fri Dec 21, 2018 1:05 am
Posts: 1076
Location: Albuquerque NM USA
Jeff,
I also like a hardware solution to compensate for crossing a page boundary. However, I'm struggling to figure out how to implement your scheme. The pixel shift register needs to shift out 8 valid data bits for all 80 bytes of a horizontal scan. It has no flexibility to discard a data bit because of the extra cycle due to a page crossing. I also like the idea of looking ahead to the SYNC signal for a clue about a page boundary crossing, but it is too late to insert a wait state into the previous instruction once the SYNC signal is observed.
Bill


 Post subject: Re: CPLD + 6502 Trainer
PostPosted: Thu Aug 18, 2022 1:47 pm 
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
I'm struggling, too, Bill -- with the explanation, at least... My previous post was kinduva mess, so I went back and made some edits, particularly to the final paragraph -- dunno if you noticed the (hopefully) improved version. :roll:

And today I'm including a chart of what happens each cycle, with and without a page crossing. It should be clear that these two sequences end up in the same place even though they each take a different route to get there!

It's also true that the data on the shift register output needs to end up in the same place, timing-wise, despite traveling by different routes. As a reference point, the big asterisk shows when the first of the fetched bits actually appears at the shift register output.

In the case on the left there is a 2-cycle delay from the "Fetch the DATA" cycle to the asterisk cycle. Check me on this, but I reckon it's consistent with what I said about aligning the shift-reg load one place further upstream (ie, via the Early Load path).

In the case on the right there is only a 1-cycle delay from the "RE-Fetch the DATA" cycle to the asterisk cycle. The shift reg actually gets loaded twice, but of course the results of the first load get overwritten by the second load (which puts the byte further downstream, via the Late Load path).

Is this making sense yet? I admit I had an "oh, crap" moment while writing this post... (ie, is this idea fundamentally misconceived??) :lol: But mostly I do think I'm on solid ground, and with luck I've now managed to make it clear why there's never any discarded data bit and also no need to insert a wait state into a previous instruction.

-- Jeff

[Edits, including better notations in the chart.]
[Edit: the chart again]


Attachment: snooptrick sequence .png

 Post subject: Re: CPLD + 6502 Trainer
PostPosted: Fri Aug 19, 2022 3:00 am 
Joined: Fri Dec 21, 2018 1:05 am
Posts: 1076
Location: Albuquerque NM USA
Dr Jefyll wrote:

Is this making sense yet? I admit I had an "oh, crap" moment while writing this post... (ie, is this idea fundamentally misconceived??) :lol:

I know about that feeling...very well. This is why I designed with PROGRAMMABLE logic and taught myself how to modify existing circuitry. :oops:

I see one wait state is inserted when an address above $4000 was encountered in the previous cycle and the current cycle has SYNC asserted. OK, that's easy to do. I'm still having a problem with fetching the data (presumably from memory above $4000) and re-fetching the data when a page boundary is crossed. I understand this is due to the page boundary crossing, where calculating the real address requires an extra cycle, so there is a dummy cycle as a filler. This dummy cycle, if addressed to memory above $4000, will wreak havoc on my current software because it will cause two loads into the shift register when only one is expected. I would see significant data corruption, one pixel for every 8 pixels, in the part of the image scan where the page crossing occurred, but I don't see that. I put a scope on the shift register's LOAD/SHIFT input (which is A15 OR A14) and I see a 40nS pulse (one period of the 25MHz clock) every time. The dummy cycle is NOT addressing memory above $4000.

I'm using WDC W65C02, maybe it has different dummy cycle characteristics?
Bill


 Post subject: Re: CPLD + 6502 Trainer
PostPosted: Fri Aug 19, 2022 4:18 am 
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
I'm a little confused, because your post seems to start off talking about my hardware wait-state idea but later seems to be talking about your existing (mostly-)software approach. But it's late, and I know at least one of us is getting sleepy.

Quote:
I'm using WDC W65C02, maybe it has different dummy cycle characteristics?
Different than other 'C02s? No, the 'C02 dummy cycles from WDC are the same as those for a 'C02 from Rockwell, etc. But 'C02 dummy cycles do differ from those of the NMOS '02.

On the NMOS '02, when indexing results in a page crossing, the first attempt at accessing the data will fail to account for the carry into the highbyte of the address. As a result, the highbyte will be too low by 1 (ie; the address as a whole will be too low by $100).

This wonky address during the dummy cycle had potential to cause trouble (such as by inadvertently touching IO), so when the 'C02 was introduced it included a remedy. The "too low by $100" address still occurs, but only internally within the chip. Externally, the dummy cycle accesses the last instruction byte instead (ie, it puts PC on the address bus), and that's sure to be safe.

I think this explains why you only see one 40nS pulse every time.

To be honest, I had somehow forgotten about the 'C02 fix until now. :oops: I apologize for incorrectly telling you (upthread) that "Every LDA ($b0),y will touch the $4000-$DFFF region for either one cycle or two." Although two cycles may be consumed, only one will touch the $4000-$DFFF region.
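
As a rough illustration of the difference described above, the following Python sketch lists the addresses that appear on the bus once the pointer has been read for LDA (zp),y. It is a simplified model with hypothetical names: the NMOS dead cycle uses the address that is too low by $100, and the 'C02 dead cycle re-reads the last instruction byte, whose address is passed in as an assumed parameter.

```python
def in_video_region(addr):
    """The $4000-$DFFF window whose reads are snooped into the shift reg."""
    return 0x4000 <= addr <= 0xDFFF


def lda_ind_y_accesses(cpu, pointer, y, last_instr_byte_addr):
    """Bus addresses after the zero-page pointer has been fetched.

    cpu is "nmos" or "c02". With no page crossing there is a single data
    read. With a crossing there is a dead cycle followed by the real read.
    """
    effective = (pointer + y) & 0xFFFF
    if (pointer & 0xFF) + y <= 0xFF:
        return [effective]
    if cpu == "nmos":
        # High byte not yet corrected: address is too low by $100.
        dead = (pointer & 0xFF00) | (effective & 0xFF)
    else:
        # 'C02: the dead cycle puts PC (last instruction byte) on the bus.
        dead = last_instr_byte_addr
    return [dead, effective]


def video_touches(cpu, pointer, y, last_instr_byte_addr):
    accesses = lda_ind_y_accesses(cpu, pointer, y, last_instr_byte_addr)
    return sum(in_video_region(a) for a in accesses)


if __name__ == "__main__":
    # Pointer $40F0 with Y=$20 crosses into page $41; code assumed at $0200.
    print(video_touches("nmos", 0x40F0, 0x20, 0x0201))
    print(video_touches("c02", 0x40F0, 0x20, 0x0201))
```

With these example values the NMOS model touches the video region twice and the 'C02 model only once, which matches the single 40nS pulse seen on the scope, provided the code itself does not run from the $4000-$DFFF region.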

(BTW and FWIW, the 'C02-style protective remedy was removed when the 65C816 came along. Instead, the '816 introduced the VDA pin, an output which will be low during this and other dummy cycles, and if you have read-sensitive IO you want to protect then access to it can be qualified with VDA.)

-- Jeff


Attachment: NCR65c02 invalid opcodes etc .png

 Post subject: Re: CPLD + 6502 Trainer
PostPosted: Fri Aug 19, 2022 6:40 am 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8144
Location: Midwestern USA
Dr Jefyll wrote:
This wonky address during the dummy cycle had potential to cause trouble (such as by inadvertently touching IO), so when the 'C02 was introduced it included a remedy. The "too low by $100" address still occurs, but only internally within the chip. Externally, the dummy cycle accesses the last instruction byte instead (ie, it puts PC on the address bus), and that's sure to be safe.

Something to note is while the dummy cycle wonky address problem was corrected in the C02, a new bug was inadvertently introduced, at least in the WDC 65C02s. If indexing does not cross a page boundary, the base address will be touched (read), followed by access to the effective address (i.e., base + .X or base + .Y). That undocumented access has the potential to wreak havoc if using indexed addressing to access I/O chip registers. There’s a topic about it somewhere, but I’m having trouble recalling where. I do recall floobydust was the one who got tripped up by it.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


 Post subject: Re: CPLD + 6502 Trainer
PostPosted: Fri Aug 19, 2022 1:17 pm 
Joined: Tue Mar 05, 2013 4:31 am
Posts: 1373
BigDumbDinosaur wrote:
Dr Jefyll wrote:
This wonky address during the dummy cycle had potential to cause trouble (such as by inadvertently touching IO), so when the 'C02 was introduced it included a remedy. The "too low by $100" address still occurs, but only internally within the chip. Externally, the dummy cycle accesses the last instruction byte instead (ie, it puts PC on the address bus), and that's sure to be safe.

Something to note is while the dummy cycle wonky address problem was corrected in the C02, a new bug was inadvertently introduced, at least in the WDC 65C02s. If indexing does not cross a page boundary, the base address will be touched (read), followed by access to the effective address (i.e., base + .X or base + .Y). That undocumented access has the potential to wreak havoc if using indexed addressing to access I/O chip registers. There’s a topic about it somewhere, but I’m having trouble recalling where. I do recall floobydust was the one who got tripped up by it.


Yes, I recall that as well... that trip was experienced when doing the initial BIOS to get the SCC2691 UART to work. Here's a link:

viewtopic.php?f=2&t=4992&p=57452&hilit=2691+bios#p57443

I've also detailed this whole problem in the BIOS source code as comments in the section that initializes the UART (and now DUART with the SC28L92).

Code:
;**************************************************************************************************
;Initializing the SC28L92 DUART as a Console.
;An anomaly in the W65C02 processor requires a different approach in programming the SC28L92
; for proper setup/operation. The SC28L92 uses three Mode Registers which are accessed at the same
; register in sequence. There is a command that Resets the Mode Register pointer (to MR0) that is
; issued first. Then MR0/1/2 are loaded in sequence. The problem with the W65C02 is a false read of
; the register when using indexed addressing (i.e., STA UART_REGISTER,X). This results in the Mode
; Register pointer being moved to the next register, so the write to next MRx never happens. While
; the indexed list works fine for all other register functions/commands, the loading of the
; Mode Registers need to be handled separately.
;
;NOTE: the W65C02 will function normally "if" a page boundary is crossed as part of the STA
; (i.e., STA $FDFF,X) where the value of the X Register is high enough to cross the page boundary.
; Programming in this manner would be confusing and require modification if the base I/O address
; is changed for a different hardware I/O map.
;
;There are two routines called to setup the 28L92 DUART:
;
;The first routine is a RESET of the DUART.
; It issues the following sequence of commands:
;  1- Reset Break Change Interrupts
;  2- Reset Receivers
;  3- Reset Transmitters
;  4- Reset All errors
;
;The second routine initializes the 28L92 DUART for operation. It uses two tables of data; one for
; the register offset and the other for the register data. The table for register offsets is
; maintained in ROM. The table for register data is copied to page $03, making it soft data. If
; needed, operating parameters can be altered and the DUART re-initialized via the ROM routine.
;
; Note: A hardware reset will reset the SC28L92 and the default ROM config will be initialized.
; Also note that the Panic routine invoked by a NMI trigger will also reset the DUART to the
; default ROM config.
;
INIT_IO         JSR     RESET_28L92     ;Reset of SC28L92 DUART (both channels) (6)
                LDA     #DF_TICKS       ;Get divider for jiffy clock (100x10ms = 1 second) (2)
                STA     TICKS           ;Preload TICK count (3)
;
;This routine sets the initial operating mode of the DUART
;
INIT_28L92      SEI                     ;Disable interrupts (2)
;
                LDX     #INIT_DUART_E-INIT_DUART ;Get the Init byte count (2)
28L92_INT       LDA     LOAD_28L92-1,X  ;Get Data for 28L92 Register (4)
                LDY     INIT_OFFSET-1,X ;Get Offset for 28L92 Register (4)
                STA     SC28L92_BASE,Y  ;Store Data to selected register (5)
                DEX                     ;Decrement count (2)
                BNE     28L92_INT       ;Loop back until all registers are loaded (2/3)
;
; Mode Registers are NOT reset to MR0 by above INIT_28L92!
; The following resets the MR pointers for both channels, then sets the MR registers
; for each channel. Note: the MR index is incremented to the next location after the write.
; NOTE: These writes can NOT be done via indexed addressing modes!
;
                LDA     #%10110000      ;Get mask for MR0 Reset (2)
                STA     UART_COMMAND_A  ;Reset Pointer for Port A (4)
                STA     UART_COMMAND_B  ;Reset Pointer for Port B (4)
;
                LDX     #$03            ;Set index for 3 bytes to xfer (2)
MR_LD_LP        LDA     LOAD_28L92+15,X ;Get MR data for Port A (4)
                STA     UART_MODEREG_A  ;Send to 28L92 Port A (4)
                LDA     LOAD_28L92+18,X ;Get MR data for Port B (4)
                STA     UART_MODEREG_B  ;Send to 28L92 Port B (4)
                DEX                     ;Decrement index to next data (2)
                BNE     MR_LD_LP        ;Branch back till done (2/3)
;
                CLI                     ;Enable interrupts (2)
;
; Start Counter/Timer
;
                LDA     UART_START_CNT  ;Read register to start counter/timer (4)
                RTS                     ;Return to caller (6)
;
; This routine does a Reset of the SC28L92
;
RESET_28L92     LDX     #UART_RDATAE-UART_RDATA1 ;Get the Reset commands byte count (2)
UART_RES1       LDA     UART_RDATA1-1,X ;Get Reset commands (4)
                STA     UART_COMMAND_A  ;Send to UART A CR (4)
                STA     UART_COMMAND_B  ;Send to UART B CR (4)
                DEX                     ;Decrement the command list index (2)
                BNE     UART_RES1       ;Loop back until all are sent (2/3)
                RTS                     ;Return to caller (6)
;
;**************************************************************************************************

_________________
Regards, KM
https://github.com/floobydust


 Post subject: Re: CPLD + 6502 Trainer
PostPosted: Fri Aug 19, 2022 2:19 pm 
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
BigDumbDinosaur wrote:
Something to note is while the dummy cycle wonky address problem was corrected in the C02, a new bug was inadvertently introduced, at least in the WDC 65C02s.
Thanks, BDD and floobydust. Yes there's a bug, and it applies to all 'C02s (not just WDC's). But, to clarify slightly, it's not a new bug that was introduced. The NMOS 6502 had it, too. But it's somewhat of a special case, and it "fell through the cracks" (although only partially) as far as the 'C02 fix is concerned.

( Warning! Descent into nerd-dom ahead! :wink: )

Ordinarily speaking, indexing will never result in an extra (aka dead) cycle unless a page crossing occurs. Ie, no page crossing means no extra cycle. Certainly this is true for all indexed reads (such as LDA, CMP, ADC etc). And the 'C02 fix does apply to indexed reads.

But STA abs,X and STA abs,Y -- which of course are writes -- are very strange animals because the extra cycle always occurs. You'll get a read followed by a write... and it's the read that can prove problematic.

If there's a page crossing then the 'C02 fix does kick in, and the read (harmlessly) uses PC as the address; then there's a write to the correct address. But if there's not a page crossing then the fix doesn't kick in. What you get is a read of the correct address followed by a write to the correct address. :shock: Some IO devices will tolerate this, but others won't. Somewhat counter-intuitively, you can work around the issue if you can somehow ensure (rather than prevent) page crossings when using STA abs,X and STA abs,Y to address a read-sensitive device! And that's the approach used in Kevin's code.

This and other dead-cycle phenomena are described in a detailed document published by Drass and me in 2018. Line 3 and footnote 8 have relevance to the topic at hand.

-- Jeff

Quote:
3. abs,X/Y — write w/o pg. crossing [8]
Quote:
[8] abs,X/Y write operations without a page-crossing actually read from the Fully Formed Address before writing to it. The read can be troublesome when accessing I/O devices where reads are destructive. For 65C02 there's a software workaround, which is to ensure that the write to abs,X/Y triggers a page-crossing, which means the address during the dead cycle will be a PBA, not the Fully Formed Address. The only software workaround that works on both NMOS and CMOS 6502 is to avoid abs,X/Y address mode when writing to the read-sensitive device.
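
The behaviour in footnote 8 can be summarised in a few lines of Python. This is only a sketch of the rule as stated above, with hypothetical names; the register address $FE04 and the code address $0202 are made-up example values, not taken from any real memory map.

```python
def sta_abs_x_accesses(cpu, base, x, last_instr_byte_addr):
    """Return the dead-cycle read and the final write for STA abs,X.

    cpu is "nmos" or "c02". The extra cycle always occurs for this
    instruction; what differs is the address used during it.
    """
    effective = (base + x) & 0xFFFF
    crosses = (base & 0xFF) + x > 0xFF
    if not crosses:
        dead = effective                  # read of the real target address
    elif cpu == "c02":
        dead = last_instr_byte_addr       # harmless re-read at PC
    else:
        dead = (base & 0xFF00) | (effective & 0xFF)  # too low by $100
    return [("read", dead), ("write", effective)]


def touches_before_write(cpu, base, x, target, last_instr_byte_addr=0x0202):
    """True if the read-sensitive target is read before being written."""
    dead = sta_abs_x_accesses(cpu, base, x, last_instr_byte_addr)[0]
    return dead == ("read", target)


if __name__ == "__main__":
    target = 0xFE04  # hypothetical read-sensitive I/O register
    print(touches_before_write("c02", 0xFE00, 0x04, target))  # no crossing
    print(touches_before_write("c02", 0xFDFF, 0x05, target))  # forced crossing
```

In this model the first call reports a spurious read of the register and the second does not, which is the counter-intuitive "ensure a page crossing" workaround for the 'C02.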

 Post subject: Re: CPLD + 6502 Trainer
PostPosted: Fri Aug 19, 2022 7:50 pm 
Joined: Fri Dec 21, 2018 1:05 am
Posts: 1076
Location: Albuquerque NM USA
Dr Jefyll wrote:
I'm a little confused, because your post seems to start off talking about my hardware wait-state idea but later seems to be talking about your existing (mostly-)software approach. But it's late, and I know at least one of us is getting sleepy.

Jeff,
I did go off on a tangent when I looked at your "snooptrick sequence" chart. The right column showed the page-crossing case of LDA ($b0),y where the shift register was loaded and reloaded. That's when I thought back to my current software and worried that this would cause the shift register to be loaded with wrong data, such that one out of every 8 pixels would contain wrong pixel info. Subsequent discussion made clear that the page-crossing case of LDA ($b0),y does not access data memory twice on the W65C02, which was also verified by observation on the scope. OK, so that puts the tangent to bed, and we return to the discussion about how to snoop the memory every 8 clocks for both the page-crossing case and the page-not-crossing case. I'm still struggling with the hardware approach...
Brief summary of the software approaches:
A. Reformat the graphic data so each scan line fits within a page
B. Display the raw graphic data as is, but enumerate all cases where scan lines cross page boundary. A different software routine is generated for each case. Fortunately, there are only 4 cases.

The proposed hardware approach is that each scan line contains 80 sets of two instructions, LDA ($b0),y and INY. The two instructions take 8 clocks to execute in the page-crossing case, but 7 clocks in the page-not-crossing case. The hardware does two things:
1. Detect the page-not-crossing case and insert a wait state for the INY instruction so the two instructions will take 8 clocks to execute,
2. For the page-crossing case, do not insert a wait state, but latch data into the pixel shift register. There is some complexity associated with the shift register, but the problem I'm trying to figure out is how to latch valid data every 8 clocks.

Because the hardware doesn't know whether a LDA ($b0),y will cross the page boundary until the instruction is executed, it needs to figure out a way to UNDO the wait state inserted into the previous INY instruction, otherwise there will be 9 clocks elapsed before new pixel data shows up (INY with one wait state is 3 clocks, and LDA ($b0),y crossing a page boundary is 6 clocks). I can adjust the pixel shifter to shift 7 bits instead of 8, but I cannot get around the missing bit during the dead cycle due to page-crossing.

Boy, talk about a descent into nerd-dom...
Bill


 Post subject: Re: CPLD + 6502 Trainer
PostPosted: Fri Aug 19, 2022 11:00 pm 
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
plasmo wrote:
Subsequent discussions made clear that page-crossing case of LDA ($b0),y does not access data memory twice for the W65C02
Right; there'll be two accesses, but not both to data memory. Probably I fumbled that detail because mentally I'm in another scenario where the address simply doesn't matter. That's my story, and I'm sticking to it! :wink: Anyway, sometime later we can, if you're willing, discuss why accesses to the video buffer should ideally be identified by the cycle in which they occur (ie, timing), rather than the address which they place on the bus.

Quote:
Brief summary of the software approaches:
Honestly, this looks like a fun problem, too! I wouldn't mind hearing more about it and seeing some code. But you did express interest in the hardware idea, and since I've already embarked on an explanation I should probably persevere.

Quote:
There is some complexity associated with the shift register but the problem I'm trying to figure out is how to latch valid data every 8 clocks. [...] it needs to figure out a way to UNDO the wait state inserted to the previous INY instruction, otherwise there will be 9 clocks elapsed before new pixel data shows up

9 clocks is actually OK. :mrgreen: That 9th bit in the shift reg is a buffer that gives us some wiggle room, timing-wise.

Within each LDA / INY pair, the first (and possibly only) load of the shift-reg happens before the shift-reg has entirely emptied from the previous load. It still has one bit remaining, so you're one cycle ahead of schedule -- that's why I call it an Early Load. The same clock edge that early-loads 8 bits of new data is also the clock edge that causes the final bit from the previous load to make its way to the output of the shift-register, where of course it will remain throughout the following cycle. Here's a diagram.
Attachment:
File comment: When all 9 muxes select their top input, I call that a Shift. All 9 on the middle input I call a Late Load. All 9 on the lower input I call an Early Load, although the last bit is actually connected so as to shift.
snooptrick shifter closeup.png

And during that following cycle we find out whether the load was good data, or just garbage because of a page crossing. If we have good data then at the end of the cycle we cause the reg to shift (and this shift will cause the first of the newly-fetched bits to appear at the output of the shift reg). But if we have garbage then the CPU will re-run the fetch and at the end of the cycle we cause the shift reg to do a second load. We're no longer one cycle ahead of schedule, so we use the Late Load path to put the byte one place further downstream so it'll immediately appear at the output of the shift reg.

Quote:
INY with one wait state is 3 clocks, and LDA ($b0),y crossing page boundary is 6 clocks
Yup. So, let's look at what happens during those 9 clocks. I'm appending a rehash of the chart I presented earlier.

Note that the asterisks (which show when the first of the fetched bits actually appears at the shift register output) are exactly eight clocks apart, as required. That's despite the fact that bytes won't necessarily enter the shift-reg at an exactly regular pace.
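
The claim that the output stays gap-free can be checked with a small cycle-by-cycle simulation of the 9-bit register. This is a sketch under several assumptions that are not taken from the actual CPLD design: pixels go out MSB first, the data fetch is the 5th cycle of every LDA/INY pair, the re-fetch (page-crossing case only) is the 6th, every pair lasts 8 cycles thanks to the wait state, and an arbitrary value stands in for the garbage on the bus during the dead cycle. All names are hypothetical.

```python
def byte_bits(value):
    """Bits of a byte in the order they leave the register (assumed MSB first)."""
    return [(value >> (7 - i)) & 1 for i in range(8)]


def shift(reg):
    """reg[0] is the output stage; everything moves one place downstream."""
    return reg[1:] + [0]


def late_load(reg, value):
    """New byte lands with its first bit directly in the output stage."""
    return byte_bits(value) + [0]


def early_load(reg, value):
    """New byte lands one place upstream; the output stage still shifts."""
    return [reg[1]] + byte_bits(value)


def run_scan(data, crossings, garbage=0xA5):
    """Return the output-stage bit seen during each CPU cycle."""
    reg = [0] * 9
    seen = []
    for value, crosses in zip(data, crossings):
        for cycle in range(8):           # one LDA/INY pair, wait state included
            seen.append(reg[0])
            if cycle == 4:               # first (maybe only) fetch: Early Load
                reg = early_load(reg, garbage if crosses else value)
            elif cycle == 5 and crosses: # re-run fetch: Late Load
                reg = late_load(reg, value)
            else:
                reg = shift(reg)
    for _ in range(6):                   # let the last byte drain out
        seen.append(reg[0])
        reg = shift(reg)
    return seen


if __name__ == "__main__":
    data = [0x81, 0xFF, 0x3C, 0x00, 0x55]
    wanted = [bit for value in data for bit in byte_bits(value)]
    stream = run_scan(data, [False, True, True, False, True])
    print(stream[6:] == wanted)
```

In this model the first bit of byte k always appears during cycle 8k + 6, whatever mix of crossing and non-crossing fetches is used, and the garbage value never reaches the output because the Late Load overwrites it before it can shift out.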

Questions welcome. It's hard to feel confident that this is clear now, and I haven't omitted anything important... :roll:

-- Jeff


Attachment: 'no-cross then cross' sequence .png

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
 Post subject: Re: CPLD + 6502 Trainer
PostPosted: Sat Aug 20, 2022 4:04 am 

Joined: Fri Dec 21, 2018 1:05 am
Posts: 1076
Location: Albuquerque NM USA
Dr Jefyll wrote:
Quote:
Brief summary of the software approaches:
Honestly, this looks a fun problem, too! I wouldn't mind hearing more about it and seeing some code. But you did express interest in the hardware idea, and since I've already embarked on an explanation I should probably persevere.

Attached is the software for both approaches:
A. Reformat the graphic data so each scan line fits within a page. The original graphic data is pre-loaded in memory from $4a00 to $dfff and reformatted during the vertical retrace period.
B. Display the raw graphic data as is, but enumerate all cases where a scan line crosses a page boundary. A different software routine is generated for each case. Fortunately, there are only 4 cases. The original graphic data is pre-loaded in memory from $4000 to $d5ff.
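As a rough cross-check of the two layouts, here is a small Python sketch. It assumes 80-byte scan lines (640 monochrome pixels) and 480 lines, figures that are not stated in this post but match the 38,400-byte size of both address ranges. It also assumes the reformatted image starts at $4000 with 3 lines per page, as mentioned earlier in the thread. The function names are made up for illustration.

```python
LINE_BYTES = 80      # assumption: 640 pixels / 8 bits per byte
LINES = 480          # assumption: 38400 bytes / 80 bytes per line
PAGE = 256


def crossing_cases(base=0x4000):
    """Approach B: byte offsets within a scan line where a page boundary falls."""
    cases = set()
    for line in range(LINES):
        start = base + line * LINE_BYTES
        end = start + LINE_BYTES - 1
        if start // PAGE != end // PAGE:
            # Offset of the first byte of this line that lies in the next page.
            cases.add(PAGE - start % PAGE)
    return sorted(cases)


def reformatted_address(line, base=0x4000, lines_per_page=3):
    """Approach A: start address of a line when 3 lines share each page."""
    return base + PAGE * (line // lines_per_page) + LINE_BYTES * (line % lines_per_page)


print(crossing_cases())                                       # [16, 32, 48, 64]
print(hex(reformatted_address(LINES - 1) + LINE_BYTES - 1))   # 0xdfef
```

Under these assumptions approach B has four distinct crossing positions (16, 32, 48 and 64 bytes into the line), which agrees with the 4 cases mentioned, and the last byte of the reformatted image in approach A sits at $dfef, inside the $dfff limit.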

I think you've provided a sufficient blueprint to build the hardware. I'm still hung up over the 9th bit that needs to go out to the video bitstream but is not available because of the dead cycle. However, building the actual hardware tends to clarify such issues.
Bill


Attachments:
UsingNMI_reformat_image_$4a00-$dfff.zip [1.83 KiB]
UsingNMI_display_raw_image_$4000-$d5ff.zip [2.15 KiB]
 Post subject: Re: CPLD + 6502 Trainer
PostPosted: Sat Aug 20, 2022 2:39 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3346
Location: Ontario, Canada
Timing sequences are tricky to communicate, which makes things tough. For example, one line of that chart says, "Fetch the Data and load the shift-reg."

But as you know the contents of the shift-reg don't actually change (ie, it doesn't actually load) until the clock-edge that marks the end of that cycle and the beginning of the next -- the load is an event that occurs between cycles! :| So, instead maybe that line in the chart should say, "prepare to load the shift reg,"... or, more precisely, "set up the muxes so the clock-edge will result in a load".

I didn't bother doing the chart all over again. But I did do a lot of edits to my post after it was already posted, trying to correct stuff that later struck me as incomplete or perhaps potentially misleading. I'm done now, and will leave it alone. But to see the latest rev please hit Refresh on your browser if you haven't already, and thanks for your patience! And if you're rolling up your sleeves to build the actual hardware, I'll certainly be interested to hear how you're getting along. Feel free to PM me, BTW.

-- Jeff

 Post subject: Re: CPLD + 6502 Trainer
PostPosted: Sat Aug 20, 2022 3:38 pm 

Joined: Fri Dec 21, 2018 1:05 am
Posts: 1076
Location: Albuquerque NM USA
Jeff,
I’m a big fan of the rapid spiral development cycle because each phase of product development has a particular perspective and language for communication; sometimes innovation occurs when moving from one phase into the next, as the perspective and language shift.

So I was stuck on the idea of issuing a byte of data every 8 clocks during the design phase, but moving into implementation I realized the data goes into a pipeline (the shift register), so as long as the pipeline does not go dry, I have timing elasticity equal to the length of the pipe. I’m repeating what you have been saying all along, but it suddenly made sense because of the shift to the language of implementation. So thank you for being patient.
Bill

