plasmo wrote:
Oh, maybe this is the trick you are thinking of: the original graphic data needs to be reformatted to fit 3 lines per page so LDA ($b0),y does not cross page boundary.
Haha, nope, that's not it! What I had in mind are two possible ways to get around the excessive delay that results when conventional means are used to read bytes from memory and write them to the shift register.
And you've chosen to snoop the data bus. I gather that when the LDA ($b0),y reads from the $4000-$DFFF region, the data bus gets copied to the shift reg at the end of the same cycle that does the read. This eliminates the delay otherwise required for a subsequent STA to explicitly write to the shift reg. Nice!
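To put rough numbers on that saving, here is a minimal sketch using the documented 6502 cycle counts. The absolute-mode STA for the explicit write is an assumption on my part, and cycles_per_byte is just an illustrative name; the real loop may well be arranged differently.

```python
# Documented 6502/65C02 base cycle counts (no page crossing).
LDA_IND_Y = 5  # LDA (zp),y
STA_ABS = 4    # STA abs -- ASSUMED addressing mode for the explicit write

def cycles_per_byte(snoop: bool) -> int:
    """Cycles spent on the load (plus the store, if one is needed) per byte."""
    # With snooping, the shift reg latches the data bus during the LDA
    # itself, so no separate STA is executed.
    return LDA_IND_Y if snoop else LDA_IND_Y + STA_ABS

print(cycles_per_byte(snoop=False))  # load followed by explicit store
print(cycles_per_byte(snoop=True))   # load only, shift reg snoops the bus
```

Loop overhead (index increment, branch and so on) is left out, since it is the same in both cases.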
The other trick I had in mind would be overkill for what you're presently doing, but it's a favorite of mine despite being somewhat gnarly... or maybe because it's somewhat gnarly!
If anyone's interested, the so-called Cheap Video technique espoused by Don Lancaster back in the day allows a byte to be fetched and written to the shift reg once per clock cycle -- i.e., eight times faster than even the snoop trick!!
This makes it a great choice when the CPU is rather slow or the goals for video resolution are rather high. I've successfully implemented Cheap Video on three different machines, and I wrote about the technique here.
plasmo wrote:
the original graphic data needs to be reformatted to fit 3 lines per page so LDA ($b0),y does not cross page boundary. The reformatting is done to the original graphic data during the vertical retrace period.
I like how you use the vertical retrace period for this! But have you considered tricks that would manage page crossings rather than avoiding them? Maybe you just haven't gotten to this yet.
Edit: or maybe the reformatting doesn't have enough of a downside for you to consider it a problem -- which is fine.
Anyway, the way my brain works I'd be thinking about eliminating the reformatting, and to do that it's necessary to deal with the effect of page crossings on timing. I'd wanna eliminate the single-cycle NOP, and "spend" that cycle in one of two ways:
- arrange for a Wait State that only occurs when there's no Page Crossing. I know that's not entirely straightforward, but I'm sure it's doable, and it would let you guarantee that it always takes 8 cycles to move each byte, thus ensuring consistent timing. Or, ...
- replace each LDA ($b0),y with a STA ($b0),y. STA using (ind),y mode always takes six cycles, regardless of whether or not there's a page crossing. So, instead of generating conditional Wait States, the challenge will be to keep the /WE pin of the RAM high so it reads instead of writing. Also, you'd need to somehow avoid bus contention between the RAM and the CPU. Do those guys connect directly together, or does your CPLD act as a middleman? If they connect directly together then this idea is a dud!
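The timing behind both options can be sketched as a small cycle-count model. The helper names and the sample addresses are hypothetical; only the cycle counts (LDA (ind),y takes 5 cycles plus 1 on a page crossing, STA (ind),y always takes 6) come from the 6502 documentation.

```python
def crosses_page(base: int, y: int) -> bool:
    """True if adding Y to the pointer carries out of the low address byte."""
    return (base & 0xFF) + y > 0xFF

def lda_ind_y_cycles(base: int, y: int) -> int:
    """LDA (zp),y: 5 cycles, plus 1 if the indexing crosses a page."""
    return 5 + (1 if crosses_page(base, y) else 0)

def lda_with_wait_state(base: int, y: int) -> int:
    """Option 1: one wait state inserted only when there is NO crossing."""
    wait = 0 if crosses_page(base, y) else 1
    return lda_ind_y_cycles(base, y) + wait

STA_IND_Y_CYCLES = 6  # Option 2: STA (zp),y is always 6 cycles

# Hypothetical pointer/index pairs: one without a crossing, one with.
for base, y in [(0x4000, 0x10), (0x40F0, 0x20)]:
    print(hex(base), hex(y), crosses_page(base, y),
          lda_ind_y_cycles(base, y), lda_with_wait_state(base, y))
```

Either way the fetch costs the same number of cycles on every byte, which is what makes it possible to drop the reformatting.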
-- Jeff