6502.org Forum

All times are UTC




Posted: Sat Mar 09, 2013 6:31 am

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
ElEctric_EyE wrote:
I sort of gave up on the IODELAY since I couldn't figure it out.

I had a hard time with it too, trying to instantiate the Verilog code, but then I tried the core generator wizard, and it actually made a working design pretty quickly. I hand-edited that, because it put in an extra register at the input side, which I didn't need. After experimenting with different delays, I ended up taking out the IODELAY again.

Anyway, I kept the code, so if you ever need delayed I/O, let me know.


Posted: Sun Mar 10, 2013 1:12 am

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Arlet wrote:
...Anyway, I kept the code, so if you ever need delayed I/O, let me know.

Thanks. First I will have to flesh out the read timing to the RAM, delayed by 2 cycles as you suggest. That IOBDelay spec should tighten things up for speeds even higher than 100MHz, I would think, i.e. if one were to consider running the SyncRAM at higher speeds in order to achieve higher resolutions.

In other matters, I am thinking of ways to maximize plotting communication between the CPU and the HVSync modules. Old school 6502 software would have had to repeatedly add the horizontal resolution, once per line up to the Y value, and monitor the carry in order to increment the MSB, then add in the X value and again monitor the carry.
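
For reference, the repeated-add approach can be modelled in a few lines of Python. This is only a sketch of the arithmetic, not 65Org16 code: it assumes 16-bit words, a 640-pixel line, and a linear bitmap starting at address 0. The name pixel_address is made up for illustration.

```python
WIDTH = 640  # assumed horizontal resolution

def pixel_address(x, y, width=WIDTH):
    """Return (high word, low word) of y * width + x using 16-bit adds."""
    lo, hi = 0, 0
    for _ in range(y):          # add the line width once per scanline
        lo += width
        if lo > 0xFFFF:         # carry out of the low word
            lo &= 0xFFFF
            hi += 1             # increment the MSB
    lo += x                     # then add in X
    if lo > 0xFFFF:             # and monitor the carry again
        lo &= 0xFFFF
        hi += 1
    return hi, lo

hi, lo = pixel_address(639, 479)
print(hex(hi), hex(lo))
```

The loop runs once per scanline, which is exactly the cost a hardware shortcut would avoid.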

Today I was thinking of an idea to implement in Verilog that would be superior as far as execution speed is concerned when plotting individual pixels:
Integrate the X and Y pixel counters from the HVSync module, feed them directly into the 65Org16, and add a plot opcode that fires when the X and Y counters both equal the values held in two X/Y accumulators. The CPU would then automatically write the SyncRAM with a color value from a third accumulator. The only issue is that the CPU runs at a multiple of the pixel clock frequency, so other video timing signals may have to be brought into the CPU directly so that it can keep its own X and Y counters, clocked at that multiple.

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


Posted: Sun Mar 10, 2013 7:28 am

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
A simple thing you could do is to make a memory interface between CPU and video bitmap that's optimized for the CPU. For instance, if you had a 640x480 resolution, you could map that to a bitmap of 1024x480, or even 65536x480, so you could put the X coordinate in zero page register 'p', and put the Y coordinate in register 'p+1', and just write to the pixel as 'sta (p), y'.

In the FPGA you would take the address, split the bits in X/Y coordinates, and quietly access the memory at [Y * 640 + X] instead.
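
A rough Python model of that translation step, to make the bit splitting concrete. It is a sketch under assumptions: the bitmap window is taken to start at address 0, the function name translate and the x_bits parameter are made up, and rejecting X values past 639 is a modelling choice (real hardware might ignore or alias them).

```python
WIDTH = 640   # visible pixels per line

def translate(cpu_addr, x_bits=10):
    """Map a CPU-side (Y, X) bit-field address to the linear RAM address."""
    x = cpu_addr & ((1 << x_bits) - 1)   # low bits carry X
    y = cpu_addr >> x_bits               # remaining bits carry Y
    if x >= WIDTH:
        raise ValueError("X is outside the visible 640 pixels")
    return y * WIDTH + x                 # what the RAM actually sees

print(translate((1 << 10) | 0))                  # pixel (0,1), 1024-wide map
print(translate((479 << 16) | 639, x_bits=16))   # pixel (639,479), 65536-wide map
```

With x_bits=16 the X and Y coordinates each occupy a whole 16-bit word, which is the case where the pointer can be filled in without any shifting.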


Posted: Sun Mar 10, 2013 11:50 am

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Excellent food for thought!

My initial thought is to bring the IR out of the core so the interface knows when that special indirect STA opcode (STQ ($xxxx),y or STQ ($xxx),w) is executed, and to pluck the 2 addresses off of the databus?
I have to think on this some more, but obviously you have already thought about it. Maybe this should be done inside the core itself.

Posted: Sun Mar 10, 2013 11:58 am

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
You can treat all memory accesses this way. For instance, if you take a vertical resolution of 480, you need 480*64K for the bitmap, which is "only" 30MB. Since you have a 4GB address space, it wouldn't be so bad to waste less than 1% of it on this trick. So, if a memory access falls in that 30MB window, it is treated as a special video bitmap access, and otherwise it is treated normally.
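
The window test itself is just a range compare. A minimal sketch, assuming a hypothetical base address (WINDOW_BASE is a placeholder, as are the other names; the post above does not fix a base):

```python
LINES = 480
LINE_SPAN = 1 << 16                 # 64K addresses reserved per scanline
WINDOW_BASE = 0x8000_0000           # hypothetical placement of the window
WINDOW_SIZE = LINES * LINE_SPAN     # 480 * 64K addresses

def is_bitmap_access(addr):
    """True if the address falls inside the video bitmap window."""
    return WINDOW_BASE <= addr < WINDOW_BASE + WINDOW_SIZE

print(WINDOW_SIZE, WINDOW_SIZE // 2**20)
print(is_bitmap_access(0x8000_0000), is_bitmap_access(0x0000_1234))
```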


Posted: Sun Mar 10, 2013 11:59 am

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
On the other hand, with a good multiply instruction, it may be easier to just do this in software :) Inserting the multiplier in the critical address path may not be such a good idea after all...


Posted: Sun Mar 10, 2013 1:36 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
After thinking about this more, could it be as simple as this?:

Instead of having a single [20:0] bit counter for the video address in the HVSync generator, have 2 counters, x [9:0] and y [8:0], and present them to the VideoRAM as [18:0]. Some memory would be wasted though.

Posted: Sun Mar 10, 2013 1:40 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
The idea was to only have this translation in the memory interface between CPU and video RAM.

When displaying the bitmap, the video module directly grabs the memory without any translation.


Posted: Sun Mar 10, 2013 1:46 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
But if the CPU and video module access the memory with the same nonlinear pattern everything is good I would think.

Posted: Sun Mar 10, 2013 1:50 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
For the video module it's easy to access the memory as just a straight bitmap. You don't even have to generate the address from X/Y coordinate. Just initialize address to first pixel when there's a VSYNC, and then just increment for every pixel.
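
As a behavioural sketch of that scanout (Python rather than Verilog, with a made-up generator name, and assuming the bitmap starts at address 0 and only visible pixels are fetched):

```python
def scanout_addresses(width=640, height=480):
    """Yield the RAM address fetched for each visible pixel of one frame."""
    addr = 0                    # reset to the first pixel at VSYNC
    for _ in range(height):
        for _ in range(width):
            yield addr
            addr += 1           # plain increment, no X/Y arithmetic

frame = list(scanout_addresses())
print(len(frame), frame[0], frame[-1])
```

The address never depends on the X/Y counters, so the display side needs no multiplier or bit shuffling.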


Posted: Sun Mar 10, 2013 2:08 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
For what I'm thinking of, the video and CPU have to have identical 'memory maps' to work. I'm at work now, or I would try this idea out immediately!
The address for pixel (0,1) would be 00_00_0000_0000_0_0000_0001 (bb_xx_xxxx_xxxx_y_yyyy_yyyy, where b is bank, x is X and y is Y). For pixel (639,479) it would be 00_10_0111_1111_1_1101_1111.
In a straight bitmap the address for pixel (0,1) would be 000_0000_0010_1000_0000 (640), and for (639,479) it would be 100_1010_1111_1111_1111 (307,199). If the video module were set up for linear, things wouldn't match up with the CPU.
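
The bank/X/Y packing can be checked with a short Python sketch. The field widths (2 bank bits, 10 X bits, 9 Y bits) come from the post; the function names packed_address and fmt, and the WASTED_PER_BANK constant, are made up for illustration.

```python
X_BITS, Y_BITS = 10, 9

def packed_address(x, y, bank=0):
    """Pack bank, X and Y into one 21-bit address: bb | x[9:0] | y[8:0]."""
    return (bank << (X_BITS + Y_BITS)) | (x << Y_BITS) | y

def fmt(addr):
    """Format as bb_xx_xxxx_xxxx_y_yyyy_yyyy."""
    b = format(addr, "021b")
    return "_".join([b[0:2], b[2:4], b[4:8], b[8:12], b[12:13], b[13:17], b[17:21]])

# Address slots per bank that never hold a visible pixel in this layout.
WASTED_PER_BANK = (1 << (X_BITS + Y_BITS)) - 640 * 480

print(fmt(packed_address(0, 1)))
print(fmt(packed_address(639, 479)))
print(WASTED_PER_BANK)
```

The last figure is the per-bank cost of the packed layout: 19 address bits cover 524,288 slots, of which only 307,200 are visible pixels.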

Posted: Sun Mar 10, 2013 7:25 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Brainfart. I deviated from my original suggestion, sorry. I realized it's pretty much already set up for yyy_yyyy_yyxx_xxxx_xxxx. I was thinking that if I could take the Y value, barrel shift it left by 10 bits and add it to the X value, I would get a nice quick address. But I can't do that easily with a 16-bit CPU.
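
To show why the 10-bit shift is awkward with 16-bit words: the shifted Y value straddles the word boundary, so its top bits must go into a second word. A minimal Python sketch of the split, assuming a 9-bit Y and a 10-bit X (packed_words is a made-up name):

```python
def packed_words(x, y):
    """Return (high word, low word) of (y << 10) | x using 16-bit pieces."""
    lo = ((y & 0x3F) << 10) | x     # low 6 bits of Y sit above the 10 X bits
    hi = y >> 6                     # top 3 bits of Y spill into the high word
    return hi, lo

hi, lo = packed_words(639, 479)
print(hex(hi), hex(lo))
```

So the software cost is two shifts and a mask per pixel rather than a single shift.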
I'm trying to think ahead for when a controller board is sending raw coordinates to be plotted, in pixel mode.

Posted: Sun Mar 10, 2013 7:47 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Arlet wrote:
A simple thing you could do is to make a memory interface between CPU and video bitmap that's optimized for the CPU. For instance, if you had a 640x480 resolution, you could map that to a bitmap of 1024x480, or even 65536x480, so you could put the X coordinate in zero page register 'p', and put the Y coordinate in register 'p+1', and just write to the pixel as 'sta (p), y'.

In the FPGA you would take the address, split the bits in X/Y coordinates, and quietly access the memory at [Y * 640 + X] instead.

I begin to understand what you meant by this now. It would be no problem to dedicate a large chunk of memory for this idea to work. I can dedicate from $8000_0000 to $8FFF_FFFF. 268MB. If I put the Y in the MSB and X in the LSB, it would only go up to $81E0_8280 for bank 1 of 4.
So this would have to mean the real CPU address going to the videoRAM would be intercepted by another module that would rearrange the bits at all times. I think this might be within my skill range.
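
A sketch of what that intercepting module would compute, modelled in Python. It assumes the window base $8000_0000 from the post, Y added into the high word and X in the low word, bank 1 of 4 only, and a linear 640-wide bitmap behind it. The names pointer_words and intercept are hypothetical.

```python
VIDEO_BASE_HI = 0x8000      # high word of the $8000_0000 window
WIDTH = 640

def pointer_words(x, y):
    """Words the program stores in its pointer: (low word, high word)."""
    return x, VIDEO_BASE_HI + y

def intercept(lo, hi):
    """Rearrange the CPU address into the linear videoRAM address."""
    y = hi - VIDEO_BASE_HI      # strip the window base to recover Y
    return y * WIDTH + lo       # X arrives untouched in the low word

lo, hi = pointer_words(10, 2)
print(hex(hi), hex(lo), intercept(lo, hi))
```

On the CPU side, plotting becomes two stores to set up the pointer and one indirect store of the color, with no address arithmetic in software.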

Posted: Sun Mar 10, 2013 11:02 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Time to take some baby steps. I'm starting from a known working foundation and will change one variable at a time:

1) 640x480x16 bit color. Pixel clock @25MHz.
2) 1K zeropage starting @0000_0000. 1K stackpage starting @0001_0000.
3) 65Org16.b Core and external SyncRAM running @100MHz after using SmartXplorer. mapgloboptioreg design strategy.
4) Software performs a 'clear screen' by writing a value of '1111_1111_1111_1111' to the SyncRAM. It then successfully reads character data from blockRAM and plots an 8x8 pixel blue character '#', starting at (0,0). A read from this videoRAM, where the character data is, is incorrect: the read data is plotted 16 pixels to the right and in dark red. I've not tried to correct for the 2 cycle delay yet.
5) HVSync module is using a linear address counter for the SyncRAM.
6) Address decoding for the VideoRAM is active from $8000_0000-$8FFF_FFFF.
7) All signals from FPGA to SyncRAM are constrained to 4mA 'DRIVE' strength. All signals from FPGA to videoDAC are constrained to 12mA 'DRIVE' strength (character bit placement errors were noted driving the videoDAC @4mA).
8) 0.1uF bypass capacitors still need to be added to the FPGA, SyncRAM and videoDAC.

Posted: Sun Mar 10, 2013 11:12 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
So, the addition of step #6 has not resulted in lowering the core below 100MHz.

EDIT: This is the CLeaRSCReen routine. It is about to drastically change if Arlet's suggestion is successful!
Code:
CLRSCR      LDA #$0000        ;$8000_0000 START OF VIDEO MEMORY
            STA SCRLO
            TAY               ;Y = 0
            LDA #$8000
            STA SCRHI
            PHA               ;save MSB of videoRAM pointer
            LDX #$0004        ;4x65536= 262,144. WE NEED 307,200 PIXELS for 640x480
            LDA SCRCOL        ;fill color
AA          STA (SCRLO),Y
            DEY
            BNE AA            ;65536 writes per 64K block
            INC SCRHI         ;next 64K block
            DEX
            BNE AA
            STA (SCRLO),Y     ;Y=0: get the pixel that loop AB stops short of
            LDY #$AFFF
AB          STA (SCRLO),Y     ;CLEAR REMAINING  45056 PIXELS
            DEY
            BNE AB
            PLA
            STA SCRHI         ;reset MSB of videoRAM pointer
            RTS
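
As a sanity check on the loop counts, the routine can be replayed in Python. This is a model, not a 65Org16 emulator: it assumes 16-bit X and Y registers, SCRLO = 0, and that (SCRLO),Y addresses (SCRHI << 16) + SCRLO + Y. The function name clrscr_writes is made up.

```python
def clrscr_writes():
    """Replay CLRSCR and return the set of addresses it writes."""
    written = set()
    scrhi, y, x = 0x8000, 0x0000, 0x0004
    while True:                             # loop AA
        written.add((scrhi << 16) | y)      # STA (SCRLO),Y
        y = (y - 1) & 0xFFFF                # DEY wraps at 16 bits
        if y != 0:                          # BNE AA
            continue
        scrhi += 1                          # INC SCRHI
        x -= 1                              # DEX
        if x == 0:                          # BNE AA falls through
            break
    written.add((scrhi << 16) | y)          # STA (SCRLO),Y with Y = 0
    y = 0xAFFF                              # LDY #$AFFF
    while True:                             # loop AB
        written.add((scrhi << 16) | y)      # STA (SCRLO),Y
        y = (y - 1) & 0xFFFF                # DEY
        if y == 0:                          # BNE AB
            break
    return written

w = clrscr_writes()
print(len(w), hex(min(w)), hex(max(w)))
```

Under those assumptions the model writes every address from $8000_0000 through $8004_AFFF exactly once per pass, which matches the 307,200 pixels of a 640x480 screen.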
