
All times are UTC




PostPosted: Mon Mar 11, 2013 4:21 pm 

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
After working some things out on paper, the only operation that needs to be done is to shift the CPU address right by 6 bits, then truncate [31:21], to form a proper address for the videoRAM. This is after putting X in the LSB and Y in the MSB of the pointer used for indirect indexed addressing in the address space $8000_0000 to $8FFF_FFFF.
Code:
(X,Y)      indirect address HEX   cpu address BIN [31:0]                  videoram address BIN [20:0]
(0,0)      $0000,$8000            %1000_0000_0000_0000__0000_0000_0000_0000   00_0_0000_0000__00_0000_0000
(1,0)      $0001,$8000            %1000_0000_0000_0000__0000_0000_0000_0001   00_0_0000_0000__00_0000_0001
(0,1)      $0000,$8001            %1000_0000_0000_0001__0000_0000_0000_0000   00_0_0000_0001__00_0000_0000
(1,1)      $0001,$8001            %1000_0000_0000_0001__0000_0000_0000_0001   00_0_0000_0001__00_0000_0001
(20,20)    $0014,$8014            %1000_0000_0001_0100__0000_0000_0001_0100   00_0_0001_0100__00_0001_0100
(639,479)  $027F,$81DF            %1000_0001_1101_1111__0000_0010_0111_1111   00_1_1101_1111__10_0111_1111

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Mon Mar 11, 2013 4:32 pm 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
The way I had intended was :

(639,0) -> CPU address $8000_027F -> video address $27f
(0,1) -> CPU address $8001_0000 -> video address $280

The idea is that video is contiguous in physical memory, but for the CPU it looks like the screen is 64K pixels wide.
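To make the two example addresses concrete, here is a minimal Python sketch of that contiguous mapping. It assumes a 640-pixel line and that the hardware effectively computes Y*640 + X; the function name `cpu_to_video` and the explicit multiply are illustrative assumptions, not the actual design.

```python
VIDEO_BASE = 0x8000_0000   # start of the CPU window, as in the examples above
LINE_WIDTH = 640           # assumed visible pixels per line (640x480 mode)

def cpu_to_video(cpu_addr):
    """Model: Y in the high 16 bits, X in the low 16 bits, packed contiguously."""
    offset = cpu_addr - VIDEO_BASE
    y = offset >> 16        # high word selects the line
    x = offset & 0xFFFF     # low word selects the pixel within the line
    return y * LINE_WIDTH + x

print(hex(cpu_to_video(0x8000_027F)))  # (639,0) -> 0x27f
print(hex(cpu_to_video(0x8001_0000)))  # (0,1)   -> 0x280
```

From the CPU's side each line starts 64K addresses after the previous one, while the video memory itself has no gaps.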

Also, keep in mind this only makes it easier when you're doing random access, for instance when drawing lines, or writing text. For a clear screen routine, there's no advantage.


PostPosted: Mon Mar 11, 2013 5:34 pm 

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Arlet wrote:
..Also, keep in mind this only makes it easier when you're doing random access, for instance when drawing lines, or writing text. For a clear screen routine, there's no advantage.

Yes, I was thinking of taking the Mandelbrot program that Bruce made for the 65816 and adapting it to the 65Org16 (I thought he had originally made it for the 65Org16; I guess I was wrong). EDIT: Here is the thread.

Maybe there would be an advantage in letting the CPU easily select how it addresses the videoRAM? If the CPU is accessing $80000000-$8FFFFFFF, it uses the optimized addressing scheme; if it's accessing $90000000-$901FFFFF, it uses the standard linear addressing.
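A rough Python sketch of such a two-window decode follows. The window bounds come from the paragraph above; the function name `decode`, the window labels, and the choice of a 9-bit Y / 10-bit X field layout for the optimized window are assumptions made for illustration.

```python
def decode(cpu_addr):
    """Return (window, video_addr), or (None, None) outside both windows."""
    if 0x8000_0000 <= cpu_addr <= 0x8FFF_FFFF:
        y = (cpu_addr >> 16) & 0x1FF   # assumed: Y[8:0] taken from the high word
        x = cpu_addr & 0x3FF           # assumed: X[9:0] taken from the low word
        return "xy", (y << 10) | x
    if 0x9000_0000 <= cpu_addr <= 0x901F_FFFF:
        return "linear", cpu_addr & 0x1F_FFFF   # 21-bit videoRAM address
    return None, None

print(decode(0x8001_0000))   # X,Y window, line 1, pixel 0
print(decode(0x9000_0280))   # linear window, videoRAM word $280
```

Both windows reach the same videoRAM, so a routine could plot through one and bulk-copy through the other.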

PostPosted: Mon Mar 11, 2013 5:45 pm 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Even when the CPU is accessing memory linearly, like in a clear screen, the overhead of using the new mapping isn't too bad. It's just a matter of adjusting your loops so that the inner loop clears exactly one line, and then you repeat that 480 times. It will be a few microseconds slower than what you have now.

Plus, you could very easily modify the code to clear only a specific rectangle, with about the same performance.


PostPosted: Mon Mar 11, 2013 6:00 pm 

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Even just 1 more opcode in the main loop, like an ORA #$1010 for example, is very noticeable in the clear screen speed, and I'm talking multiple frames per second. Also, reading picture data or even sprites would favor the linear addressing, I would think.
I can experiment with the optimized addressing tomorrow.

PostPosted: Mon Mar 11, 2013 6:09 pm 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
There wouldn't be any change to the inner loop itself, but instead of running the inner loop 64K times per pass, it would run 640 times (one screen line). The effect would be that you'd have to set it up 480 times, instead of 5 times. So, there's a small penalty, but since the code spends 99% of the time in the inner loop, the penalty is very small.
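The setup counts can be checked with a few lines of Python. This is only arithmetic, assuming a 640x480 screen and a 16-bit index register that can cover 64K locations per pass; the variable names are made up for the example.

```python
import math

WIDTH, HEIGHT = 640, 480
MAX_INDEX = 65536                 # locations one pass of a 16-bit index can cover

pixels = WIDTH * HEIGHT           # total stores, the same for both schemes
linear_setups = math.ceil(pixels / MAX_INDEX)   # passes needed with linear addressing
per_line_setups = HEIGHT          # one inner-loop setup per screen line

print(pixels, linear_setups, per_line_setups)   # 307200 5 480
```

The number of stores is identical in both cases; only the outer-loop overhead grows from 5 setups to 480.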

If you wanted it really fast, a hardware bit blitter would be a nice project.


PostPosted: Mon Mar 11, 2013 6:36 pm 

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Arlet wrote:
...If you wanted it really fast, a hardware bit blitter would be a nice project.

This is my area of interest, but there is so much to learn! And I only seem to learn by doing, such a slow process. :|

PostPosted: Mon Mar 11, 2013 6:51 pm 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8517
Location: Southern California
and there's Samuel Falvo's blitter article at http://6502org.wikidot.com/blitter-theory-part-1 (although it looks like a few equations need fixing in order to show up at all. I'll contact him.)

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Mon Mar 11, 2013 9:26 pm 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Another trick you could use is a 'banking' scheme. Instead of mapping the entire video memory into the CPU space, you map a single line. The mapping is done by writing the Y coordinate to a special register, which causes the hardware to map one 640-pixel line into a fixed memory area. The advantage is that you can then access this memory using ABS,X addressing (or even ZP,X if you map it into zero page), which saves a cycle compared to (ZP),Y.

Having a fixed address also makes it easier to unroll the loop. So, you could do something like:
Code:
STA line, X
STA line+64,X
STA line+128,X
...
STA line+576,X

And then let X go from 63 to 0.
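As a sanity check on the unrolled loop, here is a small Python model of it. It assumes ten unrolled stores at offsets 0, 64, ..., 576 (the ones elided by "..." above are taken to follow the same stride) and treats the mapped line as a plain list; `unrolled_clear` is a made-up name.

```python
LINE_WIDTH = 640      # pixels in the mapped line
UNROLL_STRIDE = 64    # distance between the unrolled STA targets

def unrolled_clear(line, value):
    """Mirror of STA line,X / STA line+64,X / ... / STA line+576,X."""
    writes = 0
    for x in range(63, -1, -1):                               # X counts 63 down to 0
        for offset in range(0, LINE_WIDTH, UNROLL_STRIDE):    # 0, 64, ..., 576
            line[offset + x] = value
            writes += 1
    return writes

line = [None] * LINE_WIDTH
writes = unrolled_clear(line, 0)
print(writes, all(v == 0 for v in line))
```

Ten stores per iteration times 64 iterations touches each of the 640 pixels exactly once.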


PostPosted: Tue Mar 12, 2013 2:30 am 

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
I'll think about your latest idea and let it sink in over the next few days. If at least 1 cycle can be saved, I will take that suggestion! But first, I'll try your previous idea using indirect indexed addressing and extracting X and Y values from the LSB and MSB.

Also, I have been thinking about 'unrolling' loops as well in order to save cycles. Then I start thinking about some simple self-modifying code. Unrolling loops takes a lot of RAM, especially if the videoRAM is mapped entirely into the CPU address space...

PostPosted: Tue Mar 12, 2013 5:37 am 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
I'd be more worried about running out of block RAMs than running out of 4GB memory space! And even then, I'd only worry when I'd really started to run out. You can always reduce the unrolling later if you find that you need the extra 20 bytes.


PostPosted: Tue Mar 12, 2013 1:24 pm 

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
ElEctric_EyE wrote:
After figuring some things out on paper, the only operation that would have to be done is to shift the cpu address to the right 6x, then truncate [31:21], to form a proper address for the videoRAM...

My brain must have been in software mode, because in the Verilog all I needed to do was a simple reassignment:
Code:
always @* begin                            // remap the videoRAM address so (X,Y) sit in the (LSB,MSB) for indirect indexed plotting
   cpuABopt[20:19] = 0;                    // bank bits
   cpuABopt[18:10] = cpuAB[24:16];         // Y[8:0]
   cpuABopt[9:0]   = cpuAB[9:0];           // X[9:0]
end                                        // blocking (=) assignments, since this is combinational logic
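
The same bit reassignment can be modelled in Python and checked against the (X,Y) table from the first post. This is only a software model of the wiring; the function name `cpu_ab_opt` is invented for the example.

```python
def cpu_ab_opt(cpu_ab):
    """Model of the reassignment: bank = 0, Y = cpuAB[24:16], X = cpuAB[9:0]."""
    y = (cpu_ab >> 16) & 0x1FF      # 9 bits of Y from the high word
    x = cpu_ab & 0x3FF              # 10 bits of X from the low word
    return (0 << 19) | (y << 10) | x

# CPU address -> expected videoRAM address, rows taken from the table
rows = {
    0x8000_0000: 0b00_0_0000_0000_00_0000_0000,   # (0,0)
    0x8000_0001: 0b00_0_0000_0000_00_0000_0001,   # (1,0)
    0x8001_0000: 0b00_0_0000_0001_00_0000_0000,   # (0,1)
    0x8014_0014: 0b00_0_0001_0100_00_0001_0100,   # (20,20)
    0x81DF_027F: 0b00_1_1101_1111_10_0111_1111,   # (639,479)
}
matches = all(cpu_ab_opt(addr) == video for addr, video in rows.items())
print(matches)
```

Because X keeps 10 bits, each line occupies 1024 videoRAM locations, of which 640 are visible.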

I've changed from a single linear pixel counter to a dual X, Y counter. I will have to change my character plotting software now for this new setup, but first I'll write a clear screen routine.

Then I can take advantage of the dual FPGA PROM and do a quick speed comparison: linear addressing clear screen in one and optimized clearing in the other. They will be cycling through the 4-bit color look up table. Speed is back down to 50MHz. The addition of the X,Y counters pushed the design over the edge. It's OK though: towards the end of the project I can increase the VSYNC and HSYNC frequencies to boost the pixel clock, and hopefully that will put the CPU closer to 80MHz.

PostPosted: Tue Mar 12, 2013 2:48 pm 

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Wow, it works, this is really cool! Arlet, thanks for the idea.
Now simple plotting is math free at the expense of a slightly larger memory map. Awesome. It's just as fast as linear addressing too.
Code:
CLRSCR      LDA #$0000        ;$8000_0000 START OF VIDEO MEMORY
            STA SCRLO         ;X VALUE
            LDA #$8000
            STA SCRHI         ;Y VALUE
            LDX #$01E0        ;480 LINES
            LDA SCRCOL
AB          LDY #$027F        ;X = 639, LAST PIXEL OF THE LINE ($81DF_027F IS END OF VIDEO MEMORY)
AA          STA (SCRLO),Y
            DEY
            BNE AA
            STA (SCRLO),Y     ;GET THAT LAST PIXEL! (X = 0)
            INC SCRHI         ;NEXT LINE
            DEX
            BNE AB
            RTS


EDIT: After cleaning up some of the Verilog, speed is back up to 100MHz, without smartXplorer. I chose Performance with IOB Packing for Design Strategy.

PostPosted: Tue Mar 12, 2013 3:21 pm 

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
I think page flipping is the next logical step to start thinking about.

But today and tomorrow, I'm gonna try converting Bruce's Mandelbrot and Garth's geometric tables for plotting shapes.

PostPosted: Tue Mar 12, 2013 7:20 pm 

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
My mind is wandering with all the possibilities of this math-free plotting! Now an origin-based, 4 quadrant system seems easy without the burden of the math which has clouded my thinking in the past.
I can see on paper how easy it is to draw rectangles, hollow and filled. I must put it to the test next, then move on to parallelograms, then circles and ellipses. I need to add a hardware timer in the next PVB design.
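
As a paper exercise, the rectangle idea can be sketched in Python by listing the CPU addresses a routine would store to. The helper names (`pixel_addr`, `filled_rect`, `hollow_rect`) are hypothetical, and the only thing taken from the design is that Y goes in the high word and X in the low word of an address in the $8000_0000 window.

```python
VIDEO_BASE = 0x8000_0000

def pixel_addr(x, y):
    """CPU address of pixel (x, y): Y in the high word, X in the low word."""
    return VIDEO_BASE | (y << 16) | x

def filled_rect(x0, y0, w, h):
    """Addresses of every pixel in a w-by-h rectangle with top-left (x0, y0)."""
    return [pixel_addr(x, y)
            for y in range(y0, y0 + h)
            for x in range(x0, x0 + w)]

def hollow_rect(x0, y0, w, h):
    """Addresses of the border pixels only."""
    return [pixel_addr(x, y)
            for y in range(y0, y0 + h)
            for x in range(x0, x0 + w)
            if y in (y0, y0 + h - 1) or x in (x0, x0 + w - 1)]

print(hex(pixel_addr(20, 20)))          # 0x80140014
print(len(filled_rect(20, 20, 4, 3)))   # 12
print(len(hollow_rect(20, 20, 4, 3)))   # 10
```

No multiply appears anywhere: stepping to the next line is just an increment of the high word.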

I will have some pics soon.
