65Org16.x Dev. Board V1.1 using a Spartan 6 XC6SLX9-3TQG144
Before diving into the external memory, I'm experimenting a bit with interlace mode. I had it turned off until now, which gives a nice low-flicker image, but also a substantial loss of vertical resolution. This would be especially bad on NTSC, where the vertical resolution is already lower.
Edit: a quick test reveals that interlacing looks acceptable except when 1-pixel thin horizontal lines are used, such as the top bar of the letter 'T'. When the lines are 2 pixels wide, such as the crossbar of the 'H', it looks fine. The single-pixel horizontal lines only show up on half the fields, and flicker at a 25 Hz rate (30 Hz for NTSC). This means that the font may need some manual intervention after bitmap generation.
Since I'm still testing with my old font, the letters are only half the height, and look very squished. I'll generate a proper square pixel font, and try again.
Edit 2: made a 26-pixel font. All horizontal lines happened to be at least 2 pixels thick, so no editing was necessary. I had to pick a smaller font to make it fit in the same block RAM: the previous 36-pixel font only had the even lines saved, so it took up half the space. With interlacing, the characters certainly appear more rounded on the screen, to the point where anti-aliasing seems unnecessary.
By the way, I'm using the BMFont utility to convert TTF fonts to appropriately sized bitmaps.
6502.org wrote:
Image no longer available: http://ladybug.xs4all.nl/arlet/fpga/interlaced.jpg
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Arlet wrote:
...By the way, I'm using the BMFont utility to convert TTF fonts to appropriately sized bitmaps.
Freescale Graphical User Interface with Image Converter Utility for microcontrollers and microprocessors.
I'll check BMFont out.
Made a huge font (128 pixels high). Only one "letter" fits in the block RAM. In its current design, the renderer allows up to 255x255 pixel characters, but the bitmap for a single letter would take 4 block RAMs. I use a 32-bit wide block RAM to look up 4 different character metrics at 8 bits per value, so it's not easy to go beyond 255x255 characters.
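Those 8-bit metric fields are what impose the 255-pixel ceiling: four of them pack exactly into one 32-bit BRAM word. A minimal sketch of that packing, with hypothetical field names (the actual metrics the renderer stores aren't listed in the post):

```python
# Pack four 8-bit character metrics into one 32-bit word, as a 32-bit
# wide BRAM would hold them. Field names (width, height, xoffset,
# advance) are my guesses, not taken from the real renderer.
def pack_metrics(width, height, xoffset, advance):
    for v in (width, height, xoffset, advance):
        assert 0 <= v <= 255, "8-bit fields cap every metric at 255"
    return width | (height << 8) | (xoffset << 16) | (advance << 24)

def unpack_metrics(word):
    # Reverse the packing: one 8-bit field per byte lane.
    return (word & 0xFF, (word >> 8) & 0xFF,
            (word >> 16) & 0xFF, (word >> 24) & 0xFF)
```

Widening any field past 8 bits would mean either a wider BRAM word or fewer metrics per lookup, which is why going beyond 255x255 isn't a trivial change.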
6502.org wrote:
Image no longer available: http://ladybug.xs4all.nl/arlet/fpga/itsyourday.jpg
I was looking again at the SDRAM controller, and I realized I set the SDRAM for CAS=2 timing, but according to the data sheet the -6A part is only specified for CAS=3 @ 167 MHz.
The problem is that I don't think we'll ever run the SDRAM at 167MHz, and if we're going to run the SDRAM at 133 MHz or less, it would be faster to use the slower speed grade -7E part with CAS=2 setting.
I'm working on a new version of the SDRAM controller. The goal is to make it both simpler and faster than the previous version. In the previous version, after a read it would always precharge the row. As a consequence, when the next read was to the same row, it had to be activated again.
I'm changing the controller to leave the row active after it's been accessed. That way, the next access to the same row will not require activation, which saves 2 cycles. In addition, I'm keeping track of all banks independently, so each bank can have a different row activated, and the CPU can do random accesses to 4 banks without activating a row. Each row is 1kB worth of data, so the CPU could access up to 4kB of data in random patterns, without incurring extra activation/precharge delays.
A random read to an active row now results in 4 wait states (CAS=2). For an async SRAM, you'd probably be looking at 2 wait states, so the SDRAM isn't that much worse in fairly typical cases. For writes, the SDRAM is just as fast as SRAM, because the pipeline delays are mostly hidden. A read from a different row in the same bank takes a bit longer.
For instance, when doing LDA ABS, where the address falls in SDRAM area, the instruction will take a total of 8 cycles, compared to 4 when reading from BRAM, and 6 when reading from external SRAM. However, out of those 8 cycles, there is only 1 read cycle on the SDRAM bus, and the other 7 cycles are still potentially available, for example for video data fetches.
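As a toy model of this open-row policy (the 4 wait states on a row hit come from the numbers above; the activate/precharge penalties and the address-to-bank mapping are my assumptions for illustration):

```python
# Behavioral sketch of an open-row SDRAM controller tracking all four
# banks independently. Hit cost (4 waits, CAS=2) is from the post; the
# 2-cycle activate and 2-cycle precharge penalties and the interleaved
# address mapping are assumptions.
ROW_BITS = 10          # 1 kB rows -> 4 banks x 1 kB = 4 kB "hot" set
NUM_BANKS = 4

class OpenRowSDRAM:
    HIT_WAITS = 4      # row already active in this bank
    ACT_WAITS = 2      # extra cycles to activate a row
    PRE_WAITS = 2      # extra cycles to precharge a conflicting row

    def __init__(self):
        self.open_row = [None] * NUM_BANKS

    def read_waits(self, addr):
        bank = (addr >> ROW_BITS) % NUM_BANKS
        row = addr >> (ROW_BITS + 2)       # assumed bank-interleaved map
        if self.open_row[bank] == row:
            return self.HIT_WAITS          # row hit: no activation
        waits = self.HIT_WAITS + self.ACT_WAITS
        if self.open_row[bank] is not None:
            waits += self.PRE_WAITS        # different row in same bank
        self.open_row[bank] = row
        return waits
```

With four banks tracked independently, four 1 kB rows stay "hot" at once, which is the 4 kB random-access window described above.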
I just had an idea. The problem with running code from SDRAM is that even for sequential accesses, the SDRAM will still add 4 wait states for every access, due to total pipeline latency.
Now, after the 6502 reads a word from memory, there will always be a few cycles before SDRAM is accessed again, due to 6502 instruction timing. During this time, the memory interface could stay in read mode, and store the next few words in a small local LUT memory, like a 1-line mini cache.
In case the 6502 wants to read the next sequential word, it will already be present in the LUT RAM. This would mean a nice speedup of code execution, while minimizing FPGA resources (especially BRAMs).
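A behavioral sketch of that 1-line mini cache (the line length and replacement behavior are assumptions; the real buffer would be LUT RAM filled during the otherwise idle SDRAM cycles):

```python
# Model of a 1-line sequential prefetch buffer: on a miss, fetch the
# requested word plus the next few sequential words into a tiny line;
# later sequential reads hit the line and cost no SDRAM access.
LINE_WORDS = 4  # assumed line length, not from the post

class PrefetchLine:
    def __init__(self, backing):
        self.backing = backing      # models SDRAM contents
        self.base = None            # address of first buffered word
        self.line = []
        self.sdram_reads = 0        # count of actual SDRAM accesses

    def read(self, addr):
        if self.base is not None and self.base <= addr < self.base + len(self.line):
            return self.line[addr - self.base]   # hit: served from LUT RAM
        # Miss: one SDRAM access streams in the whole line.
        self.sdram_reads += 1
        self.base = addr
        self.line = [self.backing[addr + i] for i in range(LINE_WORDS)]
        return self.line[0]
```

A real implementation would also have to invalidate the buffered line on any write into its address range to stay coherent.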
Sounds promising! On a side note related to memory:
I have done the general IC placement for the layout of two expansion boards that will plug on top of the 3 headers. Only 1 will work at a time. Each will have its own power input. The maximum number of devices per board isn't known yet, although for SyncRAM it's looking like 8 right now; for SDRAM, at least 8, maybe 12. When I start to lay down traces I'll get a better idea. Each device (SDRAM & SyncRAM) draws between 230 and 300 mA, so I'll be very observant of the power/GND scheme. I know power will be coming into the middle of the board... I will start a new thread when this becomes more solid.
One style will be made for the 16MBx16 SDRAM that is being used now. There are 13 pins available at the headers for the separate /CS lines needed for each device.
The other style will be made for the 6.5 ns 4Mx16 SyncRAM mentioned earlier. Other memory sizes (256Kx16, 512Kx16, 1Mx16, 2Mx16) can be used, since their pinouts leave the higher address bits NC. For this to work the SDRAM will have to be disabled, as the pins dedicated to it will have to be reused. This leaves 18 pins on the headers for A13-A21 and 8 separate /CS lines, with 1 pin free.
Excellent idea. I was actually thinking of experimenting by filling the RAM over the USB connection. I tested it successfully up to 256 kbaud, and I'm sure it could go faster. The br@y terminal has settings up to almost 1 Mbit/s.
But you're right, for transportability a large removable medium would be best, and that type of simple core would probably fit in a very small OTP CPLD. But now I'm thinking... it would need a CPU to control it, so it could receive general commands to copy data to/from SD and RAM from the mainboard, through decoded I/O outside RAM space. I'll work on it in my off time at work...
Here at home on my day off I'm trying to catch up to your progress by tackling I2C again, using your GPIO core, which I have successfully put in my project. Looking at this part of your 6502 code, though:
This value '0': isn't this the address of the I2C device, which for the CS4954 defaults to $0F?
Code:
;; X contains register
;; A contains value
video_write_reg:
stx addr
sta val
jsr i2c_delay
jsr i2c_delay
jsr i2c_start
lda #0 ;address of I2C device?
jsr i2c_wrbyte
lda addr
jsr i2c_wrbyte
lda val
jsr i2c_wrbyte
jmp i2c_stop

ElEctric_EyE wrote:
But now thinking... Would need a cpu to control it so it could receive general commands, to copy data to/from SD/RAM, from the mainboard through decoded I/O not in RAM space. Will work on it on my offtime at work...
Quote:
This value '0': isn't this the address of the I2C device, which for the CS4954 defaults to $0F?
Arlet wrote:
I was thinking that you could hook up the SD card to the FPGA, you'll only need a few pins for the SPI interface. After that, the existing CPU core on the FPGA could be programmed to read the card. It only needs to be able to download the boot sector, load that in RAM and jump to it.
On your I2C_write_reg routine, I am trying to make it a subroutine I can call and return from. Should it still work if I replace all the JMPs with JSR+RTS? My CPU is at 40 MHz; should I change the delay values? Sorry, it's difficult for my brain to switch back to "software mode". A lot of things going on in there...
ElEctric_EyE wrote:
On your I2C_write_reg routine, I am trying to make it a subroutine I can call and return from. Should it still work if I replace all the JMPs with JSR+RTS? My CPU is at 40 MHz; should I change the delay values? Sorry, it's difficult for my brain to switch back to "software mode". A lot of things going on in there...
You can keep the same delay values for the CS4954. My I2C timings are actually way too fast for standard 100/400 kHz operation; that was a miscalculation on my part, but I noticed the CS4954 handles it without a problem, and actually has quite a fast I2C interface. At 40 MHz it will be a bit slower, which is probably good. To interface with other I2C devices (I haven't tried that), you may actually have to increase the delay value.