65Org16.x Dev. Board V1.1 using a Spartan 6 XC6SLX9-3TQG144
Before diving into the external memory, I'm experimenting a bit with interlace mode. I had it turned off until now, which gives a nice low-flicker image, but also a substantial loss of vertical resolution. This would be especially bad on NTSC, where the vertical resolution is already lower.
Edit: a quick test reveals that interlacing looks acceptable except when 1-pixel thin horizontal lines are used, such as the top bar of the letter 'T'. When the lines are 2 pixels wide, such as the crossbar of the 'H', it looks fine. The single-pixel horizontal lines only show up on half the fields, and flicker at a 25 Hz rate (30 Hz for NTSC). This means that the font may need some manual intervention after bitmap generation.
Since I'm still testing with my old font, the letters are only half the height, and look very squished. I'll generate a proper square pixel font, and try again.
Edit 2: made a 26-pixel font. All horizontal lines happened to be at least 2 pixels thick, so no editing was necessary. I had to pick a smaller font to make it fit in the same block RAM: the previous 36-pixel font only had the even lines saved, so it took up half the space. With interlacing, the characters certainly appear more rounded on the screen, to the point where anti-aliasing seems unnecessary.
By the way, I'm using the BMFont utility to convert TTF fonts to appropriately sized bitmaps.
6502.org wrote:
Image no longer available: http://ladybug.xs4all.nl/arlet/fpga/interlaced.jpg
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Arlet wrote:
...By the way, I'm using the BMFont utility to convert TTF fonts to appropriately sized bitmaps.
Freescale Graphical User Interface with Image Converter Utility for microcontrollers and microprocessors.
I'll check BMFont out.
Made a huge font (128 pixels high). Only one "letter" fits in the block RAM. In its current design, the renderer allows up to 255x255 pixel characters, but the bitmap for a single letter would take 4 block RAMs. I use a 32-bit wide block RAM to look up 4 different character metrics at 8 bits per value, so it's not easy to go beyond 255x255 characters.
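Those 8-bit metric fields are what impose the 255-pixel ceiling: four of them pack exactly into one 32-bit BRAM word. A minimal sketch of that packing, with hypothetical field names (the actual metrics the renderer stores aren't listed in the post):

```python
# Pack four 8-bit character metrics into one 32-bit word, as a 32-bit
# wide BRAM would hold them. Field names (width, height, xoffset,
# advance) are my guesses, not taken from the real renderer.
def pack_metrics(width, height, xoffset, advance):
    for v in (width, height, xoffset, advance):
        assert 0 <= v <= 255, "8-bit fields cap every metric at 255"
    return width | (height << 8) | (xoffset << 16) | (advance << 24)

def unpack_metrics(word):
    # Reverse the packing: one 8-bit field per byte lane.
    return (word & 0xFF, (word >> 8) & 0xFF,
            (word >> 16) & 0xFF, (word >> 24) & 0xFF)
```

Widening any field past 8 bits would mean either a wider BRAM word or fewer metrics per lookup, which is why going beyond 255x255 isn't a trivial change.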
6502.org wrote:
Image no longer available: http://ladybug.xs4all.nl/arlet/fpga/itsyourday.jpg
I was looking again at the SDRAM controller, and I realized I set the SDRAM for CAS=2 timing, but according to the data sheet the -6A part is only specified for CAS=3 @ 167 MHz.
The problem is that I don't think we'll ever run the SDRAM at 167MHz, and if we're going to run the SDRAM at 133 MHz or less, it would be faster to use the slower speed grade -7E part with CAS=2 setting.
I'm working on a new version of the SDRAM controller. The goal is to make it both simpler and faster than the previous version. In the previous version, after a read it would always precharge the row. As a consequence, when the next read was to the same row, it had to be activated again.
I'm changing the controller to leave the row active after it's been accessed. That way, the next access to the same row will not require activation, which saves 2 cycles. In addition, I'm keeping track of all banks independently, so each bank can have a different row activated, and the CPU can do random accesses to 4 banks without activating a row. Each row is 1kB worth of data, so the CPU could access up to 4kB of data in random patterns, without incurring extra activation/precharge delays.
A random read to an active row now results in 4 wait states (CAS=2). For an async SRAM, you'd probably be looking at 2 wait states, so the SDRAM isn't that much worse in fairly typical cases. For writes, the SDRAM is just as fast as SRAM, because the pipeline delays are mostly hidden. A read from a different row in the same bank takes a bit longer.
For instance, when doing LDA ABS, where the address falls in SDRAM area, the instruction will take a total of 8 cycles, compared to 4 when reading from BRAM, and 6 when reading from external SRAM. However, out of those 8 cycles, there is only 1 read cycle on the SDRAM bus, and the other 7 cycles are still potentially available, for example for video data fetches.
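As a toy model of this open-row policy (the 4 wait states on a row hit come from the numbers above; the activate/precharge penalties and the address-to-bank mapping are my assumptions for illustration):

```python
# Behavioral sketch of an open-row SDRAM controller tracking all four
# banks independently. Hit cost (4 waits, CAS=2) is from the post; the
# 2-cycle activate and 2-cycle precharge penalties and the interleaved
# address mapping are assumptions.
ROW_BITS = 10          # 1 kB rows -> 4 banks x 1 kB = 4 kB "hot" set
NUM_BANKS = 4

class OpenRowSDRAM:
    HIT_WAITS = 4      # row already active in this bank
    ACT_WAITS = 2      # extra cycles to activate a row
    PRE_WAITS = 2      # extra cycles to precharge a conflicting row

    def __init__(self):
        self.open_row = [None] * NUM_BANKS

    def read_waits(self, addr):
        bank = (addr >> ROW_BITS) % NUM_BANKS
        row = addr >> (ROW_BITS + 2)       # assumed bank-interleaved map
        if self.open_row[bank] == row:
            return self.HIT_WAITS          # row hit: no activation
        waits = self.HIT_WAITS + self.ACT_WAITS
        if self.open_row[bank] is not None:
            waits += self.PRE_WAITS        # different row in same bank
        self.open_row[bank] = row
        return waits
```

With four banks tracked independently, four 1 kB rows stay "hot" at once, which is the 4 kB random-access window described above.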
I just had an idea. The problem with running code from SDRAM is that even for sequential accesses, the SDRAM will still add 4 wait states for every access, due to total pipeline latency.
Now, after the 6502 reads a word from memory, there will always be a few cycles before SDRAM is accessed again, due to 6502 instruction timing. During this time, the memory interface could stay in read mode, and store the next few words in a small local LUT memory, like a 1-line mini cache.
In case the 6502 wants to read the next sequential word, it will already be present in the LUT RAM. This would mean a nice speedup of code execution, while minimizing FPGA resources (especially BRAMs).
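A behavioral sketch of that 1-line mini cache (the line length and replacement behavior are assumptions; the real buffer would be LUT RAM filled during the otherwise idle SDRAM cycles):

```python
# Model of a 1-line sequential prefetch buffer: on a miss, fetch the
# requested word plus the next few sequential words into a tiny line;
# later sequential reads hit the line and cost no SDRAM access.
LINE_WORDS = 4  # assumed line length, not from the post

class PrefetchLine:
    def __init__(self, backing):
        self.backing = backing      # models SDRAM contents
        self.base = None            # address of first buffered word
        self.line = []
        self.sdram_reads = 0        # count of actual SDRAM accesses

    def read(self, addr):
        if self.base is not None and self.base <= addr < self.base + len(self.line):
            return self.line[addr - self.base]   # hit: served from LUT RAM
        # Miss: one SDRAM access streams in the whole line.
        self.sdram_reads += 1
        self.base = addr
        self.line = [self.backing[addr + i] for i in range(LINE_WORDS)]
        return self.line[0]
```

A real implementation would also have to invalidate the buffered line on any write into its address range to stay coherent.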
Sounds promising! On a side note related to memory:
I have done the general IC placement for the layout of two expansion boards that will plug on top of the 3 headers. Only 1 will work at a time. Each will have its own power input. The maximum number of devices per board isn't known yet, although for SyncRAM it's looking like 8 right now; for SDRAM, at least 8, maybe 12. When I start to lay down traces I'll get a better idea. Each device (SDRAM & SyncRAM) draws between 230 and 300 mA, so I'll be very observant of the power/GND scheme. I know power will be coming into the middle of the board... I will start a new thread when this becomes more solid.
One style will be made for the 16MBx16 SDRAM that is being used now. There are 13 pins available at the headers for the separate /CS lines needed for each device.
The other style will be made for the 6.5 ns 4Mx16 SyncRAM mentioned earlier. Other memory sizes (256Kx16, 512Kx16, 1Mx16, 2Mx16) can be used, since their pinouts leave the higher address bits NC. For this to work the SDRAM will have to be disabled, as the pins dedicated to it will have to be reused. This leaves 18 pins on the headers for A13-A21 and 8 separate /CS lines, with 1 pin free.
Excellent idea. I was actually thinking of experimenting by filling the RAM over the USB connection. I tested it successfully up to 256 kbaud, and I'm sure it could go faster. The br@y terminal has settings up to almost 1 Mbit/s.
But you're right, for transportability a large removable medium would be best, and that type of simple core would probably fit in a very small OTP CPLD. But now I'm thinking... it would need a CPU to control it, so it could receive general commands to copy data to/from SD and RAM from the mainboard, through decoded I/O outside RAM space. I'll work on it in my off time at work...
Here at home on my day off I'm trying to catch up to your progress by tackling I2C again, using your GPIO core, which I have successfully put in my project. Looking at this part of your 6502 code, though:
This value '0': isn't this the address of the I2C device, which for the CS4954 defaults to $0F?
Code:
;; X contains register
;; A contains value
video_write_reg:
stx addr
sta val
jsr i2c_delay
jsr i2c_delay
jsr i2c_start
lda #0 ;address of I2C device?
jsr i2c_wrbyte
lda addr
jsr i2c_wrbyte
lda val
jsr i2c_wrbyte
jmp i2c_stop

ElEctric_EyE wrote:
But now thinking... Would need a cpu to control it so it could receive general commands, to copy data to/from SD/RAM, from the mainboard through decoded I/O not in RAM space. Will work on it on my offtime at work...
Quote:
This value '0': isn't this the address of the I2C device, which for the CS4954 defaults to $0F?
Arlet wrote:
I was thinking that you could hook up the SD card to the FPGA, you'll only need a few pins for the SPI interface. After that, the existing CPU core on the FPGA could be programmed to read the card. It only needs to be able to download the boot sector, load that in RAM and jump to it.
On your I2C_write_reg routine, I am trying to make it a subroutine I can call and return from. Should it still work if I replace all the JMPs with JSR+RTS? My CPU is at 40 MHz; should I change the delay values? Sorry, it's difficult for my brain to switch back to "software mode". A lot of things going on in there...
ElEctric_EyE wrote:
On your I2C_write_reg routine, I am trying to make it a subroutine I can call and return from. Should it still work if I replace all the JMPs with JSR+RTS? My CPU is at 40 MHz; should I change the delay values? Sorry, it's difficult for my brain to switch back to "software mode". A lot of things going on in there...
You can keep the same delay values for the CS4954. My I2C timings are actually way too fast for standard 100/400 kHz operation; that was a miscalculation on my part, but I noticed the CS4954 handles it without a problem, and actually has quite a fast I2C interface. At 40 MHz it will be a bit slower, which is probably good. To interface with other I2C devices (I haven't tried that), you may actually have to increase the delay value.