PostPosted: Wed Jul 31, 2024 4:58 pm 
Offline

Joined: Thu Dec 07, 2023 2:30 am
Posts: 18
I asked a couple questions in my thread on the Newbies section, but I figure I might want to split each question out into its own separate thread.

I'm doing early planning for a 16-bit handheld game system using a 65C816 as its CPU. Currently, I'm trying to narrow down what I need to learn to make this happen, but in that I need to narrow down a rough idea of what my hardware will look like.

Since no one really makes a 16-bit VDP and I'm trying to stick to parts in active production, my only options are to either use an FPGA to make a custom one or have a second CPU running the graphics functions.

The current goals are as follows:
  • 16-bit color
  • 320x240 resolution (QVGA)
  • Possible NTSC output (connecting to a TV through Composite or S-Video)
  • 60 frames per second to keep it on par with Game Boy framerates
  • Possibly some kind of SuperFX-esque 3D graphics if the chosen solution is powerful enough for it (65C816 might not be able to do floating points that well for 3D)

The plan is to have it function similarly to a Game Boy or Super Nintendo using tiles/sprites and graphical layers to make the desired image. The main CPU will only be telling the graphics processor which graphic IDs to load from RAM, where to put them and when to move them while the graphics chip will do the actual work of loading the graphics from the cartridge, loading them in to VRAM, modifying them and placing them on screen and drawing to the LCD or video output.
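To get a feel for why a tile-based design is attractive here, the sketch below compares the memory needed for one tile map layer against a full 16-bit framebuffer at 320x240. The 8x8 tile size and the 2-byte map entry are assumptions for illustration only; nothing in the plan above fixes them.

```python
# Rough memory comparison: tile map vs. full framebuffer at 320x240.
# TILE_SIZE and MAP_ENTRY_BYTES are assumed values, not part of the plan above.
WIDTH, HEIGHT = 320, 240
BYTES_PER_PIXEL = 2      # 16-bit color
TILE_SIZE = 8            # assumed 8x8 pixel tiles
MAP_ENTRY_BYTES = 2      # assumed 16-bit tile ID per map cell


def framebuffer_bytes(width, height, bytes_per_pixel):
    """Bytes needed to store every pixel of one frame."""
    return width * height * bytes_per_pixel


def tilemap_bytes(width, height, tile_size, entry_bytes):
    """Bytes needed for one layer's map of tile IDs."""
    cols = width // tile_size
    rows = height // tile_size
    return cols * rows * entry_bytes


fb = framebuffer_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL)
tm = tilemap_bytes(WIDTH, HEIGHT, TILE_SIZE, MAP_ENTRY_BYTES)
print(f"framebuffer: {fb} bytes, tile map: {tm} bytes per layer")
```

Under these assumptions the main CPU would only touch a couple of kilobytes of map data per layer, leaving the expansion of tiles into pixels to the graphics chip.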

The LCDs I'm looking at so far range from using 16-bit parallel RGB up to 24-bit RGB parallel.

What I'm wondering is if this kind of workload sounds manageable on a second 65C816 or if I will need to go the FPGA route? Ideally, I'm looking for a cheaper FPGA, something that isn't more expensive than the 65C816 I'm looking at for the CPU.


PostPosted: Wed Jul 31, 2024 5:28 pm 
Offline

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1041
Location: near Heidelberg, Germany
Let's do some math.

320x240 at 60Hz is about what the C64 could do. So let me start there.

This means you have a pixel frequency of around 8 MHz. With a 16-bit colour approach, you'd need 2 bytes per pixel, i.e. 16 MBytes per second of throughput just to display the video (at least in the active areas, inside the borders if you have them). Also, one full frame would be 153,600 bytes (320 x 240 x 2).

The fastest way a 65816 could transfer bytes from location A to location B takes 8 cycles. If your VDP should provide the video itself, a 65816 would have to have a speed of 128MHz or thereabouts.
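The arithmetic behind these figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch; the function names are made up for illustration, and the 8 MHz pixel clock and 8 cycles per byte are taken straight from the post.

```python
# Back-of-the-envelope check of the numbers above.
PIXEL_CLOCK_HZ = 8_000_000   # "around 8 MHz" pixel frequency from the post
BYTES_PER_PIXEL = 2          # 16-bit colour
CYCLES_PER_BYTE = 8          # fastest memory-to-port copy assumed in the post


def display_bandwidth(pixel_clock_hz, bytes_per_pixel):
    """Bytes per second needed while the active area is being drawn."""
    return pixel_clock_hz * bytes_per_pixel


def required_cpu_clock(bandwidth_bytes_per_s, cycles_per_byte):
    """CPU clock needed if the CPU itself moves every displayed byte."""
    return bandwidth_bytes_per_s * cycles_per_byte


bandwidth = display_bandwidth(PIXEL_CLOCK_HZ, BYTES_PER_PIXEL)
cpu_clock = required_cpu_clock(bandwidth, CYCLES_PER_BYTE)
frame_bytes = 320 * 240 * BYTES_PER_PIXEL
print(f"{bandwidth / 1e6:.0f} MB/s, {cpu_clock / 1e6:.0f} MHz, {frame_bytes} bytes/frame")
```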

This means the simple approach won't work. What can be done about it?

1) reduce bandwidth requirements: use palette modes that e.g. let you select 1 out of 16 arbitrary colours from a palette. A single bit per pixel with fore- and background could do this for a two colour palette.
Assuming a 4 bit per pixel approach, the 65816 would still need to run at 32MHz. Some people overclock the 816, but this is still way out of spec.

2) DMA the pixel data from memory into the video output logic, and use the separate 65816 CPU to just modify the video data when it needs changing.
This would still need a 16 MHz memory clock for an 8-bit DMA. This is easily doable, but to also get the CPU access in, you need really fast RAM and select logic. Or the CPU waits for the off-screen areas, like some older micros; then you can use the 816 as video processor (I run it without problems at 17.5 MHz even under heat gun or ice spray).
But even so, by using two alternating video banks, giving DMA a 16 bit access, memory needs to run at 8 MHz only, or you run it at 16 MHz and interleave CPU with video access.
But in my opinion, such a solution would already basically require some fast programmable logic like an FPGA. That probably defeats your goal of a single VDC.

3) like 2), but just use the main CPU to modify the memory. This is the route most micros used back in the day. Only their video logic was not programmable but custom chips like the VIC-II.
If using an FPGA, you'd have the advantage that you could program a custom video processor next to the DMA logic that is then optimized for that specific task.
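The memory clock figures in options 1) and 2) follow from the pixel clock, the colour depth, and the width of the DMA access. A minimal sketch, assuming the same 8 MHz pixel clock and one memory fetch per bus-width of pixel data (the function name is hypothetical):

```python
PIXEL_CLOCK_HZ = 8_000_000  # pixel frequency used in the post


def dma_fetch_rate(pixel_clock_hz, bits_per_pixel, bus_width_bits):
    """Memory fetches per second needed to keep the display fed."""
    return pixel_clock_hz * bits_per_pixel // bus_width_bits


# (bits per pixel, DMA bus width in bits)
for bpp, bus in [(16, 8), (16, 16), (4, 8)]:
    rate = dma_fetch_rate(PIXEL_CLOCK_HZ, bpp, bus)
    print(f"{bpp:2d} bpp over {bus:2d}-bit bus: {rate / 1e6:.0f} M fetches/s")
```

If CPU and video accesses are interleaved on alternating cycles, the memory itself has to run at twice the fetch rate shown here.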

Option 3) is what I am looking into for my Micro-PET family, if I can fit it into the FPGA I have chosen (Xilinx Spartan 6E).

André

_________________
Author of the GeckOS multitasking operating system, the usb65 stack, designer of the Micro-PET and many more 6502 content: http://6502.org/users/andre/


PostPosted: Wed Jul 31, 2024 5:38 pm 
Offline

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1041
Location: near Heidelberg, Germany
P.S.: if the LCD has an addressable memory interface and internal video memory, this could make option 2 a viable choice.

You'd still have to calculate how fast you'd want to e.g. clear the screen, or copy a new screen to the LCD - does it have multiple video pages so you can do double buffering? With 150+k of memory to copy and 8 cycles minimum per byte for a 65816 (that calculation neglects whether you have to adapt memory addresses in between / assumes some auto increment), this will still need about 1.2 million cycles, i.e. 1.2 seconds at 1 MHz and still several frame times at realistic 65816 clock speeds. You'd probably be faster with programmable logic still.
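How long a full-screen copy takes depends on the CPU clock, which the estimate above leaves open. The sketch below works it out for a few clock speeds; the list of clocks is an assumption chosen for illustration, not something from the thread.

```python
FRAME_BYTES = 320 * 240 * 2   # one 16-bit frame
CYCLES_PER_BYTE = 8           # minimum per byte, as stated above
FRAME_TIME_S = 1 / 60         # one frame at 60 Hz


def copy_time_s(n_bytes, cycles_per_byte, clock_hz):
    """Seconds needed for the CPU to copy n_bytes at the given clock."""
    return n_bytes * cycles_per_byte / clock_hz


# Example clock speeds (assumed values for illustration)
for clock_mhz in (1, 8, 14, 20):
    t = copy_time_s(FRAME_BYTES, CYCLES_PER_BYTE, clock_mhz * 1_000_000)
    frames = t / FRAME_TIME_S
    print(f"{clock_mhz:2d} MHz: {t * 1000:7.1f} ms ({frames:.1f} frames at 60 Hz)")
```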



PostPosted: Wed Jul 31, 2024 5:41 pm 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8387
Location: Midwestern USA
fachat wrote:
Let's do some math...The fastest way a 65816 could transfer bytes from location A to location B takes 8 cycles.  If your VDP should provide the video itself, a 65816 would have to have a speed of 128MHz or thereabouts...

Minor note: the MVx instructions can copy a byte in 7 clock cycles, so it would be a little faster than what you computed. At 20 MHz, the 816 could theoretically copy 2.8 MB/second, assuming no interrupts.
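As a quick check of that figure, here is a small sketch of the block-move throughput at 7 cycles per byte, compared with the 16 MB/s display requirement quoted above. The function name is made up for illustration.

```python
def mvx_throughput(clock_hz, cycles_per_byte=7):
    """Bytes per second for a block move, assuming no interrupts or wait states."""
    return clock_hz / cycles_per_byte


rate = mvx_throughput(20_000_000)
needed = 16_000_000  # display bandwidth from the earlier estimate
print(f"{rate / 1e6:.2f} MB/s, about {rate / needed:.0%} of what the display needs")
```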

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Wed Jul 31, 2024 5:53 pm 
Offline

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1041
Location: near Heidelberg, Germany
BigDumbDinosaur wrote:
fachat wrote:
Let's do some math...The fastest way a 65816 could transfer bytes from location A to location B takes 8 cycles.  If your VDP should provide the video itself, a 65816 would have to have a speed of 128MHz or thereabouts...

Minor note: the MVx instructions can copy a byte in 7 clock cycles, so it would be a little faster than what you computed. At 20 MHz, the 816 could theoretically copy 2.8 MB/second, assuming no interrupts.


I was assuming from memory to a fixed IO port (basically the pixel out register), which rules out MVN/MVP. Except of course if the IO port is designed to take the video output on a whole page...
There are so many options :-)

Anyway, even with MVx, the 816 still needs to be at about 100+MHz...

A little rant here - I was quite disappointed when I saw the MVx opcodes take 7 cycles per byte. The DMA engine I did for my self-built machine takes only two ... (but admittedly it has 20 bit adders/incrementors...)

André



PostPosted: Thu Aug 01, 2024 1:16 am 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8387
Location: Midwestern USA
fachat wrote:
BigDumbDinosaur wrote:
fachat wrote:
Let's do some math...The fastest way a 65816 could transfer bytes from location A to location B takes 8 cycles.  If your VDP should provide the video itself, a 65816 would have to have a speed of 128MHz or thereabouts...

Minor note: the MVx instructions can copy a byte in 7 clock cycles, so it would be a little faster than what you computed. At 20 MHz, the 816 could theoretically copy 2.8 MB/second, assuming no interrupts.

I was assuming from memory to a fixed IO port (basically the pixel out register), which rules out MVN/MVP. Except of course if the IO port is designed to take the video output on a whole page...
There are so many options :-)

In my quest to ramp up SCSI access speed of my POC unit, I had conjured a scenario in which the 65C816 would use MVN to read/write the host adapter’s DMA port. That port is mirrored in a page, so I figured I’d configure the 816 to copy 256 bytes to/from a page of RAM, and repeat until all bytes had been copied. What stopped that idea was there was no easy way to handshake the DMA requests being generated by the host adapter. Without such handshaking, it would be possible for the 816 and the host adapter to get out of sync during a slow-peripheral access and start processing garbage.

Another scenario I have considered is making a pseudo-DMA controller from a second 816, which would be able to handshake with the host adapter.  The DMA request (DREQ) output of the host adapter would be wired to the DMA controller’s clock generator so the 816 could be stopped during the transfer if DREQ is not asserted. Hence the MVN instruction copying bytes to/from the host adapter would stall if the latter wasn’t ready, but would run at full speed if the SCSI peripheral is able to keep up.

While the above would work, it means working out some tricky timing, plus devising a protocol for sharing the buses between the main 816 and the DMA 816.  I decided if I was going to go through that much hoop-jumping, it would be better to direct the effort into fashioning a true DMA controller from a CPLD or FPGA that could read on one cycle and write on the next.
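The trade-off can be illustrated with a very simplified cycle-count model. It assumes the copier's clock simply does not advance on ticks where DREQ is low, which glosses over all the real timing details; the function name and the DREQ patterns are hypothetical.

```python
from itertools import cycle


def ticks_to_copy(n_bytes, cycles_per_byte, dreq_pattern):
    """Count clock ticks until n_bytes have been moved.

    dreq_pattern is a repeating list of booleans, one per tick; the copier
    only makes progress on ticks where DREQ is asserted (True).
    """
    if not any(dreq_pattern):
        raise ValueError("DREQ is never asserted; the copy would never finish")
    needed = n_bytes * cycles_per_byte
    done = 0
    ticks = 0
    for dreq in cycle(dreq_pattern):
        if done == needed:
            break
        ticks += 1
        if dreq:
            done += 1
    return ticks


always_ready = [True]
half_speed = [True, True, False, False]   # assumed slow-peripheral pattern

print("gated MVN, fast peripheral:", ticks_to_copy(512, 7, always_ready))
print("gated MVN, slow peripheral:", ticks_to_copy(512, 7, half_speed))
print("2-cycle DMA, fast peripheral:", ticks_to_copy(512, 2, always_ready))
```

In this toy model the read-then-write DMA controller needs 2 ticks per byte against 7 for the gated MVN, which is the gap that makes the CPLD/FPGA route attractive.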

Quote:
Anyway, even with MVx, the 816 still needs to be at about 100+MHz...

Ah, yes...that would be a problem, eh?  :roll:

Quote:
A little rant here - I was quite disappointed when I saw the MVx opcodes take 7 cycles per byte.  The DMA engine I did for my self-built machine takes only two ... (but admittedly it has 20 bit adders/incrementors...)

The other thing with MVx is the source and destination banks are operands to the instruction, which means self-modifying code is needed to fashion a general-purpose copy/fill routine out of MVN and MVP.  Plus while inter-bank copying is possible, it isn’t possible to cross bank boundaries during the copy operation.

While MVx is handy for initializing and copying data structures that don’t span banks, it is not flexible enough to create a general-purpose blitter.
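To show what the bank restriction means in practice, here is a sketch that splits an arbitrary 24-bit copy into chunks that each stay inside one source bank and one destination bank. It is plain Python bookkeeping, not 65C816 code, and the function name and tuple layout are invented for illustration.

```python
BANK_SIZE = 0x10000  # 64 KB per bank on the 65C816


def split_for_mvn(src, dst, length):
    """Split a copy into chunks that never cross a bank boundary.

    src and dst are 24-bit addresses. Returns a list of tuples
    (src_bank, src_offset, dst_bank, dst_offset, byte_count).
    """
    chunks = []
    while length > 0:
        src_room = BANK_SIZE - (src & 0xFFFF)   # bytes left in source bank
        dst_room = BANK_SIZE - (dst & 0xFFFF)   # bytes left in destination bank
        n = min(length, src_room, dst_room)
        chunks.append((src >> 16, src & 0xFFFF, dst >> 16, dst & 0xFFFF, n))
        src += n
        dst += n
        length -= n
    return chunks


for chunk in split_for_mvn(0x01FF00, 0x020000, 0x300):
    print([hex(v) for v in chunk])
```

Each chunk would still need its own bank operands patched into the instruction (hence the self-modifying code mentioned above), with the accumulator holding the byte count minus one.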



PostPosted: Thu Aug 01, 2024 7:32 am 
Offline

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1041
Location: near Heidelberg, Germany
So, coming back to the original question.

As I don't know which interface is provided by the LCD (I think the other thread mentions SPI and 8bit parallel?), I'd probably suggest another route: use a standard VGA/HDMI output, which gives you:

1) better options to attach an external screen
2) more options on existing video solutions, like the Commander X16's "vera" which seems to be available separately, or the Xilinx FPGA solution I did in my Micro-PET.

André



PostPosted: Wed Aug 14, 2024 4:42 pm 
Offline

Joined: Thu Dec 07, 2023 2:30 am
Posts: 18
fachat wrote:
Let's do some math... 1) reduce bandwidth requirements: use palette modes... 2) DMA the pixel data from memory into the video output logic... 3) like 2), but just use the main CPU to modify the memory...

DMA, not using the full color palette, and tile memory were things I was already planning on doing to keep drawing instructions simple. I wanted to make developing for this as easy as for the Game Boy, but with a 16-bit CPU and full color LCD. A lot of my plans have it sitting as kind of a cross between a Game Boy and a Super Nintendo.


fachat wrote:
So, coming back to the original question.

As I don't know which interface is provided by the LCD (I think the other thread mentions SPI and 8bit parallel?), I'd probably suggest another route: use a standard VGA/HDMI output, which gives you:

1) better options to attach an external screen
2) more options on existing video solutions, like the Commander X16's "vera" which seems to be available separately, or the Xilinx FPGA solution I did in my Micro-PET.

André

If it uses an external screen, then it's not really a handheld which is the design goal. That said, I'd like to have the option for hooking up an external display, so I might try to work in something with S-Video or Composite output.

My original impression was that SPI was superior, but talking with people there showed that the LCD controller (which nearly every LCD I was looking at as an option had) might not be able to handle that much data through SPI.
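For reference, the raw bit rate a single-bit SPI link would need for the stated goals can be estimated as below. This is a rough sketch that ignores command overhead and any gaps between transfers; the function name is made up.

```python
def spi_clock_needed(width, height, bits_per_pixel, fps):
    """Minimum SPI bit rate in Hz to resend every pixel of every frame."""
    return width * height * bits_per_pixel * fps


rate = spi_clock_needed(320, 240, 16, 60)
print(f"{rate / 1e6:.1f} Mbit/s")
```

Whether a given LCD controller accepts an SPI clock that fast is something only its datasheet can answer.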

There are a few open source video FPGAs I was looking at using as a base to simplify down to what I need (since most of them are overkill for what I need.) I'm a little early in the research stages for this to decide on one as I would still need to learn HDL to be able to comprehend those.

fachat wrote:
P.S.: if the LCD has an addressable memory interface and internal video memory, this could make option 2 a viable choice.

You'd still have to calculate how fast you'd want to e.g. clear the screen, or copy a new screen to the LCD - does it have multiple video pages so you can do double buffering? With 150+k of memory to copy and 8 cycles minimum per byte for a 65816 (that calculation neglects whether you have to adapt memory addresses in between / assumes some auto increment), this will still need about 1.2 million cycles, i.e. 1.2 seconds at 1 MHz and still several frame times at realistic 65816 clock speeds. You'd probably be faster with programmable logic still.

I haven't seen anything on the LCDs I was considering that would indicate that they had onboard memory.


PostPosted: Wed Aug 14, 2024 8:22 pm 
Offline
User avatar

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1466
Location: Scotland
Ardis wrote:
I haven't seen anything on the LCDs I was considering that would indicate that they had onboard memory.


They do exist - panels with controller ICs that hold the graphics data and do the hard work of refresh, etc. for you.

I did a project some years back to connect an LCD to a Raspberry Pi via the SPI interface. The display had a controller chip with graphics memory on-board so that you could send it a command to plot a pixel and that was that - the display would remain static with the data you sent so you did not have to continually refresh it from the host.

The display was 220x176 and colour depth was 16bpp (5:6:5, r:g:b)
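For anyone unfamiliar with the 5:6:5 layout, this is roughly how an 8-bit-per-channel colour gets packed into 16 bits. The sketch assumes the common arrangement with red in the top bits; the helper names are made up and the exact byte order a given controller expects should be checked against its datasheet.

```python
def pack_rgb565(r, g, b):
    """Pack 8-bit r, g, b into one 16-bit 5:6:5 value (red in the top bits)."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def unpack_rgb565(value):
    """Split a 16-bit 5:6:5 value back into its 5-, 6- and 5-bit fields."""
    r = (value >> 11) & 0x1F
    g = (value >> 5) & 0x3F
    b = value & 0x1F
    return r, g, b


print(hex(pack_rgb565(255, 0, 0)), hex(pack_rgb565(0, 255, 0)), hex(pack_rgb565(0, 0, 255)))
```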

It worked very well, but what I found was that sending commands to the display to draw lines, points, etc. was very slow as latency to start an SPI transmission was (relatively) high, so I resorted to block-transferring a soft copy of the display held in the Pi's RAM via SPI running at 48 MHz. This worked very well, but did need 78KB of RAM and that 48 MHz SPI clock...

My code did the drawing, printing, etc. to the RAM buffer then you called an "update" function which did a bulk transfer of the RAM copy to the display's RAM.
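The size of that RAM buffer and the best-case time for one bulk update can be estimated as follows. This is only a sketch that ignores command bytes and any gaps on the bus; the function names are hypothetical.

```python
WIDTH, HEIGHT = 220, 176
BYTES_PER_PIXEL = 2          # 16bpp
SPI_CLOCK_HZ = 48_000_000    # SPI clock mentioned above


def buffer_bytes(width, height, bytes_per_pixel):
    """Size of the in-RAM copy of the display."""
    return width * height * bytes_per_pixel


def transfer_time_s(n_bytes, spi_clock_hz):
    """Best-case time to shift n_bytes out over single-bit SPI."""
    return n_bytes * 8 / spi_clock_hz


size = buffer_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL)
t = transfer_time_s(size, SPI_CLOCK_HZ)
print(f"{size} bytes, {t * 1000:.1f} ms per full update")
```

The buffer comes out at about 77 KB, close to the figure quoted above, and a full update fits within a single 60 Hz frame time in this idealised case.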

This particular display had the ILI9225C controller which basically does everything - just send it SPI commands and off you go.

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


PostPosted: Wed Aug 14, 2024 10:41 pm 
Offline

Joined: Thu Dec 07, 2023 2:30 am
Posts: 18
drogon wrote:
Ardis wrote:
I haven't seen anything on the LCDs I was considering that would indicate that they had onboard memory.


They do exist - panels with controller ICs that hold the graphics data and do the hard work of refresh, etc. for you...


I didn't say the ones I was looking at didn't have onboard memory. I just didn't see anything on them that answered whether they did or not. It's possible every LCD I was looking at does, but it just wasn't written where I thought it would be.


PostPosted: Wed Aug 14, 2024 11:47 pm 
Offline

Joined: Tue Jul 30, 2024 6:20 pm
Posts: 70
You may want to consider my current obsession, Sipeed Nano boards with Chinese FPGAs.

For $20, you can have Nano9K, a board with HDMI, LCD connector, a boatload of IO, MicroSD and USB-serial. The FPGA fits a 65c02 core with up to 52KB ram and a UART with 85% left over for your video or anything else. Or connect it to your hard 6502 system and just use it for IO.

There is also a $15 Nano4K with a hard ARM and a smaller FPGA, but also HDMI, plenty big for this kind of stuff.

The Chinese toolchain is very usable, and yosys/apycula opensource tools work extremely well with these FPGAs. It's pretty much plug in a USB cable and build circuits.

The price is pretty much what Mouser charged me for shipping alone...


PostPosted: Fri Aug 23, 2024 1:06 am 
Offline

Joined: Thu Dec 07, 2023 2:30 am
Posts: 18
enso1 wrote:
You may want to consider my current obsession, Sipeed Nano boards with Chinese FPGAs... For $20, you can have Nano9K, a board with HDMI, LCD connector, a boatload of IO, MicroSD and USB-serial...

It doesn't sound like what I'm looking for. One of those types of boards takes up a lot of space, something that's kind of at a premium in a handheld device. I also don't intend to have any USB interfaces (except possibly a charging port) on the completed device.

It's also a bit more than my FPGA budget. I'm trying to see if I can run the graphics on a $10 FPGA. The more money I save elsewhere on the system, the more I can put into a better LCD. My prototype (minus the PCB and shell) needs to come in under $100 in parts.

I was already considering a Lattice iCE40HX1K-VQ100. Even this might, honestly, be overkill, but it is in a QFP package, which means I don't need to buy additional tools to solder it.


PostPosted: Fri Aug 23, 2024 5:10 pm 
Offline

Joined: Tue Jul 30, 2024 6:20 pm
Posts: 70
Everyone has their own ideas, so I get that you want to do something else. But what you just posted makes little sense, from engineering or financial point of view.

If budget, size and power-consumption (as in hand-held device) are an issue, I cannot see why anyone would not just use a stock Nano 9K device:

Size: 35mm x 65mm, smaller than 1/2 credit card, 5mm thick
Power consumption: 35 mW!
LCD interface: yes, a working physical interface with sample code,
Availability: Amazon Prime next day
Tons of IO, SD-card,RAM, flash, HDMI, and other things.

You can even embed a 65c02 core and still have 85% of the logic left over for any kind of fancy graphics or sound engine, and actually use a 65c02 as an on-board video processor, using on-board 52K block-ram as VRAM. Or put 5 65c02s in there if you want.
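Whether 52K of block RAM is enough for VRAM depends on the colour depth. The sketch below checks which full-screen 320x240 bitmaps would fit; treating "52K" as 52 KiB is an assumption, and a tile-based layout would change the picture.

```python
BLOCK_RAM_BYTES = 52 * 1024   # "52K block-ram", taken as 52 KiB (assumption)


def frame_bytes(width, height, bits_per_pixel):
    """Bytes for one full-screen bitmap at the given colour depth."""
    return width * height * bits_per_pixel // 8


for bpp in (1, 2, 4, 8, 16):
    size = frame_bytes(320, 240, bpp)
    verdict = "fits" if size <= BLOCK_RAM_BYTES else "does not fit"
    print(f"{bpp:2d} bpp: {size:6d} bytes, {verdict}")
```

Under that assumption a 4-bit palette mode fits with room to spare, while 8-bit and 16-bit full-screen bitmaps do not.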

There is the $15 Nano4K and the lowly $10 Nano1K - very small but has an LCD interface. What do you think it will cost you for just the LCD connector? $20 is an amazing bargain, considering it matches the design criteria. Personally, I'd go for an off-the-shelf $31 Nano20K kit complete with a 4.3" LCD (from Amazon) and a speaker, and be done!
https://www.amazon.com/youyeetoo-Sipeed-Development-RISC-V-Embedded/dp/B0C8N2DMJY

If $10 is a serious consideration at prototype stage, you should probably get a job or maybe find a cheaper hobby, although I can't think of anything where $10 is a lot of money -- you can't buy decent knitting needles for that much. Getting a raw $10 FPGA working on a custom board will cost a lot, lot more.

I wish you much luck in your project -- sounds like you will need more than the usual amount.


PostPosted: Fri Aug 23, 2024 5:24 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
(That's rather offensive, enso, to comment on someone's wishes to keep the cost of a project down in such terms. You show yourself in a bad light, and not for the first time, with uncharitable comments.)


PostPosted: Fri Aug 23, 2024 6:17 pm 
Offline

Joined: Tue Jul 30, 2024 6:20 pm
Posts: 70
Really? I offered a viable way to keep the hardware costs way down! I am not a wealthy person, but $10 at prototyping stage should not be an issue. Getting something working is well worth that.

I think our society has gone way off the rails with 'positive talk only' mentality. If I am doing something stupid, I hope my friends -- and randos on the internet -- will point it out to me before I waste too much time.


Last edited by enso1 on Fri Aug 23, 2024 6:20 pm, edited 1 time in total.
