A concept for sprite hardware
Re: A concept for sprite hardware
BigDumbDinosaur wrote:
As for the bank latching, that is a nuisance—timing can be critical. It also complicates aspects of the glue logic that determine where things such as ROM and I/O show up in the memory map. Although it can be done with discrete logic, better to use programmable logic and ease the timing problems.
Re: A concept for sprite hardware
catelyn wrote:
Yeah, I do need to learn programmable logic at some point, I'm thinking of picking up a handful of PALs or something in my next order to mess around with them and figure out how they work. A lot of the circuits I'd need for this sprite hardware also fall into the category of "technically doable with discrete logic, but a major pain in the ass". I saw there was a category for programmable logic on here so I'll check that out some time before then!
Re: A concept for sprite hardware
gfoot wrote:
catelyn wrote:
Yeah, I do need to learn programmable logic at some point, I'm thinking of picking up a handful of PALs or something in my next order to mess around with them and figure out how they work. A lot of the circuits I'd need for this sprite hardware also fall into the category of "technically doable with discrete logic, but a major pain in the ass". I saw there was a category for programmable logic on here so I'll check that out some time before then!
Re: A concept for sprite hardware
I have Linux too, and this way worked for me to actually program the ATF1504AS via the ATDH1150USB. Maybe my rough notes will help somehow:
* Install VirtualBox + virtualbox-guest-additions-iso
* Create a Windows XP VM
* Follow this (or anything else) https://eprebys.faculty.ucdavis.edu/202 ... -other-vm/ , here is ISO https://archive.org/details/WinXPProSP3x86 and the key is M6TF9-8XQ2M-YQK9F-7TBB2-XGG88
* Disable networking (`Network`)
* Share a host directory (`Shared folders`), with Automount and Make Permanent
* Plug in the `Microchip ATDH1150USB` programmer and pass it through to the VM under `USB`
* Connect the programmer to the `ISP` header of the `ATF1504AS`, powered with `5V`
* Download and install WinCUPL and ATMISP v6.7 (for Windows 2000 and XP) or ATMISP v7.3 (for Windows 7, 8 and 10) https://www.microchip.com/en-us/product ... -resources
* Serial number for WinCUPL: 60008009
* Create `SomeFile.PLD` and put it in the shared folder
* Open it in WinCUPL and use `Run / Device Dependent Compile`; this creates `SomeFile.JED`
* In `ATMISP`:
  * Options / Scan USB cable (it should find the ATDH1150USB and the right USB port)
  * Edit / Add New Device (`After 0`), Device `ATF1504AS`, JTAG `Program/Verify`, Jedec file `SomeFile.JED`
  * Run
* By this point the `ATF1504AS` should already be connected and powered with `5V`, and ATMISP should program the CPLD.
- BigDumbDinosaur
- Posts: 9425
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
Re: A concept for sprite hardware
catelyn wrote:
Yeah, I do need to learn programmable logic at some point, I'm thinking of picking up a handful of PALs or something in my next order to mess around with them and figure out how they work. A lot of the circuits I'd need for this sprite hardware also fall into the category of "technically doable with discrete logic, but a major pain in the ass". I saw there was a category for programmable logic on here so I'll check that out some time before then!
Does anyone still sell PALs?
An easy route to getting comfortable with programmable logic is, as described, to get WinCUPL set up on some Windows thingie, write some simple designs and compile/simulate them to see what happens. WinCUPL has some problems with the GUI that binds it all together, but the individual modules, which are mostly superannuated MS-DOS programs, can be run separately to do the compiling and simulation. The WinCUPL source editor is adequate, but you may want to use a more advanced editor.
As for PLD hardware, assuming you want to stay with 5 volt parts, you should look at Atmel (Microchip) products. They continue to produce GALs, the 16V8 and 22V10, as well as proprietary versions of those, e.g., the ATF750. Atmel’s 22V10C is available in speeds down to 7.5 nanoseconds (pin-to-pin). Their ATF750C is also available in a 7.5ns version, it being a souped-up version of the 22V10C.
Ultimately, a 65C816 application such as the one you are envisioning will be better served, in my opinion, by a CPLD. Atmel offers a family of 5 volt CPLDs in the ATF15xx series, of which the two most capable devices are the ATF1504AS and ATF1508AS. Both are available in PLCC or QFP packages, and with speeds down to 7.5 ns. The QFP packages offer a lot of I/O pins in less space, of course.
Unlike GALs, CPLDs have buried logic that can be used for implementing registers, transparent latches, counters and other state machine applications without using physical pins as nodes. Aside from greatly increasing design flexibility, use of buried logic avoids the propagation delay associated with physical pin nodes, as well as making more pins available for external connections.
x86? We ain't got no x86. We don't NEED no stinking x86!
Re: A concept for sprite hardware
Fwiw, there are always software sprites as a backup. I created a software renderer in my emulator (since I don't have the actual graphics hardware yet) for a memory-mapped 256x240 px tile-based 1-bit graphics display, and with massive loop unrolling got under 1800 cycles for a freely moving 8x16 sprite with a bitmask (so in effect you get three colors: white, black, and background). I am sure it can be optimized further, but depending on the clock speed of your hardware and how many sprites you really need, this could be considered as well (or, if you get only a few hardware sprites up and running, it can be mixed with that approach).
Edit: The idea behind the software renderer is to generate 8 different versions of the sprite, one for each possible horizontal shift, selected by (X-position & 7). Then the tiles affected by the sprite are replaced with copies (identical tile data in a new set of tiles), and the shifted sprite is drawn on top of that tile data.
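The pre-shifting idea can be sketched roughly as follows. This is not JohanFr's code: the function names `preshift` and `draw_row` are hypothetical, it assumes 1 bpp with the most significant bit as the leftmost pixel, and it uses the AND-mask convention from the lda/and/ora/sta routine in this thread (mask bit 0 = sprite opaque, 1 = background shows through). It works on a plain row of screen bytes rather than on the tile copies described above.

```python
# Hypothetical sketch: pre-shifted 8-pixel sprite rows for a 1 bpp display.
# Assumptions: MSB = leftmost pixel; AND mask bit 0 = opaque, 1 = transparent.

def preshift(data, mask):
    """Return 8 versions of one sprite row, indexed by (x & 7).

    Each version spans two screen bytes: (data_left, data_right, mask_left, mask_right).
    """
    versions = []
    opaque = ~mask & 0xFF                          # 1 where the sprite covers the background
    for shift in range(8):
        d16 = (data << 8) >> shift                 # sprite bits spread over two bytes
        m16 = ~((opaque << 8) >> shift) & 0xFFFF   # bits shifted in stay transparent (1)
        versions.append((d16 >> 8, d16 & 0xFF, m16 >> 8, m16 & 0xFF))
    return versions


def draw_row(screen_row, x, versions):
    """Compose one pre-shifted sprite row into a scanline (bytearray) in place."""
    d_left, d_right, m_left, m_right = versions[x & 7]
    col = x >> 3                                   # byte column of the left-hand byte
    screen_row[col] = (screen_row[col] & m_left) | d_left
    if col + 1 < len(screen_row):                  # clip at the right edge
        screen_row[col + 1] = (screen_row[col + 1] & m_right) | d_right


row = bytearray([0xFF, 0xFF])
draw_row(row, 2, preshift(0b10100000, 0b00001111))
print([hex(b) for b in row])
```

The shifting cost is paid once when the 8 versions are built, so the per-frame work is just the AND/OR per byte, which is what makes the unrolled 6502 loop cheap.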
Edit2: I am leaning towards using purely this software approach now, as it will reduce chip count a lot, allowing a potentially tighter board layout with higher CPU speeds. I am thinking NPCs can have deterministic behavior, and there I could probably cheat a lot in code, allowing total freedom of movement only for the player. Otherwise I would probably have considered George Foot's 1D approach, with hardware only for the X-position. My thought in that regard was to store just an 8-bit sprite data byte and an 8-bit mask in the two ports of a 65C22 and be done with it.
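To make the 1D idea concrete, here is a behavioural model of how such X-only hardware might act on one scanline. It is a sketch under assumptions, not gfoot's or JohanFr's circuit: `data` and `mask` stand in for the two 65C22 port values the CPU would rewrite every line (so the CPU handles Y), the comparator loads a shift register when the pixel counter reaches the sprite's X, bits are shifted out MSB first, and mask bit 0 means opaque.

```python
# Hypothetical behavioural model of an X-only ("1D") hardware sprite.
# data/mask = the two 8-bit values the CPU writes to the VIA ports each scanline.

def scanline_with_hw_sprite(bg_pixels, sprite_x, data, mask):
    out = []
    shift_data = shift_mask = 0
    remaining = 0                                  # pixels left in the shift registers
    for x, bg in enumerate(bg_pixels):
        if x == sprite_x:                          # X comparator fires: load shift registers
            shift_data, shift_mask, remaining = data, mask, 8
        if remaining:
            d = (shift_data >> 7) & 1              # MSB = leftmost sprite pixel
            m = (shift_mask >> 7) & 1              # 0 = opaque, 1 = transparent
            shift_data = (shift_data << 1) & 0xFF
            shift_mask = (shift_mask << 1) & 0xFF
            remaining -= 1
            out.append((bg & m) | d)               # same AND/OR compose, one pixel at a time
        else:
            out.append(bg)
    return out


print(scanline_with_hw_sprite([1] * 12, 2, 0b10100000, 0b00001111))
```

In hardware the per-pixel work would be a counter compare plus two shift registers and a couple of gates, which is why this approach saves so many chips compared with full 2D sprites.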
- White Flame
- Posts: 704
- Joined: 24 Jul 2012
Re: A concept for sprite hardware
Johan, you're using 2 bits for 3 "colors", but you could make it 4 "colors" at zero additional cost by using XOR instead of OR to compose your image after the AND mask:
With the normal inclusive OR, the fourth combination is just a redundant "On".
```
AND  XOR  Effect
 0    0   Off
 0    1   On
 1    0   Transparent
 1    1   Inverted
```
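The table can be checked mechanically: try each AND/XOR bit pair against both possible background bits and name the result. The helper names below are hypothetical; `compose` works on a whole byte (8 pixels at 1 bpp), and because the operations are bitwise, a single bit behaves the same way.

```python
# Sketch: derive the AND/XOR effect table by brute force.

def compose(background, and_mask, xor_data):
    """AND clears or keeps background bits, then XOR sets or flips them."""
    return ((background & and_mask) ^ xor_data) & 0xFF


def effect(and_bit, xor_bit):
    """Name what one AND/XOR bit pair does, given background bit 0 and then 1."""
    result = tuple(compose(bg, and_bit, xor_bit) for bg in (0, 1))
    return {(0, 0): "Off", (1, 1): "On",
            (0, 1): "Transparent", (1, 0): "Inverted"}[result]


for a in (0, 1):
    for x in (0, 1):
        print(a, x, effect(a, x))
```

Swapping `^` for `|` in `compose` turns the last row back into "On", which is the redundant fourth combination mentioned above.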
Re: A concept for sprite hardware
catelyn wrote:
gfoot wrote:
catelyn wrote:
Yeah, I do need to learn programmable logic at some point, I'm thinking of picking up a handful of PALs or something in my next order to mess around with them and figure out how they work. A lot of the circuits I'd need for this sprite hardware also fall into the category of "technically doable with discrete logic, but a major pain in the ass". I saw there was a category for programmable logic on here so I'll check that out some time before then!
Last edited by JohanFr on Fri Aug 22, 2025 6:16 am, edited 1 time in total.
Re: A concept for sprite hardware
White Flame wrote:
Johan, you're using 2 bits for 3 "colors", but you could make it 4 "colors" at zero additional cost by using XOR instead of OR to compose your image after the AND mask:
With the normal inclusive OR, the fourth combination is just a redundant "On".
```
AND  XOR  Effect
 0    0   Off
 0    1   On
 1    0   Transparent
 1    1   Inverted
```
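```asm
lda (tileptr),y     ; load the background byte
and (spritemask),y  ; mask bit 0 clears the background pixel (sprite opaque)
ora (sprite),y      ; set the sprite's "on" pixels
sta (tileptr),y     ; store the composed byte
```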
Re: A concept for sprite hardware
JohanFr wrote:
My thought in that regard was to only store an 8-bit sprite data and an 8-bit mask in the two ports of a 65C22 and be done with it.
However, while doing this, I recalled how the mouse cursor was drawn in software back in the day. There were actually two sets of pixel data for the mouse: one was used for transparency and the other to set/unset the pixel (black, white, or inverted!). As I recall, this all required just a handful of very basic binary operators. I no longer recall the specifics or the order (I know XOR was one of them), but I do recall it being pretty easy to do.
This further got me thinking (along with gfoot's note about 1D sprites, and figuring out alphas) that it might be easier to have the transparency broken out into a separate 1bpp map. That way, 8 pixels of transparency could be loaded with a single byte read from RAM; some clever application of multiplexers could then probably be used to select the correct color (memory location, whatever) for the sprite.
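A separate 1bpp transparency map amounts to eight 2-to-1 multiplexers whose select lines come from one RAM byte. The sketch below models that in software; `merge_span` is a hypothetical name, and it assumes bit 7 is the leftmost pixel and a set bit means "transparent, show the background". The colours can be any width, which is the point of keeping transparency out of the colour data.

```python
# Sketch: one transparency byte selects sprite or background colour for 8 pixels.

def merge_span(transparency, sprite_px, background_px):
    if len(sprite_px) != 8 or len(background_px) != 8:
        raise ValueError("expected 8 pixels per transparency byte")
    out = []
    for i in range(8):
        transparent = (transparency >> (7 - i)) & 1    # bit 7 = leftmost pixel (assumed)
        out.append(background_px[i] if transparent else sprite_px[i])   # the "mux"
    return out


print(merge_span(0b11110000, [5] * 8, [1] * 8))
```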
Anyhow, pay no heed to the ramblings of the madman in the corner quietly laughing to himself about bit depths and data transfer rates. @_@
Re: A concept for sprite hardware
For a classic "parallel" sprite engine, I think it is just a matter of scaling up the hardware for as many sprites as you support on a scanline at the same time. My guess is the PPU used the available raster time to read the sprite metadata (bitmap pointer, X, Y, flip, palette etc.) from memory, storing it in internal registers (one set per sprite). So it was "just" a matter of looping through the sprite memory locations (of which there could be many more than it was actually possible to render on screen) and filling the next set of registers as soon as a valid sprite was detected. Then, on the actual scanline, all the sprite engines would work in parallel, feeding their outputs into a multiplexer that takes the current background data into account and finally squeezes out a 2-bit pixel value plus a palette selection. Each sprite engine could, for example, feed its two-bit pixel value into an OR gate, an output of 1 meaning the sprite has data. An 8-to-3 priority encoder could then take the 8 bits (or technically 7 + background) to choose which one is to be rendered. Then THAT output could drive the select lines of a bunch of 74153s to get the actual palette data to be rendered for that pixel.
This is all pure speculation of course, but I think on an ASIC/FPGA this would be the most feasible thing to do. With discrete components, however, the chip count quickly becomes enormous, so the OP's approach of rendering sprite data into a frame buffer might work better here.
Edit: I realize the '153 has 2-bit select lines so something different would need to be used to support 3-bit selects. So for a hobbyist, perhaps 3 sprites + background would reduce chip complexity a bit.
Edit2: Sorry I got a bit carried away, this topic fascinates me
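The parallel scheme speculated about above can be modelled in a few lines, which may help when deciding how many engines to build. This is a behavioural sketch only, not how any real PPU is implemented: the `Sprite` fields are hypothetical, `SPRITES_PER_LINE = 8` is an assumption, `row[col] != 0` plays the role of the OR gate on each engine's two pixel bits, and the first-hit loop plays the role of the priority encoder (lowest table index wins).

```python
# Behavioural sketch: sprite evaluation into per-line "engine registers",
# then a per-pixel priority encoder. Field names and limits are hypothetical.
from dataclasses import dataclass

SPRITES_PER_LINE = 8          # number of parallel sprite engines (assumed)


@dataclass
class Sprite:
    x: int
    y: int
    rows: list                # each row: 8 two-bit pixel values, 0 = transparent
    palette: int


def evaluate(sprites, line):
    """During raster time: latch up to SPRITES_PER_LINE sprites that hit this line."""
    hits = []
    for spr in sprites:
        if spr.y <= line < spr.y + len(spr.rows):
            hits.append((spr, spr.rows[line - spr.y]))
            if len(hits) == SPRITES_PER_LINE:
                break
    return hits


def render_line(sprites, line, width, background):
    """background(x) -> (palette, value). Returns (palette, value) per pixel."""
    engines = evaluate(sprites, line)
    out = []
    for x in range(width):
        pixel = background(x)
        for spr, row in engines:                   # engine 0 has the highest priority
            col = x - spr.x
            if 0 <= col < 8 and row[col] != 0:     # "OR gate": engine has data here
                pixel = (spr.palette, row[col])
                break                              # priority encoder picks the first hit
        out.append(pixel)
    return out


sprites = [Sprite(2, 0, [[1, 1, 0, 0, 0, 0, 0, 0]], 1), Sprite(0, 0, [[2] * 8], 2)]
print(render_line(sprites, 0, 10, lambda x: (0, 3)))
```

Shrinking `SPRITES_PER_LINE` to 3 corresponds to the "3 sprites + background" reduction suggested in the edit above.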
Re: A concept for sprite hardware
JohanFr wrote:
Fwiw, there is always software sprites as a backup.
The display hardware has a concept of transparent pixels: in the sprite, you set every pixel that is not part of the sprite to the transparent colour, and when the underlying library plots the sprite, the magic happens at that level.
Plotting the sprites is the tricky part.
I keep a list of sprites, and to plot a sprite I 2D-copy the underlying screen rectangle into a save area, then plot the sprite into that screen rectangle.
At screen update time I restore the screen in the same order.
This generally works well. There is the drawback of speed vs. number of sprites, but when testing, I was able to animate 64 64x64-pixel (32bpp) sprites on a Raspberry Pi v1. Tricky things include scrolling the screen (also done in software): it's quite odd to scroll and have the background picture move up while the sprites stay in the same place...
And each sprite is actually (potentially) a list of sprites, so you can do sub-sprite animations, like space invaders which have 2 steps, and so on.
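The save-area scheme Gordon describes (copy the rectangle under each sprite, plot with a transparent colour key, restore at update time) can be sketched like this. It is not his library: the names, `TRANSPARENT = 0`, and the list-of-lists screen are assumptions, and sprites are assumed to lie fully on screen (no clipping). One design point worth noting: Gordon says the screen is restored "in the same order"; this sketch walks the save areas newest-first, because when sprites overlap, a later save area contains pixels of an earlier sprite, and undoing them in reverse is what guarantees the original background comes back. Whether his code already does this is not stated.

```python
# Sketch of software sprites with save-under areas (hypothetical names).
TRANSPARENT = 0                        # assumed colour key


def plot_sprites(screen, sprites):
    """sprites: list of (x, y, image). Returns save areas in plotting order."""
    saved = []
    for x, y, image in sprites:
        h, w = len(image), len(image[0])
        # 2D copy of the screen rectangle under the sprite
        saved.append((x, y, [row[x:x + w] for row in screen[y:y + h]]))
        for dy, img_row in enumerate(image):
            for dx, colour in enumerate(img_row):
                if colour != TRANSPARENT:          # colour-keyed plot
                    screen[y + dy][x + dx] = colour
    return saved


def restore(screen, saved):
    """Undo plot_sprites, newest save area first."""
    for x, y, area in reversed(saved):
        for dy, row in enumerate(area):
            screen[y + dy][x:x + len(row)] = row


screen = [[9] * 4 for _ in range(4)]
saved = plot_sprites(screen, [(0, 0, [[1, 1], [1, 1]]), (1, 1, [[2, 2], [2, 2]])])
restore(screen, saved)
print(screen)
```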
Harder things include detecting a sprite collision, needed when e.g. firing a missile at a space invader. You can use a few different algorithms: sprite centres being within a certain distance (works relatively well), or "perfect pixel" (works very well, but is also very slow). Or do it in your own code.
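The two collision approaches mentioned above, sketched side by side. The function names and the (x, y, width, height) / list-of-rows representations are assumptions. The centre test compares squared distances so no square root is needed; the pixel-perfect test only scans the rectangle where the two sprites overlap (an empty overlap means no loop iterations at all), which is the usual way to keep its cost down.

```python
# Sketch: centre-distance vs pixel-perfect sprite collision (hypothetical helpers).

def centres_collide(a, b, max_distance):
    """a, b: (x, y, width, height). True if the centres are closer than max_distance."""
    ax, ay = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx, by = b[0] + b[2] / 2, b[1] + b[3] / 2
    dx, dy = ax - bx, ay - by
    return dx * dx + dy * dy < max_distance * max_distance   # no sqrt needed


def pixel_perfect(ax, ay, a_img, bx, by, b_img, transparent=0):
    """True if any pixel is non-transparent in both sprites at the same screen spot."""
    left, right = max(ax, bx), min(ax + len(a_img[0]), bx + len(b_img[0]))
    top, bottom = max(ay, by), min(ay + len(a_img), by + len(b_img))
    for y in range(top, bottom):                   # empty range if no overlap
        for x in range(left, right):
            if (a_img[y - ay][x - ax] != transparent
                    and b_img[y - by][x - bx] != transparent):
                return True
    return False
```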
In these enlightened days you just get the GPU to do it all for you..
This is a quick demo: https://www.youtube.com/watch?v=K2lZA24zPDk
-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
- White Flame
- Posts: 704
- Joined: 24 Jul 2012
Re: A concept for sprite hardware
JohanFr wrote:
Absolutely, but it would not render the sprite the way I want it: an AND-mask bit of 0 means the sprite bit takes priority, whatever its value, since it clears the background bit, i.e.:
```asm
lda (tileptr),y
and (spritemask),y
ora (sprite),y
sta (tileptr),y
```
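Mask 0 works exactly the same as your code (the background bit has already been cleared, and ORA and EOR against 0 both just yield the sprite bit), as does mask 1 with sprite 0, which is still normal background transparency. Only mask 1 with sprite 1 changes, from a redundant "on" to "inverted".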
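That claim is easy to verify exhaustively. Since AND, ORA and EOR are all bitwise, checking every combination of one background bit, one mask bit and one sprite bit covers every possible byte. The helper name is hypothetical; the two expressions mirror the lda/and/ora/sta routine and the same routine with EOR in place of ORA.

```python
# Sketch: where does swapping ORA for EOR change the result?
from itertools import product


def differing_cases():
    """Return each (background, mask, sprite) bit triple where ORA and EOR disagree."""
    cases = []
    for bg, mask, sprite in product((0, 1), repeat=3):
        with_ora = (bg & mask) | sprite    # lda / and / ora / sta
        with_eor = (bg & mask) ^ sprite    # lda / and / eor / sta
        if with_ora != with_eor:
            cases.append((bg, mask, sprite))
    return cases


print(differing_cases())
```

The only difference is mask 1, sprite 1 over a background bit of 1; over a background bit of 0, "inverted" and "on" both produce 1.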
- White Flame
- Posts: 704
- Joined: 24 Jul 2012
Re: A concept for sprite hardware
Yuri wrote:
I have been trying to puzzle together the details of how the SNES PPU works, and was considering how it managed to shuttle all the pixel data for the various sprites. It seems like magic to me how it manages to pull in the color data of all the potential sprites, along with the background tile colors, and then work out not only which one is on top (you can reorder them based on a priority attribute) but also which one was transparent, all on what seems to be a 16-bit data bus with a clock that doesn't run any faster than the pixel clock. All my ideas for how I might do such a thing in an FPGA would require me to boost the clock up to some crazy speed, even for 320x240 graphics.
Palette storage is internal to the chip, so color lookup doesn't need to take video ram bus cycles, rather they'd be part of the fixed pipeline combining all the layers & windows into the final output pixel. Sprite priority would also happen there, since that is about sprite vs tile layer priority; sprite to sprite priority is fixed in almost all sprite hardware, and is handled implicitly with the painter's algo in drawing the line buffer. I presume the sprite line buffer retains some metadata bits per pixel for priority stuff like that, which again would be overwritten in painter's algo style to only hold the topmost pixel info.
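The line-buffer idea can be sketched as two stages: sprites are drawn back-to-front into a per-line buffer that keeps a colour plus a priority value per pixel (painter's algorithm, so only the frontmost sprite pixel survives), and a final mixing stage compares that priority against the tile layer. This is an illustrative model, not documented SNES behaviour: the tuple layouts, "colour 0 = transparent", "lower sprite index is in front" and "sprite shows if its priority is at least the tile's" are all assumptions.

```python
# Sketch: sprite line buffer with per-pixel priority metadata (assumed rules).

def build_line_buffer(width, sprites_on_line):
    """sprites_on_line: (x, pixels, priority) tuples, lowest index = frontmost."""
    buf = [None] * width
    for x0, pixels, priority in reversed(sprites_on_line):   # back-to-front
        for i, colour in enumerate(pixels):
            x = x0 + i
            if 0 <= x < width and colour != 0:               # colour 0 = transparent
                buf[x] = (colour, priority)                  # overwrite: painter's algorithm
    return buf


def mix(background, buf):
    """background: (colour, priority) per pixel from the tile layer."""
    return [slot[0] if slot is not None and slot[1] >= bg_prio else bg_colour
            for (bg_colour, bg_prio), slot in zip(background, buf)]


buf = build_line_buffer(6, [(1, [3, 3], 2), (0, [5, 5, 5, 5], 1)])
print(mix([(9, 2)] * 6, buf))
```

Because the buffer keeps only the frontmost sprite's priority, a low-priority front sprite can hide a higher-priority sprite behind it even where the tile layer wins, which is a natural side effect of resolving sprite-to-sprite order before sprite-to-tile priority.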
And I just found a doc which breaks a lot of the timing down, in the "DETAILED RENDERING TIMING" section near the bottom: https://github.com/naffnuff/SnesEmulato ... ming%20Doc
The NES has this wonderful timing diagram; I wish the SNES had the same.