A concept for sprite hardware

catelyn · Post by **catelyn** » Sun Aug 17, 2025 12:04 pm

BigDumbDinosaur wrote:

As for the bank latching, that is a nuisance—timing can be critical. It also complicates aspects of the glue logic that determine where things such as ROM and I/O show up in the memory map. Although it can be done with discrete logic, better to use programmable logic and ease the timing problems.

Yeah, I do need to learn programmable logic at some point, I'm thinking of picking up a handful of PALs or something in my next order to mess around with them and figure out how they work. A lot of the circuits I'd need for this sprite hardware also fall into the category of "technically doable with discrete logic, but a major pain in the ass". I saw there was a category for programmable logic on here so I'll check that out some time before then!

gfoot · Post by **gfoot** » Sun Aug 17, 2025 2:46 pm

catelyn wrote:

Yeah, I do need to learn programmable logic at some point, I'm thinking of picking up a handful of PALs or something in my next order to mess around with them and figure out how they work. A lot of the circuits I'd need for this sprite hardware also fall into the category of "technically doable with discrete logic, but a major pain in the ass". I saw there was a category for programmable logic on here so I'll check that out some time before then!

There's lots of good information here in the programmable logic section, yes. If you just want to get a taste for it, you could stick with software initially, install WinCupl or something similar and experiment with the way you define the logic - using the supplied software simulator to check your logic is doing what you wanted it to. This will introduce you to the syntax you program the chips in, the kinds of capabilities various devices have (input pins, I/O pins, buried logic etc) and help you figure out whether to go ahead and get the hardware, and if so, which hardware you want.

catelyn · Post by **catelyn** » Sun Aug 17, 2025 2:58 pm

gfoot wrote:

catelyn wrote:

Yeah, I do need to learn programmable logic at some point, I'm thinking of picking up a handful of PALs or something in my next order to mess around with them and figure out how they work. A lot of the circuits I'd need for this sprite hardware also fall into the category of "technically doable with discrete logic, but a major pain in the ass". I saw there was a category for programmable logic on here so I'll check that out some time before then!

There's lots of good information here in the programmable logic section, yes. If you just want to get a taste for it, you could stick with software initially, install WinCupl or something similar and experiment with the way you define the logic - using the supplied software simulator to check your logic is doing what you wanted it to. This will introduce you to the syntax you program the chips in, the kinds of capabilities various devices have (input pins, I/O pins, buried logic etc) and help you figure out whether to go ahead and get the hardware, and if so, which hardware you want.

I've tried to install WinCupl previously, but I run Linux rather than Windows and getting something that old working properly under wine is a slight pain. It's doable for sure but I haven't really had the necessity to do so yet.

gilhad · Post by **gilhad** » Sun Aug 17, 2025 4:03 pm

I have Linux too, and this way worked for me to actually program an ATF1504AS via the ATDH1150USB; maybe my rough notes will help somehow.

* install VirtualBox + virtualbox-guest-additions-iso
* Create Win XP
* Follow this (or anything else) https://eprebys.faculty.ucdavis.edu/202 ... -other-vm/ , here is ISO https://archive.org/details/WinXPProSP3x86 and the key is M6TF9-8XQ2M-YQK9F-7TBB2-XGG88
* disable net (`Network`)
* Set sharing of some directory (`Shared folders`), with Automount and Make Permanent
* push in USB `Microchip ATDH1150USB` and enable it in `USB`
* Connect into `ISP` connector `ATF1504AS` powered with `5V`

* Download and install winCUPL and ATMISP v6.7 (for Windows 2000 and XP) or ATMISP v7.3 (for Windows 7, 8 and 10) https://www.microchip.com/en-us/product ... -resources
* Serial Number for WinCUPL: 60008009

* Create `SomeFile.PLD` and share it on Shared folders
* Open it in winCUPL and use `Run / Device Dependent Compile` - this creates `SomeFile.JED`
* In `ATMISP`:
  * Options / Scan USB cable (it should be able to find the ATDH1150USB and the right port from USB)
  * Edit / Add New Device (`After 0`), Device `ATF1504AS`, JTAG `Program/Verify`, Jedec file `SomeFile.JED`
  * Run

* At this point the `ATF1504AS` should already be connected and powered with `5V`, and it should program the CPLD.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sun Aug 17, 2025 9:39 pm

catelyn wrote:

Yeah, I do need to learn programmable logic at some point, I'm thinking of picking up a handful of PALs or something in my next order to mess around with them and figure out how they work. A lot of the circuits I'd need for this sprite hardware also fall into the category of "technically doable with discrete logic, but a major pain in the ass". I saw there was a category for programmable logic on here so I'll check that out some time before then!

Does anyone still sell PALs?

An easy route to getting comfortable with programmable logic is, as described, to get WinCUPL set up on some Windows thingie, write some simple designs and compile/simulate to see what happens. WinCUPL has some problems with the GUI that binds it all together, but the individual modules, which are mostly superannuated MS-DOS programs, can be separately run to do the compiling and simulation. The WinCUPL source editor is adequate, but you may want to use a more advanced editor.

As for PLD hardware, assuming you want to stay with 5 volt parts, you should look at Atmel (Microchip) products. They continue to produce GALs, the 16V8 and 22V10, as well as proprietary versions of those, e.g., the ATF750. Atmel’s 22V10C is available in speeds down to 7.5 nanoseconds (pin-to-pin). Their ATF750C is also available in a 7.5ns version, it being a souped-up version of the 22V10C.

Ultimately, a 65C816 application as you are envisioning will be better-served, in my opinion, by a CPLD. Atmel offers a family of 5 volt CPLDs in the ATF15xx series, of which the two most capable devices are the ATF1504AS and ATF1508AS. Both are available in PLCC or QFP packages, and with speeds down to 7.5 ns. The QFP packages offer a lot of I/O pins, in less space, of course.

Unlike GALs, CPLDs have buried logic that can be used for implementing registers, transparent latches, counters and other state machine applications without using physical pins as nodes. Aside from greatly increasing design flexibility, use of buried logic avoids the propagation delay associated with physical pin nodes, as well as making more pins available for external connections.

catelyn · Post by **catelyn** » Sun Aug 17, 2025 9:52 pm

BigDumbDinosaur wrote:

Does anyone still sell PALs?

I was using it as a catch-all term, this isn't my area of expertise, tho I wouldn't be too surprised if there are some still being sold as new-old stock or smth?

BigDumbDinosaur wrote:

An easy route to getting comfortable with programmable logic is, as described, to get WinCUPL set up on some Windows thingie, write some simple designs and compile/simulate to see what happens. WinCUPL has some problems with the GUI that binds it all together, but the individual modules, which are mostly superannuated MS-DOS programs, can be separately run to do the compiling and simulation. The WinCUPL source editor is adequate, but you may want to use a more advanced editor.

I'll definitely give that another shot at some point soon, I'm sure I can get them running eventually either through wine or some other means.

BigDumbDinosaur wrote:

As for PLD hardware, assuming you want to stay with 5 volt parts, you should look at Atmel (Microchip) products. They continue to produce GALs, the 16V8 and 22V10, as well as proprietary versions of those, e.g., the ATF750. Atmel’s 22V10C is available in speeds down to 7.5 nanoseconds (pin-to-pin). Their ATF750C is also available in a 7.5ns version, it being a souped-up version of the 22V10C.

I was somewhat looking at those previously, as they seem fairly well set up for simple glue logic, which is of course their design goal.

BigDumbDinosaur wrote:

Ultimately, a 65C816 application as you are envisioning will be better-served, in my opinion, by a CPLD. Atmel offers a family of 5 volt CPLDs in the ATF15xx series, of which the two most capable devices are the ATF1504AS and ATF1508AS. Both are available in PLCC or QFP packages, and with speeds down to 7.5 ns. The QFP packages offer a lot of I/O pins, in less space, of course.

Yeah, I think I might be able to get a CPLD to be used for like, the sprite reader and the sprite renderer, either as separate modules or together. I haven't looked into it a lot, but that would simplify the timing significantly, I imagine.

BigDumbDinosaur wrote:

Unlike GALs, CPLDs have buried logic that can be used for implementing registers, transparent latches, counters and other state machine applications without using physical pins as nodes. Aside from greatly increasing design flexibility, use of buried logic avoids the propagation delay associated with physical pin nodes, as well as making more pins available for external connections.

Having the counter be internal would help with a lot of the logic, yeah, so it's definitely worth figuring out. I *believe* Digital can synthesize to CPLDs, but I haven't gotten that working yet, and "programming" them manually might actually be easier (though hardware synthesis is a somewhat different beast from imperative programming, of course).

JohanFr · Post by **JohanFr** » Thu Aug 21, 2025 1:40 pm

Fwiw, there is always software sprites as a backup. I created a software renderer in my emulator (since I don't have the actual graphics hardware yet) for a memory-mapped 256x240px tile-based 1-bit graphics display, and with massive loop unrolling got under 1800 cycles for a free-moving 8x16 sprite with a bitmask (so in effect you get three colors: white, black, and background). I am sure it can be optimized further, but depending on the clock speed of your hardware and how many sprites you really need, this could be considered as well (or, if you get only a few hardware sprites up and running, it can be mixed with that approach).

Edit: The idea behind the software renderer is to generate 8 different versions of the sprite, corresponding to x bits of horizontal shift based on (X-position & 7). Then replace the tiles affected by the sprite with identical tile data in a new set of tiles, then draw the shifted sprite on top of that tile data.
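A rough Python sketch of the pre-shifting step described in the edit above, assuming an 8-pixel-wide, 1-bit sprite stored as one byte per row with the most significant bit as the leftmost pixel. The names `preshift_row` and `preshift_sprite` are made up for illustration and are not taken from the actual renderer.

```python
def preshift_row(row_byte, shift):
    """Shift one 8-pixel row right by `shift` pixels (0..7).

    Returns (left_byte, right_byte): after shifting, the row straddles
    two horizontally adjacent tile bytes.
    """
    wide = (row_byte & 0xFF) << 8      # put the row in the left half of a 16-bit window
    wide >>= shift                     # slide it right by the sub-tile offset
    return (wide >> 8) & 0xFF, wide & 0xFF


def preshift_sprite(rows):
    """Build the 8 pre-shifted copies of a sprite, indexed by (x & 7)."""
    return [[preshift_row(r, s) for r in rows] for s in range(8)]


# Hypothetical 8x16 sprite: a solid block, one byte per row.
sprite = [0xFF] * 16
shifted = preshift_sprite(sprite)

x = 13                                  # sprite X position in pixels
variant = shifted[x & 7]                # copy matching the sub-tile offset
tile_column = x >> 3                    # leftmost tile column the sprite touches
left, right = variant[0]
print(tile_column, f"{left:08b} {right:08b}")
```

With the mask convention used later in the thread (AND-mask bit 1 keeps the background), the mask would presumably need the same shift but with 1 bits filled in around it, so the uncovered pixels stay transparent.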

Edit2: I am leaning towards using purely this software approach now, as it will reduce chip count a lot, allowing for a potentially tighter board layout with higher CPU speeds. I am thinking NPCs can have deterministic behavior, and there I could probably cheat a lot in code, allowing for total freedom of movement only for the player. Otherwise I would probably have considered George Foot's 1D approach, only having hardware for the X-position. My thought in that regard was to only store an 8-bit sprite data and an 8-bit mask in the two ports of a 65C22 and be done with it.

White Flame · Post by **White Flame** » Fri Aug 22, 2025 4:42 am

Johan, you're using 2 bits for 3 "colors", you could make it 4 "colors" with zero additional cost by using XOR instead of OR to compose your image after the AND mask:

AND  XOR  Effect
0    0    Off
0    1    On
1    0    Transparent
1    1    Inverted

With the normal inclusive OR, the fourth combination is just a redundant "On".
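To make the table concrete, here is a small Python sketch that applies AND-then-XOR to both possible background bits for each mask/data pair. The function name `compose` is just an illustrative choice.

```python
def compose(background, and_mask, xor_data):
    """Combine one byte (8 pixels) of background with sprite mask and data."""
    return ((background & and_mask) ^ xor_data) & 0xFF


EFFECTS = {(0, 0): "Off", (0, 1): "On", (1, 0): "Transparent", (1, 1): "Inverted"}

for (m, x), name in EFFECTS.items():
    # Try each mask/data pair against a background bit of 0 and of 1.
    results = [compose(bg, m, x) for bg in (0, 1)]
    print(f"AND={m} XOR={x}  bg0->{results[0]} bg1->{results[1]}  ({name})")
```

The same function works on whole bytes, so one call handles eight pixels at once.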

JohanFr · Post by **JohanFr** » Fri Aug 22, 2025 6:09 am

catelyn wrote:

gfoot wrote:

catelyn wrote:

Yeah, I do need to learn programmable logic at some point, I'm thinking of picking up a handful of PALs or something in my next order to mess around with them and figure out how they work. A lot of the circuits I'd need for this sprite hardware also fall into the category of "technically doable with discrete logic, but a major pain in the ass". I saw there was a category for programmable logic on here so I'll check that out some time before then!

There's lots of good information here in the programmable logic section, yes. If you just want to get a taste for it, you could stick with software initially, install WinCupl or something similar and experiment with the way you define the logic - using the supplied software simulator to check your logic is doing what you wanted it to. This will introduce you to the syntax you program the chips in, the kinds of capabilities various devices have (input pins, I/O pins, buried logic etc) and help you figure out whether to go ahead and get the hardware, and if so, which hardware you want.

I've tried to install WinCupl previously, but I run Linux rather than Windows and getting something that old working properly under wine is a slight pain. It's doable for sure but I haven't really had the necessity to do so yet.

Check out galette (google galette + PLD) if you are on Linux. Syntax is different from WinCUPL but it worked like a charm for me. Might not have the same support regarding simulation though.

JohanFr · Post by **JohanFr** » Fri Aug 22, 2025 6:15 am

White Flame wrote:

Johan, you're using 2 bits for 3 "colors", you could make it 4 "colors" with zero additional cost by using XOR instead of OR to compose your image after the AND mask:

AND  XOR  Effect
0    0    Off
0    1    On
1    0    Transparent
1    1    Inverted

With the normal inclusive OR, the fourth combination is just a redundant "On".

Absolutely, but it would not render the sprite the way I want it: AND-mask 0 meaning whatever the sprite bit value is takes priority, since it clears the background bit, i.e.:

        lda (tileptr),y
        and (spritemask),y
        ora (sprite),y
        sta (tileptr),y

Yuri · Post by **Yuri** » Fri Aug 22, 2025 7:20 am

JohanFr wrote:

My thought in that regard was to only store an 8-bit sprite data and an 8-bit mask in the two ports of a 65C22 and be done with it.

I have been trying to puzzle together the details of how the SNES PPU works, and was considering how it managed to shuttle all the pixel data for the various sprites; seems like magic to me how it manages to pull in the color data of all the potential sprites, along with the background tile colors, and then work out which one is not only on top (you can reorder them based on a priority attribute), but which one was transparent, all on what seems to be a 16 bit data bus with a clock that doesn't run any faster than the pixel clock. All my ideas for how I might do such a thing in an FPGA would require me to boost the clock up to some crazy speed, even for 320x240 graphics.

However, while doing this, I recalled how the mouse cursor was drawn in software back in the day. There were actually two sets of pixel data for the mouse: one was used for transparency and another was used to set/unset the pixel (black, white, or inverted!). As I recall this all required just a handful of very basic binary operators. I no longer recall the specifics or the order, though I know XOR was one of them, but I do recall it being pretty easy to do.

This further got me thinking (along with gfoot's note about the 1D sprites, and figuring alphas) that perhaps it might be easier to have the transparency broken out into a separate 1bpp map. In this way 8 pixels of transparency could be loaded in a single byte read from RAM; some clever application of multiplexers could then probably be used to select the correct color (memory location, whatever) for the sprite.
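As a sketch of that idea in Python: one byte of 1bpp transparency acts as the select input for eight 2:1 multiplexers, one per pixel. The bit order (bit 7 = leftmost pixel) and the polarity (1 = transparent) are assumptions made here for illustration, as is the name `mux_pixels`.

```python
def mux_pixels(transparency_byte, sprite_colors, background_colors):
    """Per-pixel 2:1 multiplexer driven by a 1bpp transparency byte.

    Bit 7 controls the leftmost pixel. A set bit is assumed to mean
    "transparent", so the background colour passes through.
    """
    out = []
    for i in range(8):
        transparent = (transparency_byte >> (7 - i)) & 1
        out.append(background_colors[i] if transparent else sprite_colors[i])
    return out


# Left four pixels transparent, right four show the sprite colour.
print(mux_pixels(0b11110000, [1] * 8, [9] * 8))
```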

Anyhow, pay no heed to the ramblings of the mad man in the corner quietly laughing to himself about bit depths and data transfer rates. @_@

JohanFr · Post by **JohanFr** » Fri Aug 22, 2025 8:00 am

For a classic "parallel" sprite engine, I think it is just a matter of scaling up the hardware for as many sprites as you support on a scanline at the same time. My guess is the PPU used the raster time available to read the sprite metadata (bitmap pointer, X, Y, flip, palette etc.) from memory, storing it in internal registers (one set per sprite). So it was "just" a matter of looping through the memory locations of sprites (of which there could be a lot more than could actually be rendered on screen) and filling the next set of registers as soon as a valid sprite was detected. Then on the actual scanline, all the sprite engines would work in parallel, feeding their outputs into a multiplexer that takes the current background data into account, then finally squeezes out a 2-bit + palette-selection value. Each sprite engine could, for example, feed its two-bit pixel combination into an OR gate, with output=1 meaning the sprite has data. An 8-to-3 priority encoder could then take the 8 bits (or technically 7 + background) to choose which one is to be rendered. Then THAT output could feed the select lines of a bunch of 74153s to get the actual palette data to be rendered for that pixel.
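The selection stage described above can be modelled in a few lines of Python. This is only a behavioural sketch of the speculation, not a description of any real PPU; it assumes pixel value 0 means "no data" and that index 0 is the highest-priority sprite. `select_pixel` is a made-up name.

```python
def select_pixel(sprite_pixels, background_pixel):
    """Model one pixel clock of the sprite/background mux stage.

    sprite_pixels: 2-bit values (0..3), one per sprite engine,
    index 0 = highest priority. 0 is assumed to mean transparent.
    Returns (source_index, pixel_value); a source_index equal to
    len(sprite_pixels) stands for the background.
    """
    has_data = [p != 0 for p in sprite_pixels]   # per-engine OR gate over the two pixel bits
    for index, active in enumerate(has_data):    # priority encoder: first active input wins
        if active:
            return index, sprite_pixels[index]   # encoder output drives the mux select lines
    return len(sprite_pixels), background_pixel


# Seven sprite engines plus background, as in the "7 + background" case.
print(select_pixel([0, 0, 2, 3, 0, 0, 0], 1))
print(select_pixel([0] * 7, 1))
```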

This is all pure speculation of course, but I think on an ASIC/FPGA this would be the most feasible thing to do. Now with discrete components, the chip count quickly becomes enormous, so the OP's approach might work better here, rendering sprite data into a frame buffer.

Edit: I realize the '153 has 2-bit select lines so something different would need to be used to support 3-bit selects. So for a hobbyist, perhaps 3 sprites + background would reduce chip complexity a bit.

Edit2: Sorry I got a bit carried away, this topic fascinates me

drogon · Post by **drogon** » Fri Aug 22, 2025 12:22 pm

JohanFr wrote:

Fwiw, there is always software sprites as a backup.

When I wrote my "big" BASIC, (runs under Linux on anything from a Pi v1 upwards) I did sprites in software. It was a challenge, but I was able to lever the power of underlying DMA engines to do 2D block moves of the data.

The display hardware has a concept of transparent pixels, so in the sprite you make all pixels that are not part of the sprite the colour of the transparent pixel and when the underlying library plots the sprite the magic happens at that level.

Plotting the sprites is the tricky part.

I keep a list of sprites and to plot a sprite, I 2D copy the underlying screen rectangle into a save area, then plot the sprite into that area.
At screen update time I restore the screen in the same order.
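A minimal Python sketch of that save-under technique, using nested lists as the screen. The function names are hypothetical, there is no clipping (sprites are assumed to be fully on screen), and one assumption should be flagged loudly: this sketch restores the saved rectangles in reverse plot order so that overlapping sprites unwind cleanly. For sprites that don't overlap, the order makes no difference.

```python
def plot_sprite(screen, sprite, x, y, transparent, save_list):
    """Save the screen rectangle under the sprite, then draw the sprite."""
    h, w = len(sprite), len(sprite[0])
    saved = [row[x:x + w] for row in screen[y:y + h]]   # 2D copy of the background
    save_list.append((x, y, saved))
    for dy in range(h):
        for dx in range(w):
            if sprite[dy][dx] != transparent:            # skip transparent pixels
                screen[y + dy][x + dx] = sprite[dy][dx]


def restore_all(screen, save_list):
    """Put back the saved rectangles, most recently plotted first (assumption)."""
    while save_list:
        x, y, saved = save_list.pop()
        for dy, row in enumerate(saved):
            screen[y + dy][x:x + len(row)] = row


screen = [[0] * 8 for _ in range(8)]
original = [row[:] for row in screen]
saves = []
plot_sprite(screen, [[5, 9], [9, 5]], 1, 1, 9, saves)   # 9 = transparent colour
plot_sprite(screen, [[7, 7], [7, 7]], 2, 2, 9, saves)   # overlaps the first sprite
restore_all(screen, saves)
print(screen == original)
```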

This generally works well - there is the drawback of speed vs. number of sprites but when testing, I was able to animate 64 64x64 pixel (32bpp) sprites on a Raspberry Pi v1. Tricky things are like scrolling the screen (also done in software). It's quite odd to scroll and have the background picture move up, but the sprites stay in the same place...

And each sprite is actually (potentially) a list of sprites, so you can do sub-sprite animations - like space invaders which have 2 steps, and so on.

Harder things - detecting a sprite collision - needed when e.g. firing a missile at a space invader - you can use a few different algorithms - sprite centres being within a certain distance - works relatively well, or "perfect pixel" - works very well, but also very slow. Or do it in your own code.
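The two collision tests mentioned above might look roughly like this in Python. Both functions and their signatures are invented for illustration; sprites are lists of rows, and positions are top-left corners for the pixel test and centre points for the distance test.

```python
def centres_collide(ax, ay, bx, by, max_distance):
    """Cheap test: are the two sprite centres within max_distance of each other?"""
    dx, dy = ax - bx, ay - by
    return dx * dx + dy * dy <= max_distance * max_distance   # avoids a square root


def pixels_collide(a, ax, ay, b, bx, by, transparent):
    """Slow but exact: does any solid pixel of `a` overlap a solid pixel of `b`?"""
    left = max(ax, bx)
    right = min(ax + len(a[0]), bx + len(b[0]))
    top = max(ay, by)
    bottom = min(ay + len(a), by + len(b))
    for y in range(top, bottom):          # only scan the overlapping rectangle
        for x in range(left, right):
            if a[y - ay][x - ax] != transparent and b[y - by][x - bx] != transparent:
                return True
    return False
```

The inner loops in `pixels_collide` are what make the exact test slow; limiting the scan to the overlapping rectangle helps, but it is still per-pixel work.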

In these enlightened days you just get the GPU to do it all for you..

This is a quick demo: https://www.youtube.com/watch?v=K2lZA24zPDk

-Gordon

White Flame · Post by **White Flame** » Mon Aug 25, 2025 1:57 am

JohanFr wrote:

Absolutely, but it would not render the sprite the way I want it: AND-mask 0 meaning whatever the sprite bit value is takes priority, since it clears the background bit, i.e.:

        lda (tileptr),y
        and (spritemask),y
        ora (sprite),y
        sta (tileptr),y

Yes, it still would render exactly the way you want it, changing ORA to EOR. The only case that changes is when the mask bit is 1, a sprite pixel 1 will invert the background bit, giving another "color" option on top of the 3 existing ones.

Mask 0 works exactly the same as your code (since ORA and EOR against 0 both yield the same result), as well as mask 1, sprite 0 still being normal background transparency.
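A quick way to check this claim is to enumerate every combination of background, mask and sprite bit and see where the two variants disagree. The Python below is a bit-level model of the two instruction sequences; the function names are illustrative.

```python
from itertools import product


def with_ora(bg, mask, sprite):
    return (bg & mask) | sprite     # lda / and / ora


def with_eor(bg, mask, sprite):
    return (bg & mask) ^ sprite     # lda / and / eor


differences = [
    (bg, mask, sprite)
    for bg, mask, sprite in product((0, 1), repeat=3)
    if with_ora(bg, mask, sprite) != with_eor(bg, mask, sprite)
]
print(differences)   # (bg, mask, sprite) combinations where ORA and EOR differ
```

The only disagreement is mask 1, sprite 1 over a set background bit, which is exactly the "inverted" case.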

White Flame · Post by **White Flame** » Mon Aug 25, 2025 2:17 am

Yuri wrote:

I have been trying to puzzle together the details of how the SNES PPU works, and was considering how it managed to shuttle all the pixel data for the various sprites; seems like magic to me how it manages to pull in the color data of all the potential sprites, along with the background tile colors, and then work out which one is not only on top (you can reorder them based on a priority attribute), but which one was transparent, all on what seems to be a 16 bit data bus with a clock that doesn't run any faster than the pixel clock. All my ideas for how I might do such a thing in an FPGA would require me to boost the clock up to some crazy speed, even for 320x240 graphics.

I'm pretty sure the s-ppu has a line buffer for sprites which renders in painter's algo, and uses realtime shift registers for the background layers. There is both a max count of sprites per line, and a hard limit of 256 sprite pixels handled per line (transparent or not). As a 4bpp sprite system on a 16-bit data bus, that means it can pull 4 pixels per cycle from the bus, or 64 cycles max, which it can squeeze in during hblank. The visible horizontal time is spent pulling background layer tile data, which is fetched a few cycles ahead of its actual display.
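The bandwidth arithmetic in that paragraph, written out in Python using only the figures quoted above:

```python
SPRITE_PIXELS_PER_LINE = 256   # hard per-line limit quoted above
BITS_PER_PIXEL = 4             # 4bpp sprites
BUS_WIDTH_BITS = 16            # data bus width

pixels_per_cycle = BUS_WIDTH_BITS // BITS_PER_PIXEL          # 16 / 4 = 4
fetch_cycles = SPRITE_PIXELS_PER_LINE // pixels_per_cycle    # 256 / 4 = 64
print(pixels_per_cycle, fetch_cycles)   # 4 64
```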

Palette storage is internal to the chip, so color lookup doesn't need to take video ram bus cycles, rather they'd be part of the fixed pipeline combining all the layers & windows into the final output pixel. Sprite priority would also happen there, since that is about sprite vs tile layer priority; sprite to sprite priority is fixed in almost all sprite hardware, and is handled implicitly with the painter's algo in drawing the line buffer. I presume the sprite line buffer retains some metadata bits per pixel for priority stuff like that, which again would be overwritten in painter's algo style to only hold the topmost pixel info.
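A behavioural Python sketch of a painter's-algorithm sprite line buffer along the lines presumed above. It is not based on documented S-PPU internals: the tuple layout, the `meta` field standing in for per-pixel priority bits, and the convention that colour 0 is transparent are all assumptions for illustration.

```python
def render_sprite_line(sprites, width=256):
    """Fill a one-scanline buffer with sprite pixels, back to front.

    sprites: list of (x, pixels, meta), ordered lowest priority first.
    pixels is a list of colour indices (0 assumed transparent); meta
    stands in for the per-pixel priority bits kept alongside the colour.
    """
    line = [(0, None)] * width                  # (colour, meta); colour 0 = empty
    for x, pixels, meta in sprites:
        for i, colour in enumerate(pixels):
            if colour != 0 and 0 <= x + i < width:
                line[x + i] = (colour, meta)    # later sprites overwrite earlier ones
    return line


# Two overlapping sprites on a 4-pixel-wide line; the second one ends up on top.
print(render_sprite_line([(0, [1, 1, 1], "lo"), (1, [0, 2], "hi")], width=4))
```

Because each write replaces both colour and metadata, the buffer ends up holding only the topmost sprite pixel's information at each position.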

And just found a doc which breaks a lot of the timing down, in the "DETAILED RENDERING TIMING" section near the bottom: https://github.com/naffnuff/SnesEmulato ... ming%20Doc
The NES has this wonderful timing diagram, wish the SNES had the same.
