6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Mon May 13, 2024 3:27 pm

All times are UTC




Post new topic Reply to topic  [ 26 posts ]  Go to page 1, 2  Next
Author Message
PostPosted: Sun Oct 16, 2022 8:00 am 
Offline
User avatar

Joined: Mon Aug 30, 2021 11:52 am
Posts: 268
Location: South Africa
I've been bogged down in software side of things for ages now (FAT32 and SD cards) so I've not had a lot to post and what I'm doing is mostly just boring grunt work.

I did need something a little more stimulating so I sketched up a design for hardware stretching of sprites. And it didn't work. So I put some more thought into it and. And it didn't work again. It's not something I seriously want to implement but I thought as the forum is quiet at the moment maybe it might pose an interesting challenge here.

The goal is to design a blitter that can copy a single line of a sprite (or texture or whatever it needs to be called) from system memory into video memory. The destination width can be the same or greater than the source width. Using only 74 series ICs :shock:

For example how do I take the 4 pixel wide sprite below and stretch it out to any of those destination widths?
Attachment:
Sprite Stretching 1.png
Sprite Stretching 1.png [ 51.7 KiB | Viewed 1053 times ]
The 'quotient' part seems easy enough. For stretching 4 pixels to [8..11] pixels the destination address must increment least twice before the source address must increment.

But how do I distribute the 'remainder' the remaining pixels using only counter, logic, maybe adder and latch ICs? They don't have to be real 74 series ICs. They can magically be as wide as needed. Or to put it another way: what would the algorithm for this be using only very simple operations? Initial conditions needing division / multiplication / whatever can be setup using an MPU.

Here is a simple case where the destination width is a multiple of the source width.
Attachment:
Sprite Stretching 2.png
Sprite Stretching 2.png [ 76.4 KiB | Viewed 1053 times ]
It is easy enough to step the destination address twice whilst only incrementing the source address once each time at the end of the 2th count destination.

But here the more complicated example where destination is not a multiple of the source.
Attachment:
Sprite Stretching 3.png
Sprite Stretching 3.png [ 58.99 KiB | Viewed 1053 times ]
How do I know that I must allow the destination address to step 3 times to distribute the remaining pixel at the end of the cyan run?

Or more likely
Attachment:
Sprite Stretching 4.png
Sprite Stretching 4.png [ 60.38 KiB | Viewed 1053 times ]
How do I know to step the destination address 3 times after the orangey colour run?

Again not a serious question but it turned out to me a lot harder to solve using only counters and logic than I thought it would be. Hopefully someone gets some entertainment our of this :D


Top
 Profile  
Reply with quote  
PostPosted: Sun Oct 16, 2022 9:44 am 
Online

Joined: Tue Sep 03, 2002 12:58 pm
Posts: 298
If the target was sourceWidth*destWidth pixels wide, you'd want to have destWidth copies of each source pixel. So imagine a line with sourceWidth blocks of pixels, each destWidth pixels wide.

But you don't actually want that many. You only want destWidth of them. If you step through that imaginary line sourceWidth pixels at a time, you'll do that destWidth times in total.

If you keep track of your position in the imaginary line (adding sourceWidth to it for each pixel that you write), then working out which source pixel to read from requires a division. You obviously don't want that. But you don't need the position as a single number. All you care about is which block you're in, and how far into that block you are.

In the following pseudo-code, the block is stored in sourceCounter, and the position within that block in d. Each time you write a pixel, d increases by sourceWidth. If it becomes greater than destWidth, it has moved into the next block. So increase sourceCounter and subtract destWidth from d.

It's the same maths as Bresenham's line drawing algorithm. Instead of drawing a line on the screen, it's following the diagonal of a rectangle of sourceWidth * destWidth pixels. One coordinate tells you where to draw the pixel on screen, and the other tells you where to fetch its colour from.

Code:
sourceCounter <= 0, destCounter <= 0, d <= 0
loop:
    copy source to dest
    d <= d + sourceWidth
    if d > destWidth
        d <= d - destWidth
        sourceCounter <= sourceCounter + 1
    destCounter <= destCounter + 1


Top
 Profile  
Reply with quote  
PostPosted: Sun Oct 16, 2022 9:59 am 
Offline

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
We crossed posts, but maybe this also helps explain the algorithm:

As a first port of call, I'd probably use shift registers (one per bit channel) clocked less often than once per pixel (destination address increment), using Bresenham's algorithm to determine when you clock them. The same thing would kind of also work the other way around if you interment your destination address more times than your shift registers.

The sprite width being a power of two will make Bresenham very easy to do in hardware I think.

So for example, to stretch 8 sprite pixels ABCDEFGH over 5 pixels abcde, you would write A to a, then step to B and compute remainder term 0+5=5. This is less than 8 so you don't increment the destination.

Now your write B to a, compute accumulated remainder 10, which is more than 8 so reduce mod 8 (2) and increment the destination.

Write C to b, get remainder 2+5=7, not greater than 8

Write D to b, get remainder 7+5=12, greater, increment destination

Write E to c, remainder 4+5=9, increment

Write F to d, remainder 1+5=6

Write G to d, remainder 6+5=11, increment

Write H to e, remainder 3+5=8, increment

The end result is BDEGH.


Top
 Profile  
Reply with quote  
PostPosted: Sun Oct 16, 2022 4:10 pm 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1929
Location: Sacramento, CA, USA
gfoot wrote:
The sprite width being a power of two will make Bresenham very easy to do in hardware I think.

Bresenham was exactly the first thing that popped into my head, but I lack the hardware and/or software chops to actually implement it.

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Top
 Profile  
Reply with quote  
PostPosted: Sun Oct 16, 2022 6:58 pm 
Offline

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
barrym95838 wrote:
gfoot wrote:
The sprite width being a power of two will make Bresenham very easy to do in hardware I think.

Bresenham was exactly the first thing that popped into my head, but I lack the hardware and/or software chops to actually implement it.

For this aspect I would start by thinking of what state it needs to store and how that state is updated on each step. What state do we have? The residue needs storing somewhere (e.g. in an 8-bit D flipflop register). We also need source and destination address counters, or shift registers, or similar things.

Regarding the state updates, we need to be able to add a constant to the residue, so need maybe a 4-bit adder to do that. Being able to reset it is also useful at the start, so use a DFF with a reset pin. For 8-pixel wide sprite, the bottom three bits from the adder feed back th the register's inputs, so it does the modulo automatically. So those two together can count up in fives modulo 8 for my example, based on an input clock.

The next bit up on the adder is then effectively a carry bit, indicating whether the residue is about to overflow. If this is set then on the next clock we want to increment the destination pixel. So this could feed into a clock enable pin on a counter, perhaps. Then the counter and residue will work in tandem based on the input clock, and another counter can also be counting up the source pixel index on every tick.

This is simpler because the divisor is a power of two, meaning that the remainder "carries" when a certain number of bits overflows (3 bits in my case) and taking the modulus is just discarding that top bit.

For other divisors you need to do some sort of magnitude comparison, and then subtract the divisor if it overflowed. One observation there is that we might as well go ahead with the subtraction, then if the result of that is not negative, then we did overflow. So the magnitude comparison is built in. Now we just need to tick the destination address counter if that happened, and arrange for the residue register to load from the subtraction output rather than the adder output. So a multiplexer would do that nicely.

I might see about putting this together in logisim or a schematic if anyone is interested to see it all together (and test if it works!)

I just realised that the original question talked about stretching sprites but my example was the opposite. In this case the divisor is going to be the target width, not the source width, and so it indeed won't necessarily be a power of two.


Top
 Profile  
Reply with quote  
PostPosted: Sun Oct 16, 2022 7:29 pm 
Offline
User avatar

Joined: Mon Aug 30, 2021 11:52 am
Posts: 268
Location: South Africa
Thanks all!

I've bashed out a quick Bresenham implementation in Java and it works beautifully
Attachment:
Sprite Stretching 5.png
Sprite Stretching 5.png [ 52.64 KiB | Viewed 990 times ]


For the 11 character wide texture "AbCdEfGhIjK"

I get the following output for line widths from 11 to ... something (truncated text)
Code:
AbCdEfGhIjK
AAbCdEfGhIjK
AAbCdEffGhIjK
AAbCddEfGhhIjK
AAbCCdEffGhIIjK
AAbCCdEEfGGhIIjK
AAbbCddEffGhhIjjK
AAbbCddEEfGGhhIjjK
AAbbCCdEEffGGhIIjjK
AAbbCCddEEfGGhhIIjjK
AAbbCCddEEffGGhhIIjjK
AAbbCCddEEffGGhhIIjjKK
AAAbbCCddEEffGGhhIIjjKK
AAAbbCCddEEfffGGhhIIjjKK
AAAbbCCdddEEffGGhhhIIjjKK
AAAbbCCCddEEfffGGhhIIIjjKK
AAAbbCCCddEEEffGGGhhIIIjjKK
AAAbbbCCdddEEfffGGhhhIIjjjKK
AAAbbbCCdddEEEffGGGhhhIIjjjKK
AAAbbbCCCddEEEfffGGGhhIIIjjjKK
AAAbbbCCCdddEEEffGGGhhhIIIjjjKK
AAAbbbCCCdddEEEfffGGGhhhIIIjjjKK
AAAbbbCCCdddEEEfffGGGhhhIIIjjjKKK
AAAAbbbCCCdddEEEfffGGGhhhIIIjjjKKK
AAAAbbbCCCdddEEEffffGGGhhhIIIjjjKKK
AAAAbbbCCCddddEEEfffGGGhhhhIIIjjjKKK


But three of the operations in the Bresenham algorithm are difficult (or at least expensive) to implement in 74 ICs.
Attachment:
Sprite Stretching 6.png
Sprite Stretching 6.png [ 52.96 KiB | Viewed 990 times ]

Just for the fun of it I'll try and implement the above in Logisim tomorrow but in the Real World(TM) I'd have to use a '283 adder - and that doesn't have an internal register - so I'd probably need two '573s or something if I'm adding a number to itself; and the comparators are kinda slow as the only cheap one still available is the 74HCT85 (the 74F85 is obsolete and the 74LS688 is *really* expensive). Pre-computing the twos-complement of line width is easy enough but pausing the textureWidth adding cycle while the negative lineWidth is added is going to challenge me*.

I get the feeling it should be possible to tell with a (couple of?) up or down counters when d would be >= to line width as I think there should be a pattern there but this a bit stream-of-consciousness and not well though through at the moment. '191 presettable up / down counters are pretty cool. And they have an internal count register that simplifies things a bit.

*Maybe I could always compute the lineWidth subtraction in parallel but only use the result when d is >= lineWidth. I need to think about that.


Top
 Profile  
Reply with quote  
PostPosted: Sun Oct 16, 2022 7:47 pm 
Offline

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
AndrewP wrote:
Pre-computing the twos-complement of line width is easy enough but pausing the textureWidth adding cycle while the negative lineWidth is added is going to challenge me*.

*Maybe I could always compute the lineWidth subtraction in parallel but only use the result when d is >= lineWidth. I need to think about that.


Yes, this. Often it's easier to stack up a deeper (or separate, parallel) chain of asynchronous components, and then use a multiplexer to pick which result you want to keep and latch synchronously into the register, rather than computing and storing an intermediate value, and then deciding whether or not to do some different extra operation afterwards.

In this case, as I said in my last post, feed the output of the first adder into a second one to subtract lineWidth, and choose which result to keep based on whether that second one came out positive.


Top
 Profile  
Reply with quote  
PostPosted: Sun Oct 16, 2022 7:48 pm 
Offline
User avatar

Joined: Mon Aug 30, 2021 11:52 am
Posts: 268
Location: South Africa
Sorry George I managed to post over you. I haven't had a chance to read your post properly yet as I'm off to bed. I'll give it some proper thought tomorrow, and I also still need to think about shift registers and power of two sprites. Cheers Andrew!


Top
 Profile  
Reply with quote  
PostPosted: Sun Oct 16, 2022 8:02 pm 
Offline

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
No problem. One other thought - assuming this is driven by a CPU, you can require the CPU to precalculate anything difficult for you up front, such as the negative width. And you could even require the CPU to supply the scale factor, e.g. in 1/256ths. So now you can just accumulate this in a register and watch out for overflow - effectively your divisor is always a nice power of two. It is less accurate than Bresenham, but that won't be noticeable in practice.

BTW I've been thinking a bit about making a TTL video circuit to do 3D rendering. That's kind of a long term goal but it requires essentially this calculation, several times at once


Top
 Profile  
Reply with quote  
PostPosted: Mon Oct 17, 2022 10:09 am 
Offline

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
Here's a logisim of the example I posted above. The thing in the top left is an 8-bit D flipflop register with reset, e.g. 74HC273. There are two 8-bit adders, an 8x2-to-1 multiplexer, and two counters for output but what you use there depends on what you want to do with the data next.

The inputs are the amount to add each cycle, and the negated divisor. For stretching you want them one way around, for squashing you want them the other way around.

I still think it's probably simpler to renormalise the divisor to be a fixed power of two, like 256 - then you only need one adder, and don't need the multiplexer.


Attachments:
stretcher_bres.png
stretcher_bres.png [ 20.43 KiB | Viewed 925 times ]
Top
 Profile  
Reply with quote  
PostPosted: Mon Oct 17, 2022 2:07 pm 
Offline

Joined: Sun Nov 08, 2009 1:56 am
Posts: 388
Location: Minnesota
Brensenham's algorithm is the one that first came to mind for me as well, but I saw it was already mentioned. But now I think it's worth mentioning the Run-Slice version, which I first encountered in Michael Abrash's "Zen of Graphics Programming".

The basic idea is to take as many calculations as possible out of the main loop. If you look at the result of the Java implementation, you can see that each source "pixel" is repeated either "x" times or "x+1" times for any destination width. The value of "x" is something that can be pre-computed before entering the main loop. Once there, all you have to do for each source "pixel" is to decide whether to draw it "x" or "x+1" times to the destination.

The speedup can be quite dramatic.


Top
 Profile  
Reply with quote  
PostPosted: Mon Oct 17, 2022 2:39 pm 
Offline
User avatar

Joined: Mon Aug 30, 2021 11:52 am
Posts: 268
Location: South Africa
Thanks for that! I vaguely understand that you're using the carry out to determine if (the source counter if stretching) should count up but I'd assume that in the end the source should have a value of 4 and the dest should have a value of 7.

I tried putting your circuit together in Logisim as well and get a 2 instead of a 4. There is really high chance I've misunderstood something (and even higher chance that I have a timing issue) but this is what I came up with
Attachment:
Sprite Stretching 7.png
Sprite Stretching 7.png [ 55.58 KiB | Viewed 906 times ]
and the address is way to small.

But I realised as I posted this what I didn't understand was Logisim's counters. I've used an up / down counter where what I should have done is use the carry out to inhibit counting. Did I just rubber duck with myself by posting this? :lol:


Top
 Profile  
Reply with quote  
PostPosted: Mon Oct 17, 2022 2:47 pm 
Offline
User avatar

Joined: Mon Aug 30, 2021 11:52 am
Posts: 268
Location: South Africa
teamtempest wrote:
The basic idea is to take as many calculations as possible out of the main loop. If you look at the result of the Java implementation, you can see that each source "pixel" is repeated either "x" times or "x+1" times for any destination width. The value of "x" is something that can be pre-computed before entering the main loop. Once there, all you have to do for each source "pixel" is to decide whether to draw it "x" or "x+1" times to the destination.
Awesome! I had a feeling it should be possible to do this using just increments in the main loop but I couldn't work out how to chose (or distribute) the x vs x+1 slices. I'll look up Run-Slice once I've fixed my Logisim implementation above as it sounds like exactly what I want.

I'll be (theoretically) using a '816 to do all the pre-calculations and it has fairly large 'coprocessor' of lookup tables so divisions and multiplications are cheap if needed.


Top
 Profile  
Reply with quote  
PostPosted: Mon Oct 17, 2022 3:38 pm 
Offline
User avatar

Joined: Mon Aug 30, 2021 11:52 am
Posts: 268
Location: South Africa
I've fixed up my Logisim simulation to only use up counters now (very clever thanks!).

And it works except for what looks like an off by one error on the source address (when stretching to a larger destination).so I've 'pre-cacluated' the source width down by one.


Attachments:
Sprite Stretching 8.png
Sprite Stretching 8.png [ 47.39 KiB | Viewed 897 times ]
Top
 Profile  
Reply with quote  
PostPosted: Mon Oct 17, 2022 5:41 pm 
Offline

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
Great, it sounds like you figured since of the issues out - but yes, the carry really needs to go into a "clock enable" input, and isn't necessarily clean enough to feed into an edge triggered clock input.

There are still some issues though I think. If the carry is zero then the subtract failed - just like in 6502 assembly - as we are just adding the complement rather than using a dedicated subtraction unit so carry works backwards. So if the carry is clear, you want to use the result from the first adder, not the result of the subtraction. I think that might still be wrong in your last diagram. And you should only count up when the carry is set, I think, so the inverter shouldn't be needed?

In general it will count from (0,0) to (5,8) given these inputs, so yes I think you probably want to subtract one from both widths before passing them in. So if stretching 8 source pixels over 54 target pixels, for example, the increment is 7 and the divisor is 53.

One other thing - you might want to bias it by half of the divisor, e.g. by initialising the accumulator (hard) or just using a third adder after the subtractor to add the offset. The output is only used to decide whether a carry occurred, it's still the first and second results that go to the multiplexer.

An example is stretching two source pixels over eight target pixels - you want AAAABBBB, but without a bias, using increment 2-1=1 and divisor 8-1=7, you'd get AAAAAAAB.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 26 posts ]  Go to page 1, 2  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 6 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: