6502.org • View topic - A (hopefully) interesting hardware algorithm problem


All times are UTC

A (hopefully) interesting hardware algorithm problem


AndrewP

Post subject: Re: A (hopefully) interesting hardware algorithm problem

Posted: Tue Oct 18, 2022 6:01 am

Joined: Mon Aug 30, 2021 11:52 am
Posts: 261
Location: South Africa

gfoot wrote:

One other thing - you might want to bias it by half of the divisor, e.g. by initialising the accumulator (hard)

Good point, thanks! I think I should be able to do this by having the 65816 pre-calculate the value and then use a multiplexer to store it into a 573 (rather than the 273) on reset.

I might have two bugs that have cancelled out - the one being the inverter.

Attachment:

Sprite Stretching 9.png [ 43.36 KiB | Viewed 586 times ]

Here the second adder (Fred) is mostly adding large numbers together. In this example it's adding F8 (the negated divisor) to FC (from the first adder). That gives a result of 1F4, or F4 with a carry of one. This has not passed the critical limit (d >= lineWidth) so I should not step the source counter (when stretching smaller to larger). I've inverted the carry to inhibit the '161. And just to be sure, both the '283 and '161 use positive logic on the carry out and inhibit in.
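The arithmetic in that example can be checked with a tiny model of an 8-bit adder. This is only a sketch: `add8` is a hypothetical helper, it assumes F8 is the two's-complement negation of a divisor of 8, and it says nothing about which carry polarity the real circuit should use.

```python
def add8(a, b):
    """Model an 8-bit adder: return (8-bit sum, carry out)."""
    total = (a & 0xFF) + (b & 0xFF)
    return total & 0xFF, total >> 8

neg_divisor = (-8) & 0xFF               # 0xF8, assumed negation of a divisor of 8
result, carry = add8(neg_divisor, 0xFC) # 0xFC comes from the first adder
print(f"{result:02X} carry {carry}")    # F4 carry 1
```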

teamtempest wrote:

the Run-Slice version

I've now read into Run-Slice and it has corner cases at the start and end of the line that would be ... interesting ... to implement with 74' ICs. And then it still needs additions and comparisons to choose which slice (x or x+1) to run next. Sadly that means my hope of implementing it using just counters ('161s or '193s etc...) wouldn't work.


barrym95838

Post subject: Re: A (hopefully) interesting hardware algorithm problem

Posted: Tue Oct 18, 2022 6:16 am

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1928
Location: Sacramento, CA, USA

gfoot wrote:

... you want AAAABBBB, but without a bias, using increment 2-1=1 and divisor 8-1=7, you'd get AAAAAAAB.

Applesoft's HPLOT didn't bias, and the results weren't always pretty:

Attachment:

bresenham.PNG [ 24.07 KiB | Viewed 586 times ]
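The effect of the bias in gfoot's example can be reproduced with a short software model. This is a sketch under assumptions rather than anyone's actual circuit: `stretch` and `as_letters` are hypothetical names, the accumulator steps the source when it reaches the divisor, and it only covers stretching (source no wider than destination).

```python
def stretch(src_width, dest_width, biased=True):
    """Return the source index used for each destination pixel."""
    increment = src_width - 1
    divisor = dest_width - 1
    acc = divisor // 2 if biased else 0   # bias by half of the divisor
    src = 0
    out = []
    for _ in range(dest_width):
        out.append(src)
        acc += increment
        if acc >= divisor:                # error term overflowed: step the source
            acc -= divisor
            src += 1
    return out

def as_letters(indices):
    """Render source indices as A, B, C, ... for easy comparison."""
    return "".join(chr(ord("A") + i) for i in indices)

print(as_letters(stretch(2, 8, biased=False)))  # AAAAAAAB
print(as_letters(stretch(2, 8, biased=True)))   # AAAABBBB
```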

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


AndrewP

Post subject: Re: A (hopefully) interesting hardware algorithm problem

Posted: Tue Oct 18, 2022 7:13 am

Joined: Mon Aug 30, 2021 11:52 am
Posts: 261
Location: South Africa

barrym95838 wrote:

Applesoft's HPLOT didn't bias, and the results weren't always pretty

Ouch. I vote that the technical term for an unbiased Bresenham line should be 'plink'. 'cause of the little drop at the end.


gfoot

Post subject: Re: A (hopefully) interesting hardware algorithm problem

Posted: Tue Oct 18, 2022 8:00 am

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741

Hmm, perhaps if you get the carry polarity backwards it will stay in the slightly negative range instead of being slightly positive. I'm not sure what other artifacts that will have but it might mostly work like that.

I guess for one thing you'd need a negative bias, which is nice because you already have a negated divisor handy so you can just use that without its bottom bit.

Regarding run-slice, it's interesting and beneficial because ideally you only want to step linearly through the destination pixels, even when squashing instead of stretching. Run-slice tells you that you can do this by counting up a fixed number of destination pixels and then just using Bresenham to work out whether to add an extra one or not, in each case.


AndrewP

Post subject: Re: A (hopefully) interesting hardware algorithm problem

Posted: Thu Oct 20, 2022 12:35 pm

Joined: Mon Aug 30, 2021 11:52 am
Posts: 261
Location: South Africa

Back before I'd posted any pictures a few days ago I managed to reverse the polarity by reversing ... something. For the life of me I cannot remember what I did nor can I reproduce it. So stuck in the negative I am.

And whilst this is supposed to be a theoretical exercise (for me) I've realised that this is actually quite doable in my little graphics blitter thing. I can almost just slot it in as my source address tick signal.

Attachment:

Sprite Stretching A.png [ 112.17 KiB | Viewed 537 times ]

I'm aiming for a maximum destination width of 1024. A bit more than a third additional chips* but worth it in terms of simplifying programming. The blitter runs at 10 MHz so that gives me plenty of time to perform 6 additions with a bit extra. And as the source and destination are both on the same bus I have to wait two ticks to do the read and the write; practically giving me 200 ns to stabilise the source address.

I had done the initial bias by just wiring the top 11 bits into the bottom 11 bits of the '16244 and it worked but had an off-by-one error so I've changed back to making the bias something the 65816 can calculate and supply.

Otherwise this is absolutely amazing, the source address changes are spot on, even for extreme stretches like 2 pixels to 256 pixels (as in the above example).

Run-slice seems to me to be far more suited to tight software loops rather than a hardware implementation. For now I'm more than happy with George's solutions.

*Over a destination width of 256 (although 128 seemed to be the real maximum that worked using 8 bits)


gfoot

Post subject: Re: A (hopefully) interesting hardware algorithm problem

Posted: Thu Oct 20, 2022 1:58 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741

Quick note - you could probably parallelize the two adders to cut a significant amount of propagation delay off. Just make the second one add the sum of the two inputs, to whatever the input was to the first one.

The negative thing probably means you're using the final carry output backwards. Swapping the order of inputs on the multiplexers would resolve that, and maybe the sense of the count-enable on the counter I'd expect.


gfoot

Post subject: Re: A (hopefully) interesting hardware algorithm problem

Posted: Thu Oct 20, 2022 2:42 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741

AndrewP wrote:

I had done the initial bias by just wiring the top 11 bits into the bottom 11 bits of the '16244 and it worked but had an off-by-one error so I've changed back to making the bias something the 65816 can calculate and supply.

If you did the bias a different way then you wouldn't need the 16244s at all I think. Is it possible for the CPU to add the bias to the source width at the start, just for the first cycle, then rewrite it with the correct value in time for the second cycle? Or if the CPU is asynchronous for the rest of the operation, perhaps you can chain together two registers there, to allow the CPU to preprogram both registers, but have the blitter copy the value from the first to the second each cycle, so that after the first cycle it then uses the new value for the rest of the operation. It would still be fewer ICs than having two sets of line drivers I think.

Quote:

Otherwise this is absolutely amazing, the source address changes are spot on, even for extreme stretches like 2 pixels to 256 pixels (as in the above example).

Run-slice seems to me to be far more suited to tight software loops rather than a hardware implementation. For now I'm more than happy with George's solutions.

The thing that run-slice would help with is the opposite operation - for squashing, e.g. 1023 pixels down to 2 pixels, which would currently take you 1023 cycles, redundantly rewriting the 2 output pixels many times in the process. I think that one way or another you need to be able to add to your source address, not just increment it, for that to work efficiently, and run-slice would involve adding either 511 or 512 - depending on the Bresenham result - rather than adding either 0 or 1 as usual.

Regardless of performance, the result is not going to look good for such extreme squashing - only two of the source pixels get displayed, when ideally you'd want some sort of average of all of them to be displayed. The main solution for this in graphics hardware is mip-mapping, which is just precalculating the result of squashing the data by increasing powers of two, calling these increasingly smaller images "mip levels", then when rendering, using the most appropriate mip level as the source image instead of the original. This can be done entirely in software for your case, and gives a huge performance improvement even on modern graphics hardware as well as improving render quality.
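A minimal software sketch of the mip-level idea for a single line of pixels follows. It assumes the pixel values are plain intensities that can be averaged (palette indices would need a different reduction), and the names `build_mip_levels` and `pick_level` are hypothetical. The selection rule, the smallest level still at least as wide as the destination, is one reasonable choice and not the only one.

```python
def build_mip_levels(line):
    """Level 0 is the original line; each later level halves the width."""
    levels = [list(line)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Average adjacent pairs; an odd trailing pixel is carried over as-is.
        nxt = [(prev[i] + prev[i + 1]) // 2 for i in range(0, len(prev) - 1, 2)]
        if len(prev) % 2:
            nxt.append(prev[-1])
        levels.append(nxt)
    return levels

def pick_level(levels, dest_width):
    """Index of the smallest level that is still at least dest_width wide."""
    best = 0
    for i, level in enumerate(levels):
        if len(level) >= dest_width:
            best = i
    return best

levels = build_mip_levels([0, 8, 16, 24, 32, 40, 48, 56])
print([len(level) for level in levels])   # [8, 4, 2, 1]
print(levels[pick_level(levels, 3)])      # [4, 20, 36, 52]
```

The chosen level would then be fed to the blitter as the source line, so the remaining squash is always less than a factor of two.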


AndrewP

Post subject: Re: A (hopefully) interesting hardware algorithm problem

Posted: Fri Oct 21, 2022 2:38 pm

Joined: Mon Aug 30, 2021 11:52 am
Posts: 261
Location: South Africa

Thanks! I had the multiplexer wired in backwards. It now counts up the source address only when the carry is set and mostly hovers around positive numbers. I still have that off-by-one error that requires that the initial bias is decremented but I'm going to chalk that up to Logisim's instantaneous event propagation until such time as I try this with real ICs.

The '16244 would be replaced by a '16373 in the real world as the '816 would latch the half bias into it (or if there is no off-by-one error then it goes away completely). I swapped the '157s out for '257s which already have an output enable so the other '16244 also goes away. The negated destination width still needs to be calculated by the '816 and latched by the graphics device. At a bit of a high level the idea is that as much setup as possible is done only once (say the width of the sprite etc...) and then the blitter is called repeatedly with minimal changes. i.e. for an 8x8 sprite only the destination memory address needs to be changed 8 times. The blitter begins blitting on a write to the destination address so that allows me to draw small characters fairly fast. If the sprite is clipped by the screen edge then there are a bunch more calculations that have to happen and then be written to the graphics device. Clipping becomes a bit more expensive than not clipping but that's fine.

Attachment:

Sprite Stretching C.png [ 77.31 KiB | Viewed 492 times ]

However I digress.

I'm afraid I cannot work out how to parallelise the adders. Doesn't the first addition have to fully happen before the second can start using it? It would be great if that's possible as looking at the full blitter cycle I have a lot of time used already and a lot fewer nanoseconds to play with than I thought.

I've been so focused on stretching smaller to larger that I hadn't thought about shrinking. I'm glad you pointed out that run-slice is way more optimal there, thanks. Mip-maps would help (maybe to the point that I don't need run-slice) but there's also a palette involved so it's a tad trickier but quite solvable. Why would I not want run-slice? I have an (unfounded) suspicion I could use the Bresenham line ICs backwards with minimal effort to shrink a sprite and I really would like to cut down the number of ICs I'm using. Gah, trade-offs.


gfoot

Post subject: Re: A (hopefully) interesting hardware algorithm problem

Posted: Fri Oct 21, 2022 5:05 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741

AndrewP wrote:

I'm afraid I cannot work out how to parallelise the adders. Doesn't the first addition have to fully happen before the second can start using it? It would be great if that's possible as looking at the full blitter cycle I have a lot of time used already and a lot fewer nanoseconds to play with than I thought.

The first set of adders is computing A+B where A is a constant and B is the accumulated value. The second set is then computing (A+B)+C, where C is another constant. So the second set is computing (A+C)+B - and A+C is also constant and could be provided by the CPU instead of DEST_WIDTH (or calculated by the circuit with another set of adders).
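The regrouping can be checked exhaustively in software. The sketch below assumes an 8-bit datapath and made-up constants purely for illustration; `serial` and `parallel` are hypothetical names. It compares only the sum bits: the carry out of the second adder is not guaranteed to match once A+C has been truncated to 8 bits, so a real circuit would have to account for that separately.

```python
MASK = 0xFF  # assumed 8-bit datapath, for illustration only

def serial(a, b, c):
    """Chained adders: the second waits for the first, (A + B) + C."""
    first = (a + b) & MASK
    second = (first + c) & MASK
    return first, second

def parallel(a, a_plus_c, b):
    """Side-by-side adders: both see B directly, A + B and (A + C) + B."""
    return (a + b) & MASK, (a_plus_c + b) & MASK

a, c = 0x03, 0xF8              # hypothetical constants A and C
a_plus_c = (a + c) & MASK      # precomputed once, e.g. by the CPU

# Both arrangements produce the same pair of sums for every accumulator value B.
matches = all(serial(a, b, c) == parallel(a, a_plus_c, b) for b in range(256))
print(matches)                 # True
```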

Quote:

I've been so focused on stretching smaller to larger that I hadn't thought about shrinking. I'm glad you pointed out that run-slice is way more optimal there, thanks. Mip-maps would help (maybe to the point that I don't need run-slice) but there's also a palette involved so it's a tad trickier but quite solvable. Why would I not want run-slice? I have an (unfounded) suspicion I could use the Bresenham line ICs backwards with minimal effort to shrink a sprite and I really would like to cut down the number of ICs I'm using. Gah, trade-offs.

I thought about this before, especially that maybe we can just swap the inputs to the calculator and swap the meanings of the outputs. The problem is that it will still iterate over all of the source pixels, even if it doesn't need to use them (because they get overwritten at the destination), and that's fundamental if the source address is tracked by a counter. If you want to skip source pixels then you need to be able to add other values to it instead of just 1. So there will be a hardware change needed to the way the source address is stored/updated.

Which values do we add? Well in normal operation for a stretch, you are adding 1 to the destination address every cycle, and either 0 or 1 to the source address, depending on the error term overflowing. For a squash, you can do the same thing, incrementing the destination every cycle, but add N or N+1 to the source address, instead of 0 or 1.

I think N is (SOURCE_WIDTH-1)/(DEST_WIDTH-1) rounded down, and you then need to supply R = (SOURCE_WIDTH-1) % (DEST_WIDTH-1) + 1 as input to the Bresenham part of the circuit instead of the full source width. I'm not sure about the "-1" bits, it feels weird but seems necessary, a bit like how you have to subtract one from the source width for the stretching calculation as well.

So given 17 source pixels to squash into 4 destination pixels, N would be 5 and R would be 2. So it's like stretching 2 source pixels over the 4 destination pixels, except after every pixel written we also increase the source address by N=5.

Here's a comparison between stretching 2 pixels over 4, and squashing 17 pixels into 4, to illustrate that - note that the squashing source addresses are just incrementing by an extra 5 each step:

Code:

Stretching 2 pixels over 4:  s0 => d0, s0 => d1, s1 => d2, s1 => d3.
Squashing 17 pixels into 4:  s0 => d0, s5 => d1, s11 => d2, s16 => d3.
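The N and R formulas above can be tried out in a short software model. This is a sketch under assumptions: `squash` is a hypothetical name, the accumulator is biased by half the divisor and steps when it reaches the divisor, and the destination must be at least 2 pixels wide so the divisions are defined.

```python
def squash(src_width, dest_width):
    """Return the source index read for each destination pixel (dest_width >= 2)."""
    n = (src_width - 1) // (dest_width - 1)      # fixed step N added every cycle
    r = (src_width - 1) % (dest_width - 1) + 1   # R, fed to the Bresenham part
    increment = r - 1
    divisor = dest_width - 1
    acc = divisor // 2                           # half-divisor bias
    src = 0
    out = []
    for _ in range(dest_width):
        out.append(src)
        src += n                                 # always advance by N
        acc += increment
        if acc >= divisor:                       # overflow: advance by N+1 instead
            acc -= divisor
            src += 1
    return out

print(squash(17, 4))  # [0, 5, 11, 16]
print(squash(2, 4))   # [0, 0, 1, 1]
```

With N = 0 the same loop reduces to the plain stretch, which is why the second call reproduces the stretching row of the comparison.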


AndrewP

Post subject: Re: A (hopefully) interesting hardware algorithm problem

Posted: Sun Oct 23, 2022 10:57 am

Joined: Mon Aug 30, 2021 11:52 am
Posts: 261
Location: South Africa

Thanks again!

Now that I've made the decision to use stretching in my graphics device I've discovered that I actually have to, well, use it in my graphics device. The problem is, and it's the major reason I've stopped using Logisim, that my simulation is very brittle so I'm going to be off on that tangent for a while as I fix everything to have proper propagation delays.

Eventually I'll come back to this but I need a solidly working simulator first*.

* Somewhere in all of this I actually managed to finish FAT32 file reading and writing so at least I got that out of the way.


Sheep64

Post subject: Re: A (hopefully) interesting hardware algorithm problem

Posted: Mon Nov 21, 2022 12:19 pm

Joined: Tue Aug 11, 2020 3:45 am
Posts: 311
Location: A magnetic field

Quote:

The goal is to design a blitter that can copy a single line of a sprite (or texture or whatever it needs to be called) from system memory into video memory. The destination width can be the same or greater than the source width. Using only 74 series ICs :shock:

When you've solved this problem, you can apply a similar technique to play PCM audio at any frequency. In the most optimistic case, you might be able to share circuitry.
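One possible reading of that suggestion is a nearest-sample rate converter driven by the same kind of error accumulator. The sketch below is an interpretation, not something spelled out in the thread: `resample_nearest` is a hypothetical name, the rates are arbitrary integers, and the sample buffer is treated as a looping waveform.

```python
def resample_nearest(samples, src_rate, out_rate, count):
    """Produce `count` output samples by stepping through `samples` with an error accumulator."""
    acc = 0
    pos = 0
    out = []
    for _ in range(count):
        out.append(samples[pos % len(samples)])  # wrap around like a looping waveform
        acc += src_rate
        while acc >= out_rate:                   # may skip several source samples per tick
            acc -= out_rate
            pos += 1
    return out

print(resample_nearest([10, 20, 30, 40], 1, 2, 8))  # [10, 10, 20, 20, 30, 30, 40, 40]
print(resample_nearest([10, 20, 30, 40], 2, 1, 4))  # [10, 30, 10, 30]
```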

_________________
Modules | Processors | Boards | Boxes | Beep, Beep! I'm a sheep!

