6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Fri Nov 22, 2024 12:23 pm

All times are UTC




Post new topic Reply to topic  [ 3 posts ] 
Author Message
 Post subject: Understanding a blitter
PostPosted: Fri Mar 06, 2009 7:24 am 
Offline

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
In another thread, Garth asked what made GPUs and the like so special. Well, I'm about to describe a basic, 2-D blitter, the most primitive kind of GPU you can imagine. This is 1985 technology here, so what you'll find in modern video cards is perhaps 1000x or more as powerful as what I'm about to describe.

The Problem

Manipulating graphics is inherently a matrix manipulation operation. You're more or less repeating the same algorithm across multiple pixels in a rectangular grid. Boring stuff.

The Solution

Let's assume a 3-source blitter, as found in the Commodore-Amiga hardware. Because you have 3 sources, you can combine these sources in up to 256 different ways. This is because you can generate eight different "minterms":
Code:
(0)  Y=/A*/B*/C
(1)  Y=/A*/B*C
(2)  Y=/A*B*/C
(3)  Y=/A*B*C
(4)  Y=A*/B*/C
(5)  Y=A*/B*C
(6)  Y=A*B*/C
(7)  Y=A*B*C

Obviously, we're talking about boolean math here -- * implies logical AND, and / implies logical negation. Each minterm corresponds to a single bit in a control register. Setting multiple bits yields an AND/OR relationship. In the above example, using a control value of $0A will produce the logical result Y=/A*B*C+/A*/B*C. Reducing the equation yields the effective result Y=/A*C.

For bitmaps where the number of bits per pixel is less than the smallest addressible memory location (a word in the case of the Amiga's blitter), there exists the chance that the left edge of a bitmap will appear on a non-word boundary. For this reason, both the A and B channels come equipped with a barrel shifter, each performing a multi-word logical shift in either the left or right directions. The data flow is as follows:
Code:
(from DMA)
        |
        V
+-----------------+-----------------+
|  Data from DMA  |  Previous Word  |
+-----------------+-----------------+
------> |                   |
(shift  +-------------------+
amount)           |
                  |
                  V
          final data word

Note that the C channel, the so-called "Pattern" channel, is always assumed to be word-aligned (as all hatches, stipples, and halftone patterns tend to be). For this reason, no silicon is invested for barrel-shifting on the C channel.

Each of the A, B, and C channels are driven from either DMA sources or are spoon-fed from the CPU, depending on whether the DMA channel is enabled for that source. For simplicity, I'm going to assume all have DMA enabled.

The D channel (which I'm going to relabel to the Y channel, to remain consistent with logic equations commonly found in TTL databooks) is the "destination" channel, which also has its own DMA channel. If the DMA channel is disabled, then the results of the blit are ignored, leaving only flags set. I will get into this in a bit below.

Our blitter, then, as implemented above, implements the following pseudocode. Note: I'm not even going to try to code this in Forth:
Code:
unsigned int *pointerA, *pointerB, *pointerC;
unsigned int shiftA, shiftB;

for(y = 0; y < HEIGHT; y++) {
  channelA = 0;
  channelB = 0;

  for(x = 0; x < WIDTH; x++) {
    previousA = channelA;
    previousB = channelB;

    if(dmaEnabledForA)  channelA = *pointerA++;
    if(dmaEnabledForB)  channelB = *pointerB++;
    if(dmaEnabledForC)  channelC = *pointerC++;

    wordA = (previousA >> 16-shiftA) | (channelA << shiftA);
    wordB = (previousB >> 16-shiftB) | (channelB << shiftB);

    notA = ~wordA;
    notB = ~wordB;
    notC = ~channelC;

    y = 0;
    if(ctrl & 0x01)  y = y | (notA & notB & notC);
    if(ctrl & 0x02)  y = y | (notA & notB & channelC);
    if(ctrl & 0x04)  y = y | (notA & wordB & notC);
    if(ctrl & 0x08)  y = y | (notA & wordB & channelC);
    if(ctrl & 0x10)  y = y | (wordA & notB & notC);
    if(ctrl & 0x20)  y = y | (wordA & notB & channelC);
    if(ctrl & 0x40)  y = y | (wordA & wordB & notC);
    if(ctrl & 0x80)  y = y | (wordA & wordB & channelC);

    if(dmaEnabledForY)  *pointerY++ = y;
  }

  pointerA = pointerA + moduloA;
  pointerB = pointerB + moduloB;
  pointerC = pointerC + moduloC;
}

This is a gross simplification over the simplest possible use-case for the Amiga's blitter.

The Amiga's blitter also can perform area fills and implements one of the first hardware Bresenham's line-drawing algorithm entirely in hardware too.

For collision detection between several software sprites, the blitter also supports a "zero flag". If you wanted to perform a collision detection without also producing an update to some bitmap, then you'd configure the blitter as normal, except that you'd disable the Y DMA channel. Upon completion of the phantom blit, the zero flag will be set or cleared, depending on whether or not a collision between two soft-sprites occurred.

As you can see, having hardware acceleration for implementing this kind of loop, plus for line drawing and area fills, can result in a substantial amount of work performed for only the cost of filling a set of I/O registers. The blitter is pipelined, which enables it to retire one bitmap word per every fourth clock cycle (remember that four clock cycles are needed: three for the A, B, and C fetches, and one for the Y write-back, if enabled). If the chipset drew data from a cache, it'd be substantially faster, perhaps retiring a single bitmap word every clock cycle.

The block diagram of the business logic of the blitter is:
Code:
     A                     B                     C

     |                     |                     |
     |                     |                     |
+--------+--------+   +--------+--------+        |
| DATA A | PREV A |   | DATA B | PREV B |        |
+--------+--------+   +--------+--------+        |
     | shift |             | shift |             |
     +-------+             +-------+             |
         |                     |                 |
         |                     |                 |
     +--------+            +--------+        +--------+
     | WORD A |            | WORD B |        | WORD C |
     | Q   /Q |            | Q   /Q |        | Q   /Q |
     +--------+            +--------+        +--------+
       |   |                 |   |             |   |
       |   |                 |   |             |   |
     +------------------------------------------------+
     |  MINTERM GENERATOR (AND/INVERT LOGIC)          |
     +------------------------------------------------+
        |     |     |     |     |     |     |     |
        | +-- | --- | --- | --- | --- | --- | --- | -----o  Y=ABC
        | |   | +-- | --- | --- | --- | --- | --- | -----o  Y=AB/C
        | |   | |   | +-- | --- | --- | --- | --- | -----o  Y=A/BC
        | |   | |   | |   | +-- | --- | --- | --- | -----o  Y=A/B/C
        | |   | |   | |   | |   | +-- | --- | --- | -----o  Y=/ABC
        | |   | |   | |   | |   | |   | +-- | --- | -----o  Y=/AB/C
        | |   | |   | |   | |   | |   | |   | +-- | -----o  Y=/A/BC
        | |   | |   | |   | |   | |   | |   | |   | +----o  Y=/A/B/C
        | |   | |   | |   | |   | |   | |   | |   | |
     +------------------------------------------------+
     |  MINTERM GATES                                 |
     |  (16-input, 16-bit AND gates)                  |
     |                                                |
     |                                                |
     +------------------------------------------------+
        |     |     |     |     |     |     |     |
        |     |     |     |     |     |     |     |
     +------------------------------------------------+
     |  8-input x 16-bit OR gates                     |
     +------------------------------------------------+
                             |
                             |
                    +-----------------+
                    | Zero-flag and   |
                    | area fill logic |
                    +-----------------+
                             |
                             V

                             Y

As you can see, you are going to have quite a difficult time competing with this circuit in even the most optimized 6502 or 65816 assembly language.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Fri Mar 06, 2009 8:15 pm 
Offline
User avatar

Joined: Sun Feb 13, 2005 9:58 am
Posts: 85
very nice and illuminating article, thanx


Top
 Profile  
Reply with quote  
PostPosted: Fri Jan 26, 2018 9:16 am 
Offline

Joined: Fri Jan 26, 2018 9:09 am
Posts: 1
kc5tja wrote:

Our blitter, then, as implemented above, implements the following pseudocode. Note: I'm not even going to try to code this in Forth:
Code:
unsigned int *pointerA, *pointerB, *pointerC;
unsigned int shiftA, shiftB;

for(y = 0; y < HEIGHT; y++) {
  channelA = 0;
  channelB = 0;

  for(x = 0; x < WIDTH; x++) {
    previousA = channelA;
    previousB = channelB;

    if(dmaEnabledForA)  channelA = *pointerA++;
    if(dmaEnabledForB)  channelB = *pointerB++;
    if(dmaEnabledForC)  channelC = *pointerC++;

    wordA = (previousA >> 16-shiftA) | (channelA << shiftA);
    wordB = (previousB >> 16-shiftB) | (channelB << shiftB);

    notA = ~wordA;
    notB = ~wordB;
    notC = ~channelC;

    y = 0;
    if(ctrl & 0x01)  y = y | (notA & notB & notC);
    if(ctrl & 0x02)  y = y | (notA & notB & channelC);
    if(ctrl & 0x04)  y = y | (notA & wordB & notC);
    if(ctrl & 0x08)  y = y | (notA & wordB & channelC);
    if(ctrl & 0x10)  y = y | (wordA & notB & notC);
    if(ctrl & 0x20)  y = y | (wordA & notB & channelC);
    if(ctrl & 0x40)  y = y | (wordA & wordB & notC);
    if(ctrl & 0x80)  y = y | (wordA & wordB & channelC);

    if(dmaEnabledForY)  *pointerY++ = y;
  }

  pointerA = pointerA + moduloA;
  pointerB = pointerB + moduloB;
  pointerC = pointerC + moduloC;
}



Fantastic description, but I believe you have the midterms in the wrong order... I could be wrong, but operation 0x80 is done first, then 0x40 etc...

I'm currently trying to implement a software blitter, and I couldn't get the shifting work properly... with your pseudo code I think I can see my error :)


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 3 posts ] 

All times are UTC


Who is online

Users browsing this forum: No registered users and 20 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: