
All times are UTC




PostPosted: Fri Feb 23, 2024 8:13 pm 
Offline

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 660
Location: Potsdam, DE
Sorry, but this is going to be long. I'll break it into hopefully edible chunks.

This is part of my investigations into the various ways one might generate vga signals as simply as possible but without using programmable logic devices. This uses an eprom/eeprom/flash so it comes close... inspired by this video by Dr Matt Regan: https://www.youtube.com/watch?v=qO1dNRKHeb4

For a number of reasons, the approach he uses doesn't quite work for VGA, not least the increased memory width required, so I did some thinking.

State machines
A state machine can be easily built using a prom and a latch of its outputs, with the latched outputs fed back to the address inputs of the prom. On each clock pulse, the latch will read the current output and deliver it back to the input; the next clock pulse will latch the data at this new address.
An incrementing counter is probably the simplest case: it requires only that each memory location contains the address of the next location.
An arbitrary sequence can be generated in this way, for example the count required for a video output can be implemented at any length just by letting the last address refer back to the start address of the line: 01→02→03→04→00 (that’s a very short line!).
This approach can be used for both the character position in the line and the line count through the video field. The syncs and blanking can be encoded using higher bits in the address to enable their extraction without external logic.
Ideally, to save circuit board real estate, one would want to use a single prom. Commonly available parts are 8- or 16-bits wide.
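The prom-plus-latch loop described above can be modelled in a few lines of C. This is only a sketch: the names `step`, `run` and `short_line` are made up for illustration, and the "prom" is just an array holding the very short five-state line from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One clock edge: the latch captures the prom output for the current
   address, and that value becomes the next address. */
static uint16_t step(const uint16_t *prom, size_t size, uint16_t state)
{
    assert(state < size);   /* guard for this model only */
    return prom[state];
}

/* Apply `clocks` clock edges starting from `state`. */
static uint16_t run(const uint16_t *prom, size_t size, uint16_t state, int clocks)
{
    while (clocks-- > 0)
        state = step(prom, size, state);
    return state;
}

/* The very short line from the text: 00 -> 01 -> 02 -> 03 -> 04 -> 00 */
static const uint16_t short_line[5] = { 0x01, 0x02, 0x03, 0x04, 0x00 };
```

Calling `run(short_line, 5, 0, 5)` walks once round the loop and lands back on state 0. A longer line only needs more entries, with the last one pointing back to the first.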

VGA timing
The simple vga video signal is 640 by 480 pixels, plus blanking and syncs. Separate vertical and horizontal syncs are required, negative going; the sense of the blanking signal is arbitrary but convenient if true during active picture.

A character based vga output using a character cell of eight pixels wide by sixteen deep provides a display of eighty characters by thirty. To output a character, the datum representing that character must be extracted from video ram and latched; the latched output is fed to a character prom along with four bits of data representing the line count within the character. The output from that character prom is latched and serialised during the eight-pixel character time.

A graphical vga output of the same resolution will still read a character from ram, but will require sixteen times as much ram as the character version since each datum is displayed only once without translation.

In both cases the line timing is the same: for standard 60Hz vga, the dotclock is 25.175MHz and the ‘character clock’ one eighth of that. Since all the line timings are integers based on the dot clock and divisible by eight, it is convenient to use character timings.

0 – 79 Active line (starts at 0x00)
80 – 81 Front porch (starts at 0x50)
82 – 93 Line sync (starts at 0x52)
94 – 99 Back porch (starts at 0x5E)
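As a cross-check on the character positions, here is a small sketch that derives them from the usual 640x480 at 60 Hz horizontal timing in pixels (640 active, 16 front porch, 96 sync, 48 back porch). The function names are invented for this example.

```c
#include <assert.h>

/* Horizontal timing of 640x480 at 60 Hz, in pixels (dot clocks). */
enum { H_ACTIVE = 640, H_FRONT = 16, H_SYNC = 96, H_BACK = 48 };

/* Convert a pixel count to character clocks; every segment must be a
   whole number of 8-pixel characters for this scheme to work. */
static int to_chars(int pixels)
{
    assert(pixels % 8 == 0);
    return pixels / 8;
}

/* First character position of each segment within the line. */
static int front_porch_start(void) { return to_chars(H_ACTIVE); }
static int sync_start(void)        { return front_porch_start() + to_chars(H_FRONT); }
static int back_porch_start(void)  { return sync_start() + to_chars(H_SYNC); }
static int line_total(void)        { return back_porch_start() + to_chars(H_BACK); }
```

With these figures the front porch starts at character 80 (0x50), the sync at 82 (0x52), the back porch at 94 (0x5E), and the line is 100 characters long.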

Let’s choose the three highest (of sixteen) bits for the control outputs: 0x8000 for line sync, 0x4000 for field sync, and 0x2000 for blanking. Let us also assume for now that the control signals are active high. We can sort them out later.


PostPosted: Fri Feb 23, 2024 8:14 pm 
Offline

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 660
Location: Potsdam, DE
On this basis, our state machine contents look like this (in hex):
Address Data
0000 0001 Start of line
0001 0002 next character
… … incrementing count
004e 004f getting to the end of the visible line
004f 2050 set the (inverted) blanking
2050 2051 front porch
2051 a052 line sync – we’re still blanking
a052 a053
… …
a05d 205e back porch
205e 205f
… …
2063 0000 and back to the start of the line

to generate a single repeating video line output with line syncs. But there are a couple of obvious issues; not least of which, we’re only showing the same line all the way down the screen. Let’s fix that:

2063 0064 and back to the start of the next line
… …
0050 0051 and so on…

This works, with each line's video memory 0x50 bytes further on, but there are some issues: first, this is only going to work for a graphical interface, not a mapped character interface (though of course you can plot characters at will) because different memory is read for each line. Secondly, we need to output 525 lines. At one hundred characters per line, that’s a total of 52,500 memory addresses – requiring sixteen bits to address and with the addition of the control signals, nineteen. So we’re looking at two or three proms, not to mention the vram requirement. Even reducing the references to only the 80 bytes per actually displayed line doesn’t help a lot; we’re still looking at 42,000 addresses.
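Going back to the single repeating line, its table could be generated along the following lines. This is a sketch under stated assumptions: the control bits are fed back as part of the address (as in the listing), sync occupies twelve characters starting at 0x52 so that the back porch begins at 0x5e, and the names `flags_for` and `build_single_line` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_SYNC  0x8000u   /* control bits as chosen in the text */
#define BLANKING   0x2000u
#define LINE_CHARS 100       /* character clocks per line */

/* Control bits that accompany a given character position. */
static uint16_t flags_for(int pos)
{
    uint16_t f = 0;
    if (pos >= 80)             f |= BLANKING;    /* porches and sync */
    if (pos >= 82 && pos < 94) f |= LINE_SYNC;   /* twelve characters of sync */
    return f;
}

/* Fill a 65536-entry prom image with one repeating line.  Each entry
   lives at (position | flags) and holds (next position | next flags). */
static void build_single_line(uint16_t *prom)
{
    assert(prom != 0);
    for (int pos = 0; pos < LINE_CHARS; pos++) {
        int next = (pos + 1) % LINE_CHARS;
        uint16_t addr = (uint16_t)(pos  | flags_for(pos));
        uint16_t data = (uint16_t)(next | flags_for(next));
        prom[addr] = data;
    }
}
```

Because the flags are part of the address, only one hundred of the 65536 locations are ever written, which is why the prom is mostly empty.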

To keep the address space manageable we need to restrict ourselves to character cells. In those terms, the visible space is now only 2400 bytes (80 * 30) and the total space including blanking is 100 * 34 = 3400 bytes (actually a little less, but we’ll see).

However, we need a way to repeat lines and yet somehow move between cells. Unfortunately this needs more hardware. Suppose we use a counter such as a ‘163 to count lines. We need to count on a low-going edge at the end of each line, so an inverter is required. Now the prom doesn’t generate an in-character line count; instead, we use the output from the ‘163. Each block of sixteen lines is now represented with two sets of data: one in which the last line entry points to the start of the current character line, and one which points to the next line.

0001 0002 … 004f 2050 2051 a052 a053 … a05d 205e … 2063 0000 ← points back to same line
0081 0082 … 00cf 20d0 20d1 a0d2 a0d3 … a0dd 20de … 20e3 0100 ← points to next line

We increase the bytes per line to 128, even though we only use a hundred of them, so that we can remove bit 7 from the address feedback and replace it with the terminal count output from the ‘163. That occurs only on line 15, so the sequence repeats the first line of addresses fifteen times, then the second line just once. We ignore bit 7 in the vram address, so both of these refer to the same video ram data. From there it goes to the second video line address starting at 0x0100.
When we have output thirty lines of characters, our last visible line started at address 0x1d00 and ended at 0x1dd0.
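The "fifteen times, then once" behaviour can be checked with a small model. This is a simplified sketch, not the actual circuit: the control bits are left out of the address, the line counter is advanced at the last character of each line rather than on the sync edge, and `build_row` and `lines_in_row` are names invented here.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_CHARS 100      /* character clocks per scan line */
#define ROW_STRIDE 0x100    /* two 128-entry copies per character row */
#define TC_BIT     0x80     /* address bit driven by the counter's terminal count */

/* Fill both copies for one character row: copy A loops back to the
   start of the same row, copy B (selected while TC is high) moves on
   to the next row.  Bit 7 of the stored data is irrelevant because
   the model replaces it with TC, as the hardware would. */
static void build_row(uint16_t *prom, int row)
{
    uint16_t base = (uint16_t)(row * ROW_STRIDE);
    for (int pos = 0; pos < LINE_CHARS; pos++) {
        int last = (pos == LINE_CHARS - 1);
        prom[base + pos]          = last ? base : (uint16_t)(base + pos + 1);
        prom[base + TC_BIT + pos] = last ? (uint16_t)(base + ROW_STRIDE)
                                         : (uint16_t)(base + pos + 1);
    }
}

/* Scan from the start of row 0 and count scan lines until the state
   machine reaches row 1.  *repeats receives the number of lines that
   were read from copy A. */
static int lines_in_row(const uint16_t *prom, int *repeats)
{
    uint16_t state = 0;     /* latched prom output */
    unsigned count = 0;     /* 4-bit line counter */
    int lines = 0;

    assert(prom != 0 && repeats != 0);
    *repeats = 0;
    while (state / ROW_STRIDE == 0) {
        int tc = (count == 15);
        uint16_t addr = (uint16_t)((state & ~TC_BIT) | (tc ? TC_BIT : 0));
        if ((addr & 0x7f) == LINE_CHARS - 1) {   /* last character of the line */
            lines++;
            if (!tc)
                (*repeats)++;
            count = (count + 1) & 15;
        }
        state = prom[addr];
    }
    return lines;
}
```

For a table built with `build_row(prom, 0)`, the model scans sixteen lines before reaching 0x0100, fifteen of them from the first copy.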


PostPosted: Fri Feb 23, 2024 8:17 pm 
Offline

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 660
Location: Potsdam, DE
Field timing
But things get interesting at vertical blanking, which starts at line 480. We have ten lines of front porch, two lines of field sync, and the remainder – thirty-three lines – are also blanking. Each of these lines also requires horizontal blanking and syncs.
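The line counts above can be sanity-checked with a few lines of arithmetic. The sketch below assumes the 800-pixel line total and the 25.175MHz dot clock mentioned earlier; the function names are made up for the example.

```c
#include <assert.h>

/* Vertical timing in scan lines, as listed in the text. */
enum { V_ACTIVE = 480, V_FRONT = 10, V_SYNC = 2, V_BACK = 33 };
enum { H_TOTAL_PIXELS = 800 };   /* 100 characters of 8 pixels */

/* Total scan lines per field. */
static int frame_lines(void)
{
    return V_ACTIVE + V_FRONT + V_SYNC + V_BACK;
}

/* Field rate in Hz for a given dot clock in Hz. */
static double field_rate_hz(double dot_clock_hz)
{
    assert(dot_clock_hz > 0.0);
    return dot_clock_hz / ((double)H_TOTAL_PIXELS * frame_lines());
}
```

The segments add up to 525 lines, and `field_rate_hz(25175000.0)` comes out at a little under 60 Hz.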

On top of that, we’re about to run out of prom: we can’t use any address higher than 0x2000 or we’ll interfere with existing blanking data. We can’t use a character address greater than 0x..40 or the ‘163 will be clocked, and we don’t want that because the field syncs and end of field don’t fall on tidy boundaries…

But: all the previous uses of the 0x2000 blanking area have been with low addresses over 0x..4f or 0x..df, so we can use the rest of the space. And if we only use the ranges 0x2.00 to 2.3f then the ‘163 will never be clocked. The addresses output will not have any useful relationship to video ram, but as they are in blanking we don’t care.

Simplicity!
So now, for simplicity, the last character position takes us to 0x2000. From there, we use partial lines to avoid clocking the ‘163:

2001 2002 … 203e 203f 2080 ... a082 a083 … a091 2092 … 2100 ← blanked line with sync

repeated ten times, then two field syncs:

6a01 6a02 … 6a3e 6a3f 6a80 ... ea82 ea83 … ea91 6a92 … 6b00
6b01 6b02 … 6b3e 6b3f 6b80 ... eb82 eb83 … eb91 6b92 … 6c00

And after that, the remaining blanked lines as from 2000… simple!

The only remaining detail is that the syncs and blanking are, as shown here, inverted. We need them the other way up, which moves the whole data around in the prom (which is of course mostly empty). Since they’ll be programmatically generated, that’s not an issue, but if we keep them as they are, they’re perhaps a little easier to follow. We’ve had to use one inverter, so we will probably have another five units handy in an ‘04 part which we can use.

So the entire video addressing, blanking, and separate sync circuit can be fitted in just five chips: one 64k x 16 prom, one ‘163 counter, two ‘574 latches, and one ‘04 inverter. The complete video circuit will require other parts, of course: another ‘163 to divide the dot clock down and generate a latching signal, a character rom, another ‘574 to latch the data from the video ram, and a ‘166 to serialise it.

Next steps
  • code to generate the prom data
  • test it in Logisim
  • circuit design

Neil


PostPosted: Fri Feb 23, 2024 10:20 pm 
Offline

Joined: Fri Aug 03, 2018 8:52 am
Posts: 745
Location: Germany
ironically, a PROM could technically be counted as "programmable logic" if you use it like a LUT (Lookup Table) to build a state machine :D

For testing VGA circuits i'd recommend Digital as it has a pretty accurate built-in VGA component and 74' logic chips as well so you could build and test the whole circuit in the simulator before making it reality.

on another note, what PROMs are you planning on using that are fast enough to run at 25MHz (40ns)? even ST's Flash chips only go down to 55ns.
and you do need it to update the state machine once per clock cycle to do the sync signals properly as even just running them at 12.5MHz would offset the Horizontal Sync signal by exactly 1 25MHz clock cycle.

but if you could somehow use those Flash chips, you can save space by using the PLCC versions, 2 of which take up around the same PCB space as a single DIP Flash/PROM. and using 3 of those you could (i think) completely avoid the need for the '163. unless i misread something.

with 3x SST39SF040's you have a LUT with 19 inputs and 24 outputs.

the outputs i would map as such (LQ just meaning "LUT Output"):

Code:
LQ23  ---- Horizontal Sync
LQ22  ---- Vertical Sync
LQ21  ---- Blanking (0 = active video, 1 = blanking)
LQ20  ---- Read from Character RAM
LQ19  ---- Read from Data RAM
LQ18  \
LQ17  |
LQ16  |--- Current Character (Vertical)
LQ15  |
LQ14  /
LQ13  \
LQ12  |
LQ11  |--- Vertical Slice (ie Line) of the current Character
LQ10  /
LQ9   \
LQ8   |
LQ7   |
LQ6   |--- Current Character (Horizontal)
LQ5   |
LQ4   |
LQ3   /
LQ2   \
LQ1   |--- Horizontal Slice (ie Pixel) of the current Character
LQ0   /

the LUT inputs would just take LQ0-18, turning the whole thing into a giant pixel counter that can generate signals for a whole frame without extra circuitry (besides the latches for the outputs and a clock)

to fetch a Character from RAM you would just take LQ3-9 (low) and LQ14-18 (high) as a 12-bit address and the LUT generated read signal.
to then get the actual pixels being drawn for the selected character you use LQ10-13 (low) plus the fetched Byte (high) from the previous step to get another 12-bit address plus the LUT generated read signal.
That read byte (or word for higher color depths) is then loaded into a shift register and shifted once every clock cycle.
you could use some tri-state buffers to get both 12-bit addresses onto the same bus at different cycles with a 13th address line which is either high or low depending on if the LUT is fetching a Character or Pixel Data, so it would all fit into a single 8kB RAM chip (maybe even dual port to simplify the CPU side of accessing VRAM).
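To make the bit mapping concrete, here is a rough sketch of how the two 12-bit addresses described above could be assembled from the LUT outputs. The helper names `lq_pack`, `char_ram_addr` and `glyph_addr` are invented for illustration, and the read-enable and sync bits (LQ19-23) are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the four counters into LQ0-18 using the mapping above:
   LQ0-2 pixel, LQ3-9 column, LQ10-13 slice, LQ14-18 row. */
static uint32_t lq_pack(unsigned row, unsigned slice, unsigned col, unsigned pixel)
{
    assert(row < 32 && slice < 16 && col < 128 && pixel < 8);
    return ((uint32_t)row << 14) | ((uint32_t)slice << 10) |
           ((uint32_t)col << 3) | (uint32_t)pixel;
}

/* Character RAM address: LQ3-9 as the low 7 bits, LQ14-18 as the high 5. */
static uint16_t char_ram_addr(uint32_t lq)
{
    uint32_t col = (lq >> 3) & 0x7f;
    uint32_t row = (lq >> 14) & 0x1f;
    return (uint16_t)((row << 7) | col);
}

/* Pixel data address: LQ10-13 as the low 4 bits, fetched byte as the high 8. */
static uint16_t glyph_addr(uint32_t lq, uint8_t ch)
{
    uint32_t slice = (lq >> 10) & 0x0f;
    return (uint16_t)(((uint32_t)ch << 4) | slice);
}
```

For the last visible cell (row 29, column 79) the character address is 0xECF, and slice 5 of character 0x41 sits at 0x415. A 13th address line would then select between the two regions of the shared RAM.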


PostPosted: Sat Feb 24, 2024 6:08 am 
Offline

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 660
Location: Potsdam, DE
It doesn't need such a high speed prom: it's clocked at character rate, not pixel rate. First thought is something like an SST39VF801C / SST39VF802C / SST39LF801C / SST39LF802C so anything faster than about 300ns would be quick enough. The hsync pulse happens to fall exactly on a character boundary, so there's no worry there.
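The access-time budget follows directly from the clock division. A minimal sketch, with a hypothetical helper name:

```c
#include <assert.h>

/* Period in nanoseconds of a clock obtained by dividing the dot clock
   (given in MHz) by dots_per_clock. */
static double clock_period_ns(double dot_clock_mhz, int dots_per_clock)
{
    assert(dot_clock_mhz > 0.0 && dots_per_clock > 0);
    return 1000.0 * dots_per_clock / dot_clock_mhz;
}
```

At 25.175MHz a single pixel lasts just under 40ns, while an eight-pixel character lasts just under 318ns. That is the whole period, so the prom's access time plus the latch setup time has to fit inside it.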

Even running it as a monochrome graphics output doesn't need to be any quicker (and takes fewer parts - no external line counter) but of course requires a lot more video ram.

The need for the external line counter 163 is (a) because I want to include the control signals in the prom output and there are three bits taken up there and (b) because of the issue of moving from one character line to the next while accessing the same video memory location. It kills two stones with one bird.

It fails my usual complaint of course, of needing a programmer (I don't have one for 16 bit flash so I'll have to knock something up, but at least the zip sockets are cheap enough - about three euros on the bay)

Neil


PostPosted: Sat Feb 24, 2024 7:13 am 
Offline

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 660
Location: Potsdam, DE
First attempt. A number of issues including wrong prom type, signals with different names at each end, and it needs a way to latch the control signals, but it gives the idea.

Neil


Attachments:
6502_vga_prom-Line timing.pdf [260.96 KiB]
PostPosted: Sat Feb 24, 2024 3:29 pm 
Offline

Joined: Fri Mar 18, 2022 6:33 pm
Posts: 432
This is similar to how my video board works. It's an expanded version of George (gfoot's) "Worlds Simplest TTL Video Card."

_________________
"The key is not to let the hardware sense any fear." - Radical Brad


PostPosted: Sat Feb 24, 2024 5:08 pm 
Offline

Joined: Fri Aug 03, 2018 8:52 am
Posts: 745
Location: Germany
barnacle wrote:
It doesn't need such a high speed prom: it's clocked at character rate, not pixel rate. First thought is something like an SST39VF801C / SST39VF802C / SST39LF801C / SST39LF802C so anything faster than about 300nS would be quick enough. The hsync pulse happens to fall exactly on a character boundary, so there's no worry there.

oh i see, i had to check GIMP (where i drew out a complete 640x480 frame and use the grid feature to display "character cells") and it does indeed land on exact character boundaries.

barnacle wrote:
The need for the external line counter 163 is (a) because I want to include the control signals in the prom output and there are three bits taken up there and (b) because of the issue of moving from one character line to the next while accessing the same video memory location. It kills two stones with one bird.

hmm, i still feel like it could be put into the PROM by just making it 24-bit wide. simplifying the BOM a bit.
then again, if it works it works. speaking of which have you already tried to build this in Digital or similar to see if it works?


PostPosted: Sat Feb 24, 2024 5:42 pm 
Offline

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 660
Location: Potsdam, DE
I just noticed the timings on the vgatiming page were all divisible by eight :mrgreen:

Yes, it would work in a 24-bit wide prom (or more likely three 8-wide) at a cost of a more complex coding task. Swings and roundabouts...

I've tested the basic idea in Logisim with a much smaller prom, just to see if the 'scan 15 then 1' approach works. I need to look into Digital one of these days; it looks like it might solve some of the issues I have with Logisim. When I've got a program to generate the prom, I'll feed that into a bigger model and see how it looks.

Neil


PostPosted: Sat Feb 24, 2024 7:29 pm 
Offline

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 660
Location: Potsdam, DE
Just had a thought: it may be possible to fit all the necessary states into a somewhat smaller state prom. At the moment it's hopping around to other mostly empty chunks of prom as the various control signals become active, but if they're not fed back as an address to the prom then those bits are irrelevant to the state machine.

At 256 bytes for each character row, I'm looking at just under 8k of actual addresses, so there isn't room after them for the next 45 lines but maybe I can squeeze them in in the twenty otherwise unused bytes. Don't know yet, need to think about it. It'll simplify the software if I can.

Neil


PostPosted: Sun Feb 25, 2024 7:13 pm 
Offline

Joined: Sun Feb 22, 2004 9:01 pm
Posts: 78
It's definitely do-able, this is what I used for address, sync, and blanking signals for a 40x25 teletext display: TTXPROM.bas which generates this (sorry, I haven't finished writing that bit up).

_________________
--
JGH - http://mdfs.net


PostPosted: Sun Feb 25, 2024 7:49 pm 
Offline

Joined: Sun Feb 22, 2004 9:01 pm
Posts: 78
barnacle wrote:
It doesn't need such a high speed prom: it's clocked at character rate, not pixel rate.

You don't even need to clock it at character rate. The shortest video signal is the front porch, which you can get to be 2 characters in an 80-character line, so you can clock as though you are using a 40-character line, and the front porch is 1 character wide; everything else is an even number of 80-character-widths, so is an integer number of 40-character-widths. You only need the 80-character addresses for fetching the actual character data; the address bus fed to the video signal encoding circuitry doesn't need A0, so it "looks like" a 40-character character rate.
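A quick way to see that this works is to halve each horizontal segment and check that nothing is left over. The sketch assumes the usual 640-pixel VGA line of 80 active, 2 front porch, 12 sync and 6 back porch characters; the function names are illustrative only.

```c
#include <assert.h>

/* Horizontal segments of a 640-pixel VGA line, in 8-pixel characters. */
enum { ACTIVE_CHARS = 80, FRONT_CHARS = 2, SYNC_CHARS = 12, BACK_CHARS = 6 };

/* Convert to two-character clocks; fails if a segment is an odd length. */
static int halve(int chars)
{
    assert(chars % 2 == 0);
    return chars / 2;
}

/* Start of each segment, counted in two-character clocks. */
static int front_porch_at(void) { return halve(ACTIVE_CHARS); }
static int sync_at(void)        { return front_porch_at() + halve(FRONT_CHARS); }
static int back_porch_at(void)  { return sync_at() + halve(SYNC_CHARS); }
static int line_clocks(void)    { return back_porch_at() + halve(BACK_CHARS); }
```

The results are 40, 41, 47 and 50 clocks, so every boundary still lands on a clock edge at half the character rate.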

_________________
--
JGH - http://mdfs.net


PostPosted: Sun Feb 25, 2024 8:57 pm 
Offline

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 660
Location: Potsdam, DE
An excellent point that I should have thought of; I was looking at the other end of the binary word. But doing it this way means you don't get the A0 bit generated by the state machine, so you need to pull it out elsewhere...

Neil


PostPosted: Mon Feb 26, 2024 4:36 pm 
Offline

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 660
Location: Potsdam, DE
Even clocking at two-character cell intervals, it still needs slightly too much data to fit in 8k, so the control bits aren't isolated and it needs to hop around. But that's 640 by 480... 640 by 400 has a convenient 25 x 8 character cell, and only 449 lines (400 active). So reserving 64 bytes per line - only fifty are needed but the alignment works better this way - needs only 0x7040 bytes. So that'll fit into a 64k by 16 without having to play with the external line counter... the low bit is the clock itself. Oh, except it will probably need to be inverted. Minor detail.

Neil

edit: 8k by 16, or two 8k by 8, or any convenient mix.
edit 2: oops, no, I can't count. It's still one bit too long. Bother.


PostPosted: Thu Mar 07, 2024 9:44 pm 
Offline

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 660
Location: Potsdam, DE
So, after much swearing and changing my mind... I finally have a schema that should generate the addressing counts for a VGA 640x400 display (to be expressed as 80x25 character cells of 8x16). It also outputs with no extra logic the line sync (negative going), field sync (positive going) and an active line/blanking signal (high when active).

It requires an 8k by 16 prom, a '163 counter, two '374 latches, half a '74, and a single inverter. Which is about eight chips fewer than my discrete counter version... seems to check out ok on Logisim.

I suspect that I shall use the Infineon S29AL008J which is an 8Mb chip, so way larger than I require, but they're under two bucks each at Digikey. Add about the same for a zip socket all the way from China and it's a no-brainer if you don't fancy soldering tsop-48. The zip sockets are larger than the part, obviously, but not massively so, and I'm going to need two or three anyway, depending on how I boot the processor. For the first attempt, likely one prom for the video addressing, one for the character set, and one for the software - so for at least two of those it'd be handy to be able to have a second go at programming them.

Attachment:
prom video generation.png [92.51 KiB]


The code is a bit of a dog's breakfast, because to fit everything in I had to use a mix of compressed lines (which run the first line of a doublet fifteen times and then the '163 forces the address bit to run the second line) and uncompressed lines which can't extend past 0x20 to avoid clocking the '163 by accident.

A very dodgy bit of C to generate the values. Error checking? We've heard of that...
Code:
// generate contents of state machine to control vga
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
uint16_t prom[65536] = {0};      // zero everywhere to force a reset
// note: we use a 2-character window for timing, since that is the
// resolution of the signal itself
#define front_porch   40
#define hsync      41
#define back_porch   47
#define line_length   50         // count for last char in each segment
#define field_sync   0x2000      // active high, bit 13
#define line_sync   0x4000      // active low, bit 14
#define blanking   0x8000      // active low, bit 15
static uint16_t   prom_ptr;      // address of the entry currently being filled
void dump_binary (void)
{
   // dump all of prom to stdout, 64 words per row; zero entries show as dots
   for (int q = 0; q < 65536; )
   {
      if (0 == (q % 64))
         printf ("\n[%04x] ", q);
      if (prom[q] == 0)
      {
         printf (".... ");
         q++;
      }
      else
         printf("%04x,",prom[q++]);
   }
}
void output_raw2 (void)
{
   // write the first 0x2000 words as a Logisim "v2.0 raw" image
   FILE * fi = fopen ("prom.raw", "w");
   if (fi == NULL)
   {
      perror ("prom.raw");
      exit (EXIT_FAILURE);
   }
   fprintf (fi, "v2.0 raw\n");
   for (int q = 0; q < 0x2000; )
   {
      if (0 == (q % 32))
         fprintf (fi, "\n");
      fprintf (fi, "%04x ", prom[q++]);
   }
   fclose(fi);
}
int main (void)
{
   uint16_t next_addr = 0;
   prom_ptr = 0;
   int q, r;
   // first we do the compressed lines
   // first entry is repeated 15 times, then moves to second
   // we have 25 of these pairs
   for (r = 0; r < 25; r++)
   {
      prom_ptr = r * 0x80;
      for (q = 0; q < line_length; q++)
      {
         next_addr = prom_ptr + 1;
         if (q == line_length - 1)
            next_addr =  0x80 * r;
         if (q < front_porch)
            next_addr |= blanking;
         if ((q < hsync) || (q >= back_porch))
            next_addr |= line_sync;
         prom[prom_ptr] = next_addr;
         if (q < (line_length - 1))
            prom[prom_ptr + 0x40] = next_addr + 0x40;
         else
            prom[prom_ptr + 0x40] = next_addr + 0x80;
         prom_ptr = next_addr & 0x1fff;
      }
   }    
   // finishes at 0x0c80
   
   // now we have 12 lines of vertical front porch
   // then two lines of vsync
   // we can't use any address from 0x..20 to 0x7f
   // so each line uses offsets 00-1f and 80-91 from its base
   // (each line requires 256 addresses)
   prom_ptr = 0xc80;
   for (r = 0; r < 14; r++)
   {
      for (q = 0; q < 0x20; q++)
      {
         // first part of line
         next_addr = prom_ptr + 1;
         if (q == 0x1f)
            next_addr += 0x60;
         if (r > 11)
            next_addr |= field_sync;
         prom[prom_ptr] = next_addr |= line_sync; // blanking low, hsync high
         prom_ptr = next_addr & 0x1fff;
      }
      for (q = 32; q < line_length; q++)
      {
         // second part
         next_addr = prom_ptr + 1;
         if (q == line_length - 1)
            next_addr += 0x6e;
         if ((q < hsync) || (q >= back_porch))
            next_addr |= line_sync;
         if (r > 11)
            next_addr |= field_sync;
         prom[prom_ptr] = next_addr;   // blanking low
         prom_ptr = next_addr & 0x1fff;   // lose the control flag bits
      }
      prom_ptr = 0xc80 + ((r + 1) * 0x100);
   }
   // next space = 0x1a80
   // 35 more lines of field blanking, we can do two as 16-groups = 32
   for (r = 0; r < 2; r++)
   {
      prom_ptr = (r * 0x80) + 0x1a80;
      for (q = 0; q < line_length; q++)
      {
         next_addr = prom_ptr + 1;
         if (q == line_length - 1)
            next_addr =  (0x80 * r) + 0x1a80;
         if ((q < hsync) || (q >= back_porch))
            next_addr |= line_sync;
         prom[prom_ptr] = next_addr;   // blanking low;
         if (q < (line_length - 1))
            prom[prom_ptr + 0x40] = next_addr + 0x40;
         else
            prom[prom_ptr + 0x40] = next_addr + 0x80;
         prom_ptr = next_addr & 0x1fff;
      }
   }    
   // 0x1b80 next available
   // three lines to go, all blanking plus line sync
   prom_ptr = 0x1b80;
   for (r = 0; r < 3; r++)
   {
      for (q = 0; q < 0x20; q++)
      {
         // first part of line
         next_addr = prom_ptr + 1;
         if (q == 0x1f)
            next_addr += 0x60;
         if (r > 11)
            next_addr |= field_sync;
         prom[prom_ptr] = next_addr |= line_sync; // blanking low, hsync high
         prom_ptr = next_addr & 0x1fff;
      }
      for (q = 32; q < line_length; q++)
      {
         // second part
         next_addr = prom_ptr + 1;
         if (q == line_length - 1)
            next_addr += 0x6e;
         if ((q < hsync) || (q >= back_porch))
            next_addr |= line_sync;
         if (r > 11)
            next_addr |= field_sync;
         prom[prom_ptr] = next_addr;      // blanking low
         prom_ptr = next_addr & 0x1fff;   // lose the control flag bits
      }
      prom_ptr = 0x1b80 + ((r + 1) * 0x100);
   }
   // finally wrap up the cycle back to the start; as at the end of every
   // other line, hsync stays high (inactive) and blanking stays low
   prom[0x1e11] = 0x0000 | line_sync;
   printf ("prom_ptr = %04x, r = %d\n", prom_ptr, r);
   dump_binary();
   output_raw2();
}


Neil

