16*16 Pixel Tile Video Display And Video Codec
Posted: Mon Dec 21, 2020 1:40 pm
Making video circuitry is currently outside of my ability. Regardless, there is a particular allure to doodling designs for video circuitry. I'd like a video display which outputs 1920*1080p or significantly better, and it would be particularly good if it only used cheap commodity components.
The first problem I identified is how to simultaneously display and write. This problem can be sidestepped but it typically reduces the throughput of screen updates by a factor of five. That is a significant problem when attempting to update two million pixels. It is preferable to avoid being hindered by such a large multiple. Anyhow, the standard solutions are dual port RAM, two sets of RAM, or RAM fast enough to time slice. Two sets of RAM is the cheapest option but it may require two sets of writes to keep the display consistent. So, again, we may be hindered by a factor of two.
The second problem I identified is ripple carry from binary counters, which leaves an increasingly prominent skew on 2^P pixel transitions. How do people solve this problem? Latches on the counters to ensure consistent output? Perhaps it is preferable for an arrangement to not use latches. Instead, techniques to shorten counters may be preferable. Techniques include the Chinese Remainder Theorem, which uses co-prime moduli, and space filling curves, such as Gray coding of address lines. I had much hope for Gray coding because it would be exceptionally useful to dump the contents of a RAM chip while only changing one address line per unit of output. In addition to serializing the contents of any horizontal scan line, this technique may also facilitate hardware scrolling. Unfortunately, the circuitry is unworkable. It is possible to get Gray coding working in "linear time" where G bits of Gray code require logic gates with G-1 inputs. That's completely useless and it is preferable to use a binary counter. Regardless, this consideration of Gray coding found application elsewhere.
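To make the gate-count argument concrete, here is a minimal sketch in C (layout and names are mine): binary-to-Gray is a single XOR per bit, but Gray-to-binary is a prefix XOR across all higher bits, which is why a parallel decoder needs gates nearly as wide as the counter.
Code:
#include <stdint.h>
#include <stdio.h>

/* Binary to Gray: one XOR, cheap in hardware. */
static uint16_t bin_to_gray(uint16_t b) { return b ^ (b >> 1); }

/* Gray to binary: bit i is the XOR of every Gray bit at or above i,
   so a combinational decoder for the low bit spans all G inputs.
   This is the wide-gate problem described above. */
static uint16_t gray_to_bin(uint16_t g) {
    uint16_t b = 0;
    for (; g; g >>= 1) b ^= g;
    return b;
}

int main(void) {
    for (uint16_t i = 0; i < 8; i++)
        printf("%2u -> gray %2u -> back %2u\n", (unsigned)i,
               (unsigned)bin_to_gray(i), (unsigned)gray_to_bin(bin_to_gray(i)));
    return 0;
}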
I continued with the Chinese Remainder Theorem and found that the modulo counters could be incorporated into horizontal sync generation. (CRT for your CRT!) I also made an attempt to find a good palette. In particular, after finding:
Quote:
kc5tja on Sat 4 Jan 2003:
In 640 pixel mode, I have the following layout: RRmGGGBB. The RRm field forms a 3-bit red field. GGG forms a 3-bit green field. BBm forms a 3-bit blue field. Although three 3-bit fields normally creates a 512 color display, the 'm' bit (which stands for magenta) is shared between the red and blue channels, thus halving the number of colors actually viewable to 256. I've done some testing of this mechanism on my own computer using software simulation, and the initial results are quite nice (going to do more testing of course). But it gives a clean 8 shades of grey, and discoloration of cyans and yellows isn't visible to my eye. I think I made a good compromise solution.
my intuition differed. I tried the following on a Unix system with GIMP 2.x:
Code:
$ perl -e 'print "GIMP Palette\nName: SharedRedBlue\n#\n";for($m=0;$m<2;$m++){for($r=$m;$r<8;$r+=2){for($g=0;$g<8;$g+=1){for($b=$m;$b<8;$b+=2){printf("%3i %3i %3i\n",(73*$r)>>1,(73*$g)>>1,(73*$b)>>1)}}}}' > SharedRedBlue.gpl
$ perl -e 'print "GIMP Palette\nName: SharedRedGreen\n#\n";for($m=0;$m<2;$m++){for($r=$m;$r<8;$r+=2){for($g=$m;$g<8;$g+=2){for($b=0;$b<8;$b+=1){printf("%3i %3i %3i\n",(73*$r)>>1,(73*$g)>>1,(73*$b)>>1)}}}}' > SharedRedGreen.gpl
$ perl -e 'print "GIMP Palette\nName: SharedGreenBlue\n#\n";for($m=0;$m<2;$m++){for($r=0;$r<8;$r+=1){for($g=$m;$g<8;$g+=2){for($b=$m;$b<8;$b+=2){printf("%3i %3i %3i\n",(73*$r)>>1,(73*$g)>>1,(73*$b)>>1)}}}}' > SharedGreenBlue.gpl
$ perl -e 'print "GIMP Palette\nName: 4Cubes\n#\n";for($m=0;$m<4;$m++){for($r=0;$r<4;$r++){for($g=0;$g<4;$g++){for($b=0;$b<4;$b++){printf("%3i %3i %3i\n",$m*17+$r*68,$m*17+$g*68,$m*17+$b*68)}}}}' > 4Cubes.gpl
$ sudo cp *.gpl /usr/share/gimp/2.0/palettes

and then tested it against an extensive archive of skin-tones. I thought that SharedGreenBlue would produce the best results. I was wrong. SharedRedBlue produces the best result for light skin but 4Cubes produces the best result for dark skin and the most shades of grey. Anyhow, the best result is to allocate bits symmetrically and then share the remainder symmetrically.
I thought that I had a workable system until I found a comment from White Flame regarding a similar framebuffer:
Quote:
White Flame on Tue 24 Jul 2012:
No, don't make a dumb framebuffer, especially not a 256-color one. The machine won't have enough bandwidth to push it around, and it'll be a programming and memory management nightmare. The Amiga in particular suffered for this in comparison to the C64 vs its peers, ending up being much less colorful and less animated than even underpowered machines like the SNES. You only want a framebuffer if you have a 3d accelerator, or maybe a very fast 2d blitter/shader pushing generic bytes around much faster than the CPU (ie, would allow you to redraw an entire game screen, object by object, at 60fps).
Multiple layers of tile-based graphics (with selectable palettes per tile) is the key to colorful, animated graphics being dynamically pushed around fast, as well as being easier to create & manage within the realm of this class of hardware. Regarding sprites, that's another place where Commodore went wrong, trying to include a smaller number of larger sprites. The more graphically successful arcade & home game consoles created their sprites from a larger number of smaller (usually 8x8 pixel) sprites, which ends up being far less constraining.
I have seen similar sentiment in the influential DTACK Grounded newsletter, issue 40:
Quote:
FNE on Sun 10 Mar 1985:
One way to mix graphics and text is simply to use an exclusive-or gate to mix conventional 25-line, 80-column text with graphics. This requires the pixel rate of the text and graphics to be precisely synchronized, and does NOT permit text to be aligned on bit rather than character boundaries. It also leaves the text with a fixed size. We don't think most folks would consider such a system to be a bit-mapped system.
We believe that the most cost-effective solution to the text-cum-graphics problem at the moment is to use a dedicated text video circuit with nice, conventional 25 lines and 80 columns as the system output. This circuit should be memory-mapped like the Apple and Pet/CBM text screens, NOT at the other end of an RS-232 cable. Graphics then become optional, and require a second CRT.
Ahh, hot dang! They are completely correct. A little processor cannot push around 120MB/s unaided. Even my planned VGA style ModeX hack to write through eight banks of RAM falls short. (Although it simplifies most timings by a factor of eight.) I fear that we are "generals preparing for the last war" and that we are devising the perfect word processing system when the baseline specification is video conferencing.
It is possible to solve any computing problem with another level of indirection. For display, the standard techniques are a blitter or tiling (and sometimes both). Radical Brad and ElEctric_EyE have success with the former. 8BIT and Drass have success with the latter. I'd like to take a different path although much of what I propose is applicable elsewhere. Some of my over-confidence with processor design and circuitry comes from moderate success with codecs and streaming. In this case, I think that I know enough to specify video hardware and a codec which are tied to each other. I am also able to estimate quality of the output prior to implementation.
I'd like to make a self-hosting computer. I wish to use a broad definition in which a person unskilled in the art can learn the techniques of the computer's construction. So, my broad definition is the self-replicating memeplex of a trustworthy computer. This could easily devolve into unscientific woo-woo if it did not include the scientific criterion of replication. For my purposes, it should be possible for the computer to play tutorial videos. However, this does not necessarily include the ability to encode video or play arbitrary codecs. Therefore, it is possible to discard all support for JPEG DCTs and motion deltas. So, it doesn't have to support H.261 or the related DCT schemes used in JPEG, MJPEG, MPEG and elsewhere.
After reading and implementing the annotated JPEG specification, I know more than any sane person should about a hierarchical, progressive, perceptually quantized, Huffman compressed, YIV color-space, 4:2:2 subsampled integer Discrete Cosine Transform with restart markers. I also know a little about JPEG2000 wavelet spirals and EXIF. And, indeed, after working with EXIF, my sanity is questionable. (For reference, it is against specification to embed an unescaped JPEG thumbnail inside a JPEG. Good luck with enforcing that.)
For video, I initially considered a bi-level approximation of the most significant wave of an 8*8 DCT, followed by RLE. This has advantages and disadvantages. Compression is poor but it requires zero bit shifts. This is acceptable for a computer which plays tutorial videos. Given that it only requires 64 tiles, this arrangement has the distinct advantage that it can be played in a window on a Commodore 64, at 60Hz, while simultaneously displaying 7 bit ASCII, window decoration and other tiles. The quality is inferior to MJPEG although the exact amount is explicitly undefined. I assume that bi-level approximation is sqrt(2) worse than DCT. I also assume that a Commodore 64 has a worse color-space than perceptual YIV. Perhaps there is some trick to improve color reproduction but I assume that it is incompatible with decompressing video.
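As a rough sketch of what this bi-level approximation could mean in code (my own reconstruction in C, not a confirmed implementation): remove the block mean, correlate against the sign pattern of each 8*8 DCT basis function, keep the strongest, and emit its index plus a foreground/background pair.
Code:
#include <stdint.h>
#include <stdlib.h>
#include <math.h>

/* Sign pattern of DCT basis (u,v) at pixel (x,y): +1 or -1. */
static int basis_sign(int u, int v, int x, int y) {
    const double PI = 3.14159265358979323846;
    double c = cos((2 * x + 1) * u * PI / 16.0) * cos((2 * y + 1) * v * PI / 16.0);
    return c >= 0.0 ? 1 : -1;
}

/* Encode one 8*8 block as (tile, fg, bg). The tile is the basis whose
   bi-level sign pattern correlates best with the mean-removed block. */
void encode_block(const uint8_t px[8][8], int *tile, uint8_t *fg, uint8_t *bg) {
    long mean = 0;
    for (int y = 0; y < 8; y++) for (int x = 0; x < 8; x++) mean += px[y][x];
    mean /= 64;

    int best = 1; long best_score = -1;
    for (int u = 0; u < 8; u++) for (int v = 0; v < 8; v++) {
        if (u == 0 && v == 0) continue;  /* DC is carried by the fg/bg levels */
        long score = 0;
        for (int y = 0; y < 8; y++) for (int x = 0; x < 8; x++)
            score += basis_sign(u, v, x, y) * (px[y][x] - mean);
        if (labs(score) > best_score) { best_score = labs(score); best = u * 8 + v; }
    }

    /* fg = mean of pixels under the +1 region, bg = mean under the -1 region. */
    long s1 = 0, n1 = 0, s0 = 0, n0 = 0;
    for (int y = 0; y < 8; y++) for (int x = 0; x < 8; x++) {
        if (basis_sign(best / 8, best % 8, x, y) > 0) { s1 += px[y][x]; n1++; }
        else { s0 += px[y][x]; n0++; }
    }
    *tile = best; *fg = (uint8_t)(s1 / n1); *bg = (uint8_t)(s0 / n0);
}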
In JPEG, the color-space is often implicit. In MJPEG, the perceptual quantize tables are often implicit. This makes quality comparison ambiguous. However, the most remarkable part is a comparison of multiple waves. The Joint Photographic Experts Group crowd-sourced the JPEG compression algorithm and image format. The committee's efforts include perceptual testing, a documented standard (sales of which are undermined by the chairperson's cheaper annotated version) and a reference implementation which underlies almost every implementation except Adobe's software. Indeed, the annotated version makes a pointed dig at Adobe for using up to 12 tiles in a macro-block when the standard specifies a maximum of 10. Oh, Adobe, even when you participate in defining a file format, you are unable to follow it.
Anyhow, JPEG perceptual quantize tables are only defined for one viewing distance, one viewing angle, one viewing brightness, one viewing contrast and - most significantly - individual waves. All other behavior is undefined. What is the quality of two waves in one tile? Undefined. This is the most incredible part. For a standard which is concerned exclusively with the perceptual splitting, compression and re-combination of waves, the perceptual effect of such work is explicitly undefined. So, how much worse is the bi-level approximation of the most prominent wave? Undefined but, in practice, I expect it to be at least 40% worse.
So far, we have nothing above a demoscene effect. This is not the basis of a video tutorial format. So, how much further can we push this technique and how much retains compatibility with VIC-II hardware or similar? Actually, we can push it much further, and I am able to estimate how and where quality will suffer. Compared to JPEG, we are not using perceptual quantize tables, not using Huffman or arithmetic compression, not using YIV, not subsampling and only using an approximation of JPEG's 8*8 DCT. What happens if we throw that out too? Actually, we get increased throughput.
Anyone who understands the Discrete Fourier Transform or the related Discrete Cosine Transform will understand that samples are converted to waves and waves are converted to samples. And, indeed, the number of waves equals the number of samples. For JPEG DCT, 64 samples become 64 waves and the most prominent ones are RLE/Huffman compressed or compressed with a technique which was slightly less frivolous than IBM's XOR cursor patent. Anyhow, we are only using 6 bits in a byte because we are representing 2^6 pixels (and therefore a choice of 64 waves). The obvious solution is to use 8 bits to represent 2^8 pixels. Therefore, the basic unit should be 16*16 pixels rather than the historical norm of 8*8 pixels. Historically, this hasn't been used because it requires four times as much processing power to encode and would skew the effectiveness of JPEG Huffman encoding. It also helps if video resolution is higher. And there is one further consideration. Think of those poor photographic experts, who were handed a working algorithm. They'd have to do four times more perceptual testing. Or possibly 16 times if the result was practical. The horror!!! Think of the experts!!!!!
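A sketch of what the 256-entry tile set could look like (my assumption: the bi-level tiles are the sign patterns of the 16*16 DCT basis set, so an 8-bit tile number indexes a wave directly; 256 tiles * 16 rows * 16 bits is 8KB of tile ROM):
Code:
#include <stdint.h>
#include <math.h>

uint16_t tile_rom[256][16];   /* one 16-bit bitmap row per tile row */

void build_tile_rom(void) {
    const double PI = 3.14159265358979323846;
    for (int u = 0; u < 16; u++)
        for (int v = 0; v < 16; v++)
            for (int y = 0; y < 16; y++) {
                uint16_t row = 0;
                for (int x = 0; x < 16; x++) {
                    double c = cos((2 * x + 1) * u * PI / 32.0)
                             * cos((2 * y + 1) * v * PI / 32.0);
                    if (c >= 0.0) row |= (uint16_t)1 << x;
                }
                tile_rom[u * 16 + v][y] = row;
            }
}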
Not thinking in pixels is the crux of the idea. Rather than attempt to push 2MB of raw 8 bit pixels per frame at 60 frames per second (120MB/s), work exclusively in 16*16 waves. If we have, for example, 1024 tiles, we can display an arbitrary frame of the restricted video format concurrently with upscaled PETSCII and other tiles. By using multiple tiles, it is possible to display 16*32 characters for Roman, Greek and Cyrillic scripts. It is also possible to display 32*32 CJK [Chinese, Japanese, Korean] and emoji. Unfortunately, there may not be enough tiles for arbitrary display. 1024 tiles is sufficient to concurrently display video, Roman script and window decoration but it may not be sufficient to display an arbitrary screen of Chinese. The upside is the ability to implement 1920*1080p using significantly less than one bit per pixel. Specifically, I suggest 16 bits or less for the tile number, 16 bits or less for the foreground color and 16 bits or less for the background color. Excluding bi-level tile bitmaps, this is a maximum of 48 bits per 16*16 pixel tile placement. Tile bitmaps require exactly one bit per pixel. However, tile placements may exceed tiles by a factor of four or more.
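To make the bit budget concrete (my arithmetic, using the figures above): 1920/16 = 120 tile columns and 1080/16 = 67.5 rows, which is 8100 placements per frame at a maximum of 48 bits each, plus 1024 shared bitmaps at one bit per pixel.
Code:
#include <stdio.h>

int main(void) {
    const double placements = (1920 / 16) * (1080 / 16.0); /* 120 * 67.5 = 8100 */
    const double placement_bits = placements * 48;   /* tile no. + fg + bg  */
    const double tile_bits = 1024 * 16 * 16;         /* shared tile bitmaps */
    const double pixels = 1920.0 * 1080.0;
    printf("placements per frame: %.0f\n", placements);
    printf("bits per pixel: %.3f\n", (placement_bits + tile_bits) / pixels);
    return 0;
}
This prints roughly 0.314 bits per pixel, comfortably under the one bit per pixel claimed.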
In the most optimistic case, full screen 1920*1080p video is reduced to the task of RLE decompressing 8100 bytes of wave data, 8100 bytes of foreground data (with write through to least significant bits) and 8100 bytes of background data (also with write through). With this scheme, 2K video remains impossible for an unaided 2MHz 6502 but it is tantalizingly close. 640*360p is also a possibility. When decompressing 16 bit color, quality may exceed MJPEG but it is definitely inferior to the obsolete H.264 in perceptual quality and compression efficiency. Regardless, via the use of 8 bit LUTs [Look-Up Tables], it remains possible to render an inferior version on a Commodore 64. This is because 16*16 tiles can be mapped to 8*8 tiles.
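A minimal sketch of the decode path (the (count, value) record format and the write-through convention are my assumptions): expand run-length pairs into an 8100-byte plane, then write each byte through to the low half of a 16-bit color word.
Code:
#include <stddef.h>
#include <stdint.h>

/* Expand (count, value) byte pairs into dst; returns input bytes consumed. */
size_t rle_decode(const uint8_t *src, size_t src_len, uint8_t *dst, size_t dst_len) {
    size_t i = 0, o = 0;
    while (i + 1 < src_len && o < dst_len) {
        uint8_t count = src[i++];
        uint8_t value = src[i++];
        while (count-- && o < dst_len) dst[o++] = value;
    }
    return i;
}

/* Write through to the least significant bits of a 16-bit color word,
   leaving the most significant byte untouched, as described above. */
void write_through(uint16_t *colors, const uint8_t *plane, size_t n) {
    for (size_t i = 0; i < n; i++)
        colors[i] = (uint16_t)((colors[i] & 0xFF00u) | plane[i]);
}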
I have previously attempted to transcode 3840*2160p video to my own format. I used a trailer for Elysium as test data. My conclusion is that everything looks awesome at 4K. Even when the error metrics for 8*8 tiles were completely fubar, the result was excellent because there is a mix of 480*270 flat tiles and other encodings. This is Mrs. Weiler's Law: "Anything is edible if chopped finely enough." Well, 120 columns of PETSCII with 8/16 bit color is definitely chopped finely enough.
A quick win to improve image quality and reduce bandwidth is hierarchical encoding. Specifically, I suggest concurrent use of 16*16, 32*32 and 64*64 tiles. I also suggest XOR of tiles at each scale prior to display. This allows selective addition or subtraction from a base color without incurring ripple carry. It also minimizes sections of dropout when a video is not played on target hardware. Hardware with support for two or more tile sizes allows small video to be upscaled by a factor of two or more while incurring no additional processor load. It also allows thumbnails to be played at lower quality. In particular, the top tier of a hierarchical video can be played on a Commodore 64 if it does not exceed 40 columns.
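Under my reading of the XOR scheme (a sketch, not a confirmed design), display-side composition per pixel is just the XOR of the color contribution from each scale. Flipping bits with XOR can raise or lower a channel without a carry rippling into neighboring bits, and a scale which is absent on smaller hardware simply contributes zero.
Code:
#include <stdint.h>

/* Compose one pixel from the 64*64, 32*32 and 16*16 contributions.
   A missing level is passed as 0 and degrades output locally rather
   than corrupting a carry chain. */
uint16_t compose_pixel(uint16_t base, uint16_t c64, uint16_t c32, uint16_t c16) {
    return base ^ c64 ^ c32 ^ c16;
}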
The hierarchical encoding process is similar to JPEG. Blocks of the broadest scale are processed first and residual data is processed in subsequent passes. This process is compatible with SIMD. If luma/chroma is conventionally summed, then thumbnails on any proposed hardware may incur dark patches in high detail areas. This would be particularly prominent on the lowest specification equipment. Although XOR incurs redundant impulse and worsens compression, it also provides the most graceful degradation. Where bandwidth and storage are not the priority, this atypical use is preferable. If this design choice is in error, it is possible to fix it in hardware by selectively incorporating XOR into a full adder. It is also possible to fix it in software.
The astute may notice that large tile sizes are not compatible with the accepted list of high vertical resolutions. The result may be a strip of half or quarter tiles. Likewise, a tile hierarchy may place constraints upon the user interface. This may include CJK on even columns only and windows which snap to a four column and four row grid. However, all of this is ancillary to the primary purpose: mutually influenced hardware and software which train people to make better hardware and software.
What is the quality of hierarchical, bi-level texture compression? You are probably using it already because most PCI and PCI Express graphics cards manufactured since 2005 support it. Indeed, the fundamental patents regarding texture compression have subsequently expired. My technique of using the order of operands to share commutative opcodes is taken from a common texture compression format where the order of palette entries determines the compression technique. I would not be confident sharing my use of this technique if the patent were still active.
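For reference, the palette-order trick looks like this in S3TC/DXT1 style decoding (a sketch of the published block format, not any particular implementation): each 4*4 block carries two RGB565 endpoints, and their numeric order selects the mode.
Code:
#include <stdint.h>

/* Interpolate two RGB565 colors channel by channel with weights wa:wb. */
static uint16_t mix565(uint16_t a, uint16_t b, int wa, int wb) {
    int r = ((a >> 11) * wa + (b >> 11) * wb) / (wa + wb);
    int g = (((a >> 5) & 63) * wa + ((b >> 5) & 63) * wb) / (wa + wb);
    int bl = ((a & 31) * wa + (b & 31) * wb) / (wa + wb);
    return (uint16_t)((r << 11) | (g << 5) | bl);
}

/* The order of the two endpoints is itself a signal: c0 > c1 selects
   four opaque colors, c0 <= c1 selects three colors plus transparent. */
void dxt1_palette(uint16_t c0, uint16_t c1, uint16_t pal[4], int *transparent) {
    pal[0] = c0;
    pal[1] = c1;
    if (c0 > c1) {
        pal[2] = mix565(c0, c1, 2, 1);
        pal[3] = mix565(c0, c1, 1, 2);
        *transparent = 0;
    } else {
        pal[2] = mix565(c0, c1, 1, 1);
        pal[3] = 0;                    /* transparent black */
        *transparent = 1;
    }
}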
I have outlined a technique to obtain 1920*1080p video at one bit per pixel using the program counter of a 65C02, 65816 or W65C265. This technique is compatible with hierarchical tiling and video decompression. These techniques are also compatible with processor stacking, blitters and various methods of DMA. I also note that my suggestion is compatible with the work of White Flame, Radical Brad, ElEctric_EyE, 8BIT, Drass and others. I hope that these ideas can be incorporated into discrete circuitry or FPGA. Indeed, I have been greatly inspired by an attempt to make a binary compatible extension to the VIC-II, modestly and tentatively called VIC2.5. However, I believe that it is more beneficial to break binary compatibility in a manner which was typical at Commodore. I'm not the only one with such sentiment:
Quote:
The 8Bit Guy on Wed 11 Apr 2018:
Just like the C64 was not compatible with the VIC-20 or the PET or the Plus/4, total compatibility is not required. It just needs the feel. Also, it needs to use PETSCII characters.
While The 8Bit Guy wants 640*480, the DTACK Grounded newsletter, issue 40, provides a long worked example of why this is not feasible on an 8MHz MC68000 with a 16 bit bus (or permutations thereof), while chiding the basic numeracy of people who should know better. The worked example would also explain why Apple Macintosh computers remained monochrome for an extended period. Timings for The 8Bit Guy's suggested 8MHz 6502 or 65816 may be equally stringent.
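The arithmetic behind that objection, roughly (my illustrative figures, not DTACK's exact ones): a full redraw of 640*480 at 8 bits per pixel is 307,200 bytes, and a processor which averages one byte moved per eight cycles at 8MHz manages about 1MB/s.
Code:
#include <stdio.h>

int main(void) {
    const double frame_bytes = 640.0 * 480.0; /* one 8bpp frame           */
    const double cpu_bytes_s = 8e6 / 8.0;     /* ~8 cycles to move a byte */
    printf("full redraws per second: %.1f\n", cpu_bytes_s / frame_bytes);
    return 0;
}
That prints about 3.3 full redraws per second, which is a long way from 60.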
Regardless, my suggestion of a 16 bit tile number, 16 bit foreground color and 16 bit background color may be a pleasing complement to the work on 65Org16 and, in particular, ElEctric_EyE's work with 65Org16 and video. While my preferred embodiment is much closer to the work of 8BIT and Drass, some of my diagrams have been shockingly similar to ElEctric_EyE's diagrams.
My outlined design may not be suitable for gaming due to excessive fringing. This could be corrected with multiple layer sets and alpha mask techniques. For example, it is relatively easy to specify 256 chords across a square tile. I believe this would also be compatible with ElEctric_EyE's work.
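As an illustration of how cheaply such masks could be specified (the 4+4 bit encoding here is purely my invention): one byte selects a chord from a point on the left edge of a 16*16 tile to a point on the right edge, and the alpha mask is everything above the chord.
Code:
#include <stdint.h>

/* Hypothetical 8-bit chord code: high nibble = row where the chord
   meets the left edge, low nibble = row where it meets the right edge.
   Produces 16 rows of mask bits, 1 = above the chord. */
void chord_mask(uint8_t code, uint16_t mask[16]) {
    int y0 = code >> 4, y1 = code & 15;
    for (int y = 0; y < 16; y++) {
        uint16_t row = 0;
        for (int x = 0; x < 16; x++) {
            int yc = y0 + (y1 - y0) * x / 15;  /* chord height at column x */
            if (y < yc) row |= (uint16_t)1 << x;
        }
        mask[y] = row;
    }
}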