VAN [Video, Audio, Network]
Posted: Wed Jun 01, 2022 4:18 pm
An increasing number of people get 6502/65816 systems to output composite video or VGA. I've suggested schemes to extend such functionality using a minimum number of components. The first of these is to have one set of counter chips and two or more sets of character ROMs. The intention is that the number of symbols can be vastly increased while the hardware is less than doubled. I've also suggested that dual displays, triple displays and suchlike may use the same set of counter chips. However, I've missed a really obvious extension which is likely to be more popular: sound. Specifically, PCM audio.
The requirements for sound are vague but often aligned with the requirements for video. We want to output the maximum bit depth at the maximum rate possible - without overwhelming the processor. To push this further, many systems (Zoran's Vaddis IV chip, Atari Jaguar, GameTank) allocate one processor to video and one processor to sound. However, in the trivial case, we want to:
- Output one or more analog signals.
- Use a resistor network DAC.
- Obtain 1 volt peak-to-peak signals.
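As a back-of-envelope sketch of that last point (assuming an ideal 8 bit R-2R ladder off a 5V rail; the reference voltage and divider ratio are my assumptions, not a fixed design):

```python
def r2r_output(code, vref=5.0, bits=8):
    """Ideal R-2R ladder: output = vref * code / 2**bits."""
    return vref * code / (1 << bits)

def line_level(code, vref=5.0, bits=8):
    # Scale the 0..5V ladder output to roughly 1V peak-to-peak,
    # e.g. with a resistive divider of ratio 1/5.
    return r2r_output(code, vref, bits) / 5.0

print(line_level(255))  # 0.99609375, i.e. just under 1V at full scale
print(line_level(0))    # 0.0
```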
Ignoring that, if you want a quick win, 640*480 pixel VGA (with 9 bit counter) dovetails really nicely with the default PATA/SATA/CF/MicroSD block size of 512 bytes. Furthermore, 31kHz horizontal sync is only moderately less than Compact Disc sample rate and would be an ideal default for 8 bit PCM audio. As a quick hack, a double buffered, vertical blank interrupt video system may play sound at the rate of one disk block per frame. At 0.5KB per block and 60 frames per second, this is exactly 30KB of audio per second. Double or triple this rate for 16 bit or 24 bit audio and multiply that by the total number of channels. How far can we push it? BigDumbDinosaur's POC designs have exceeded 600KB/s. Therefore, 16 bit, 7.1 surround sound is possible on 6502/65816.
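The arithmetic checks out. A quick tally (the 31469Hz figure is the standard 640x480@60Hz horizontal sync rate):

```python
# One 512-byte disk block per 60Hz frame:
block_bytes = 512
fps = 60
print(block_bytes * fps)  # 30720 bytes/s = exactly 30KB/s of 8-bit mono

# 16-bit 7.1 surround, one sample set per horizontal line:
bytes_per_sample = 2
channels = 8
line_rate = 31469  # Hz, 640x480@60Hz horizontal sync
rate = bytes_per_sample * channels * line_rate
print(rate)                  # 503504 bytes/s
print(rate < 600 * 1024)     # True: fits under the 600KB/s budget
```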
I take AndersNielsen's VGA implementation as a concrete example. It reads 64 bytes per horizontal line. One additional RAM chip using the same counters could send 64 bytes of audio to one or more audio DACs. This could be eight sets of audio samples to eight DACs or four sets of samples to 16 DACs. Or a lesser quantity. I'll assume that hardware mixing of four samples is preferable because this only divides the maximum volume by a factor of four. In this arrangement, addressing of the audio RAM is such that the bottom three or four bits correspond with DAC number and the next two or three bits correspond with the audio "voice" which is being mixed.
However, it may be slow and inconvenient to populate the audio RAM in the order that it is rastered to DACs. It is vastly preferable if 512 bytes of one channel can be populated from storage (or network). This is particularly true if the audio buffers are bank switched and therefore not addressable at the same time. This is the conceptual trick. If samples of audio are played at horizontal sync frequency then it is convenient to imagine audio buffered as 64 "vertical stripes" within the audio RAM. Audio is played by rastering the audio RAM left-to-right, top-to-bottom in the same manner as the video RAM. However, unlike the video RAM, windows into the audio RAM are columns rather than rows.
It is obvious that it is possible to stripe audio by byte and therefore it is possible to make a hardware/software interface which is upward and downward compatible with 8/16/24 bit audio. Indeed, it works in a very similar manner to the optional accents with character generation - including use of the same bank latches. I call this voice mixing and 24 bit audio compatibility MAGPIE.
This is it. This is the solution. Audio and video may use the same bank latches. Audio and video RAM may be populated in the same manner. However, for audio, the order of the address lines when writing data is different to the order of the address lines when reading data. In my preferred implementation, the least significant 6 bits - when playing samples - become the most significant 6 bits when bank selecting audio RAM. Audio is very much like video - with the major exception that address lines are shuffled somewhere between writing and rastering.
We have to handle the mundane issue of volume control. I considered doing this in software with a caching multiplication algorithm. It is preferable to use a voltage multiplier with software and hardware volume control inputs. This is especially true if audio will be used in conjunction with storage, video or network. Audio output quality can be maximized by putting everything through one DAC and one volume multiplier. It is then possible to direct analog audio (on its own power rail) via an analog switch to sample-and-hold circuitry where sampling occurs in the *middle* of the cycle.
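The software fallback mentioned above could be as simple as a per-volume lookup table, trading 256 bytes of RAM per volume step for a multiply-free inner loop (a sketch of one plausible caching scheme, not the author's actual algorithm):

```python
def make_volume_table(volume):
    """Precompute (sample * volume) / 256 for all 256 unsigned
    8-bit samples, treating 128 as the zero crossing."""
    return [((s - 128) * volume) // 256 + 128 for s in range(256)]

quarter = make_volume_table(64)   # quarter volume
print(quarter[255])  # 159: full-scale input scaled toward the midpoint
print(quarter[128])  # 128: silence stays at the midpoint
```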
For an encore, I extend the rather fluid concept of row (or column) to networking. Here, the requirements are more vague than audio. The basic requirements for networking are:
- Nodes should be able to send data to each other.
- Don't design any stupid feature which hinders faster networking.
For decode, I adapt one trick from the 6502 Forum's Programmable Logic section. Specifically, Windfall's "Yet another (unnamed) 65C02 core" stores even and odd bytes separately so that any 16 bit value can be retrieved with 8 bit granularity. If two or three 8 bit RAM chips are combined with a barrel shifter made from 74x157 chips, it is possible for any network input buffer to be de-skewed in hardware before the wire format is decoded in software. The occasional bit slip between hosts merely requires incrementing or decrementing through the available range of network buffer aliases.
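A sketch of the even/odd split (the function names are mine): alternate bytes land in two RAMs, so a 74x157-style mux can assemble a 16 bit value starting at any byte offset, and a slipped stream is just a different alias into the same buffer.

```python
def split(buf):
    # Even bytes to one RAM chip, odd bytes to the other.
    return buf[0::2], buf[1::2]

def read16(even, odd, addr):
    # Any byte offset: address the two chips with addr/2 and
    # (addr+1)/2, and swap lanes when addr is odd (the mux's job).
    if addr % 2 == 0:
        lo, hi = even[addr >> 1], odd[addr >> 1]
    else:
        lo, hi = odd[addr >> 1], even[(addr + 1) >> 1]
    return lo | (hi << 8)

even, odd = split(bytes([0x11, 0x22, 0x33, 0x44]))
print(hex(read16(even, odd, 0)))  # 0x2211: aligned read
print(hex(read16(even, odd, 1)))  # 0x3322: straddles the even/odd boundary
```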
So far, one bank latch value may represent:
- One write-only, unique line of video display.
- One write-only, unique "column" of audio samples.
- One write-only, unique network output buffer.
- One read-only, barrel shifted network input buffer.
Four channels at telephone quality is approximately the same as one channel at 31kHz and this may be simplified if the hardware mixes the audio streams. Admittedly, allowing users to join or leave a party-line is difficult - and a variable number of nodes will compound the pacing which is required to compensate for the mismatch in audio sample speed. As a trivial example, consider audio conferencing with two nodes. If the crystal oscillators are at different temperatures then one node will run out of samples to play while the other gets back-logged. After you think of a solution to this problem, extend it to cover three-way calling. Then keep going.
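To put numbers on the drift (assuming ordinary +/-50ppm crystals, which is my assumption about the parts in use):

```python
rate = 31469        # samples/s at horizontal sync frequency
ppm_diff = 100      # worst case: one crystal +50ppm, the other -50ppm
drift = rate * ppm_diff / 1_000_000
print(drift)        # ~3.15 samples/s of backlog on one side,
                    # starvation on the other
```

Over a ten minute call that is nearly 1900 samples of slack to absorb, so the pacing problem is not hypothetical.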
My preferred rate for initial network negotiation is 1/3 of a double clocked 32.768kHz crystal. Starting from 65.536kHz, three ticks may be used for 3x over-sampling. 65.536kHz can be approximated by dividing 25.000MHz by 381 or dividing 25.175MHz by 384. I'd prefer to make the network compatible with the 25.000MHz crystals commonly used in USB2 and USB3 hardware. However, 25.000MHz VGA is 0.7% too slow and may be incompatible with some monitors. I suppose we could obtain some portability by dividing 25.000MHz by 1000 and dividing 25.175MHz by 1007. However, GCD [Greatest Common Divisor] is 25kHz which is low for audio conferencing. In the medium term, audio and network will probably require clock domain crossing. In that case, VAN [Video, Audio, Network] will all be running at different speeds. That may appear to eliminate the reason for grouping them together. However, I'm working toward a generalized peripheral FPGA where all ports may be MicroSD, SNES, S/PDIF, LAN or video. In this general case, it is helpful to group the functions due to their lack of common frequency.
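Checking those divisors is straight arithmetic (no hardware assumed):

```python
target = 65536  # Hz: a double clocked 32.768kHz crystal

for crystal, div in [(25_000_000, 381), (25_175_000, 384)]:
    f = crystal / div
    err = (f - target) / target * 100
    print(f"{crystal / 1e6:.3f}MHz / {div} = {f:.1f}Hz ({err:+.3f}%)")

# 25.000MHz / 381 = 65616.8Hz (+0.123%)
# 25.175MHz / 384 = 65559.9Hz (+0.036%)
```

So the 25.175MHz VGA crystal actually lands closer to the negotiation rate than the round 25.000MHz part.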