An increasing number of people get 6502/65816 systems to output composite video or VGA. I've suggested schemes to extend such functionality using a minimum number of components. The first of these is to have one set of counter chips and two or more sets of character ROMs. The intention is that the number of symbols can be vastly increased while the hardware is less than doubled. I've also suggested that dual displays, triple displays and suchlike may use the same set of counter chips. However, I've missed a really obvious extension which is likely to be more popular: sound. Specifically, PCM
audio.
The requirements for sound are vague but often aligned with the requirements for video. We want to output the maximum bit depth at the maximum rate possible - without overwhelming the processor. To push this further, many systems (Zoran's Vaddis IV chip, Atari Jaguar,
GameTank) allocate one processor to video and
one processor to sound. However, in the trivial case, we want to:
- Output one or more analog signals.
- Using a resistor network DAC.
- To obtain 1 volt peak-to-peak signals.
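The requirements above can be sketched numerically. This is a minimal model of an ideal resistor-ladder DAC, normalized so that the maximum code reaches the 1 volt peak-to-peak target; the bit depth and scaling are illustrative assumptions, not a fixed design:

```python
# Sketch: ideal resistor network DAC output for an unsigned code,
# normalized to a 1V peak-to-peak target. Real R-2R ladders and
# component values will deviate from this ideal.

def dac_output(code, bits=8, v_pp=1.0):
    """Ideal analog output voltage for an unsigned sample code."""
    if not 0 <= code < (1 << bits):
        raise ValueError("code out of range")
    return v_pp * code / ((1 << bits) - 1)

print(dac_output(0), dac_output(255))  # -> 0.0 1.0
```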
In the case of video, we want a horizontal scan-line frequency of 15.6kHz or more. In the case of
audio, we're not fussy but anything above 8kHz would be preferable. I strongly recommend against tying the frequency of one system to another. Indeed, many of the
difficulties of accelerating Commodore, Apple and Atari systems come from deeply interlocked processor and video timing.
Ignoring that, if you want a quick win, 640*480 pixel VGA (with a 9 bit counter) dovetails really nicely with the default PATA/SATA/CF/MicroSD block size of 512 bytes. Furthermore, 31kHz horizontal sync is only moderately less than the Compact Disc sample rate and would be an ideal default for 8 bit PCM
audio. As a quick hack, a double buffered, vertical blank interrupt video system may play sound at the rate of one disk block per frame. At 0.5KB per block and 60 frames per second, this is exactly 30KB of
audio per second. Double or triple this rate for 16 bit or 24 bit
audio and multiply that by the total number of channels. How far can we push it? BigDumbDinosaur's POC designs have exceeded 600KB/s. Therefore, 16 bit, 7.1 surround sound is possible on 6502/65816.
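The arithmetic above can be checked directly. This sketch assumes KB means 1024 bytes and takes the 31kHz figure from the VGA discussion; the 600KB/s ceiling is BigDumbDinosaur's published POC figure, quoted from the text:

```python
# Bandwidth arithmetic for one disk block of audio per video frame.

BLOCK = 512   # default disk block size in bytes
FPS = 60      # one block per vertical blank interrupt

base_rate = BLOCK * FPS
print(base_rate)  # -> 30720, i.e. exactly 30KB/s of 8 bit mono audio

def stream_rate(sample_bytes, channels, sample_hz):
    """Bytes per second for a PCM stream."""
    return sample_bytes * channels * sample_hz

# 16 bit, 7.1 surround (8 channels) at the 31kHz horizontal sync rate:
surround = stream_rate(2, 8, 31000)
print(surround, surround <= 600 * 1024)  # -> 496000 True
```

So 16 bit, 7.1 surround at 31kHz needs roughly 484KB/s, comfortably inside a 600KB/s budget.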
I take AndersNielsen's VGA implementation as a concrete example. It reads 64 bytes per horizontal line. One additional RAM chip using the same counters could send 64 bytes of
audio to one or more
audio DACs. This could be eight sets of
audio samples to eight DACs or four sets of samples to 16 DACs. Or a lesser quantity. I'll assume that hardware mixing of four samples is preferable because this only divides the maximum volume by a factor of four. In this arrangement, addressing of the
audio RAM is such that the bottom three or four bits correspond with DAC number and the next two or three bits correspond with the
audio "voice" which is being mixed.
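The addressing described above can be sketched as a bit-field split. The field widths here (4 DAC bits, 2 voice bits, so 2^6 = 64 bytes per sample tick, matching the 64 bytes per horizontal line) are one hypothetical choice from the ranges given in the text:

```python
# Hypothetical audio RAM raster address layout: [sample | voice | dac].
# 16 DACs (4 bits) and 4 mixed voices (2 bits) are assumed widths.

DAC_BITS = 4
VOICE_BITS = 2

def raster_address(sample, voice, dac):
    """Compose a raster address from sample index, voice and DAC number."""
    assert 0 <= dac < (1 << DAC_BITS)
    assert 0 <= voice < (1 << VOICE_BITS)
    return (sample << (DAC_BITS + VOICE_BITS)) | (voice << DAC_BITS) | dac

# Consecutive raster addresses step through DACs first, then voices,
# so every voice of every DAC is visited before the sample advances.
print(raster_address(0, 0, 0), raster_address(0, 0, 1), raster_address(1, 0, 0))  # -> 0 1 64
```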
However, it may be slow and inconvenient to populate the
audio RAM in the order that it is rastered to DACs. It is vastly preferable if 512 bytes of one channel can be populated from storage (or network). This is particularly true if the
audio buffers are bank switched and therefore not addressable at the same time. This is the conceptual trick. If samples of
audio are played at horizontal sync frequency then it is convenient to imagine
audio buffered as 64 "vertical stripes" within the
audio RAM.
Audio is played by rastering the
audio RAM left-to-right, top-to-bottom in the same manner as the video RAM. However, unlike the video RAM, windows into the
audio RAM are columns rather than rows.
It is obvious that it is possible to stripe
audio by byte and therefore it is possible to make a hardware/software interface which is upward and downward compatible with 8/16/24 bit
audio. Indeed, it works in a very similar manner to the optional accents in character generation - including use of the same bank latches. I call this scheme of voice mixing and 24 bit audio compatibility MAGPIE.
This is it. This is the solution.
Audio and video may use the same bank latches.
Audio and video RAM may be populated in the same manner. However, for
audio, the order of the address lines when writing data is different to the order of the address lines when reading data. In my preferred implementation, the least significant 6 bits - when playing samples - become the most significant 6 bits when bank selecting
audio RAM.
Audio is very much like video - with the major exception that address lines are shuffled somewhere between writing and rastering.
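The shuffle described above can be modeled in a few lines. This sketch assumes a 15 bit audio RAM address (32KB, i.e. 64 stripes of 512 bytes) and the preferred implementation where the low 6 playback bits rotate up to become the bank select bits when writing:

```python
# Model of the write/raster address shuffle: the 6 bits which change
# fastest during playback (the stripe number) become the top bits of
# the CPU's write address, so one stripe is a contiguous 512 byte block.

ADDR_BITS = 15    # 32KB of audio RAM assumed for illustration
STRIPE_BITS = 6   # 64 vertical stripes

def play_to_write(addr):
    """Rotate the low 6 bits of a playback address up to the top."""
    low = addr & ((1 << STRIPE_BITS) - 1)
    high = addr >> STRIPE_BITS
    return (low << (ADDR_BITS - STRIPE_BITS)) | high

# Every playback address belonging to stripe 3 lands in one 512 byte run:
stripe = [play_to_write(a) for a in range(1 << ADDR_BITS) if (a & 63) == 3]
print(min(stripe), max(stripe), len(stripe))  # -> 1536 2047 512
```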
We have to handle the mundane issue of volume control. I considered doing this in software with a
caching multiplication algorithm. It is preferable to use a voltage multiplier with software and hardware volume control inputs. This is especially true if
audio will be used in conjunction with storage, video or network.
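The caching multiplication idea mentioned above can be sketched as follows: instead of multiplying every sample, build a 256 entry scaled lookup table and rebuild it only when the volume setting changes. The class and method names are hypothetical:

```python
# Sketch of software volume via a cached multiplication table.
# The table is rebuilt only on a volume change, not per sample.

class SoftVolume:
    def __init__(self):
        self.volume = None
        self.table = None

    def set_volume(self, volume):
        """Volume in 0..255; a repeated setting skips the 256 multiplies."""
        if volume != self.volume:
            self.volume = volume
            self.table = bytes((s * volume) >> 8 for s in range(256))

    def scale(self, samples):
        """Apply the cached table to a buffer of 8 bit samples."""
        return bytes(self.table[s] for s in samples)

v = SoftVolume()
v.set_volume(128)  # half volume
print(list(v.scale(bytes([0, 100, 255]))))  # -> [0, 50, 127]
```

On a 6502 the table lookup is a single indexed load, which is why caching beats repeated multiplication.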
Audio output quality can be maximized by putting everything through one DAC and one volume multiplier. It is then possible to direct analog
audio (on its own power rail) via an analog switch to sample-and-hold circuitry where sampling occurs in the *middle* of the cycle.
For an encore, I extend the rather fluid concept of row (or column) to networking. Here, the requirements are more vague than
audio. The basic requirements for networking are:
- Nodes should be able to send data to each other.
- Don't design any stupid feature which hinders faster networking.
This is sufficiently vague to describe RS-485 but assume that we want something more like Manchester encoded, full-duplex, twisted-pair Ethernet as part of a combined Video/
Audio/Network card. In this case, one or more out-bound network channels can be multiplexed like
audio voices. This is fairly straightforward and I was inclined to shadow everything under ROM addresses. Indeed, the clock stretching required to read from slow ROM is compatible with sending video,
audio or network output to a peripheral card. Then I considered reading from network. ("How's that gonna work?") We also have the problem of decoding wire formats, such as
Manchester encoding, as recommended by drogon. Encoding is easy. One byte becomes two bytes and this may be performed with two 256 byte tables.
For decode, I adapt one trick from the 6502 Forum's
Programmable Logic section. Specifically, Windfall's
Yet another (unnamed) 65C02 core stores even and odd bytes separately so that any 16 bit value can be retrieved with 8 bit granularity. If two or three 8 bit RAM chips are combined with a barrel shifter made from 74x157 chips, it is possible for any network input buffer to be de-skewed in hardware before the wire format is decoded in software. The occasional bit slip between hosts merely requires incrementing or decrementing through the available range of network buffer aliases.
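A software model of the de-skew may make the trick clearer: reading the even/odd byte pair at a bit offset is what the 74x157 barrel shifter does in hardware, and a one bit slip is corrected by selecting the adjacent alias. This is an illustrative model, not the hardware design itself:

```python
# Model of the barrel-shifted network buffer aliases: return the
# buffer as it appears when read at a bit offset of 0..7. Shifted
# reads drop the final partial byte, as the hardware would.

def deskew(buf, bit_offset):
    """Realign a received buffer by a small bit slip."""
    assert 0 <= bit_offset < 8
    if bit_offset == 0:
        return bytes(buf)
    out = bytearray()
    for i in range(len(buf) - 1):
        word = (buf[i] << 8) | buf[i + 1]   # even/odd byte pair
        out.append((word >> (8 - bit_offset)) & 0xFF)
    return bytes(out)

# A four bit slip re-centers the 0xFF byte smeared across two bytes:
print(deskew(bytes([0x0F, 0xF0]), 4).hex())  # -> ff
```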
So far, one bank latch value may represent:
- One write-only, unique line of video display.
- One write-only, unique "column" of audio samples.
- One write-only, unique network output buffer.
- One read-only, barrel shifted network input buffer.
With the addition of an
audio input, it is possible to implement 6502
audio conferencing over LAN with four remote users. While there are many examples in film, I was
specifically reminded of this possibility via the cheap and cheerful rendering of
Winx Club:
Magical Adventure (freely available in 2D on YouTube) where, coincidentally, five characters talk about an absent friend. This is a demanding example of
audio which could be implemented with homebrew 6502 systems.
Four channels at telephone quality is approximately the same as one channel at 31kHz and this may be simplified if the hardware mixes the
audio streams. Admittedly, allowing users to join or leave a party-line is difficult - and a variable number of nodes will compound the pacing which is required to compensate for the mis-match in
audio sample speed. As a trivial example, consider
audio conferencing with two nodes. If the crystal oscillators are at different temperatures then one node will run out of samples to play while the other gets back-logged. After you think of a solution to this problem, extend it to cover three-way calling. Then keep going.
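The two-node drift problem above is easy to quantify. This sketch assumes a 512 sample buffer and a 50ppm total mismatch between crystals; both figures are illustrative:

```python
# How long until one node's buffer under-runs, given a crystal mismatch.

def underrun_seconds(buffer_samples, sample_hz, ppm_mismatch):
    """Seconds until one side drifts a whole buffer ahead of the other."""
    drift_per_second = sample_hz * ppm_mismatch / 1_000_000  # samples/s
    return buffer_samples / drift_per_second

# A 512 sample buffer at 31kHz with a 50ppm total mismatch:
print(round(underrun_seconds(512, 31000, 50), 1))  # -> 330.3
```

So even generous buffering only defers the problem by minutes; some form of pacing or resampling is unavoidable.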
My preferred rate for initial network negotiation is one third of a double-clocked 32.768kHz crystal. Starting from 65.536kHz, three ticks may be used for 3x over-sampling. 65.536kHz can be approximated by dividing 25.000MHz by 381 or dividing 25.175MHz by 384. I'd prefer to make the network compatible with the 25.000MHz crystals commonly used in 480MHz USB2 or 800MHz USB3. However, 25.000MHz VGA is 0.7% too slow and may be incompatible with some monitors. I suppose we could obtain some portability by dividing 25.000MHz by 1000 and dividing 25.175MHz by 1007. However, the GCD [Greatest Common Divisor] is 25kHz which is low for
audio conferencing. In the medium term,
audio and network will probably require clock domain crossing. In that case, VAN [Video,
Audio, Network] will all be running at different speeds. That may appear to eliminate the reason for grouping them together. However, I'm working toward a generalized peripheral FPGA where all ports may be MicroSD, SNES, S/PDIF, LAN or video. In this general case, it is helpful to group the functions despite their lack of a common frequency.
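The divider arithmetic above checks out. A quick verification sketch (the error thresholds are my own framing, not the source's):

```python
# Verify the clock divider approximations for the network rate.

target = 2 * 32768  # doubled 32.768kHz watch crystal: 65536 Hz

for clock_hz, divisor in ((25_000_000, 381), (25_175_000, 384)):
    f = clock_hz / divisor
    error_pct = abs(f - target) / target * 100
    print(divisor, round(f), round(error_pct, 3))  # both land within ~0.13%

# The portable fallback: 25.000MHz/1000 and 25.175MHz/1007 are both
# exactly 25kHz (1007 * 25000 = 25_175_000).
print(25_000_000 / 1000, 25_175_000 / 1007)  # -> 25000.0 25000.0
```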