I really like your vision and design process. You also have a realistic timescale. I started a similar process more than three years ago and I'm only beginning to gain traction. One of the things I really like about your design process is the consideration of different form factors. I particularly like the clam shell designs and the portable music system. These are similar to my own inspirations, such as a Totally Spies X-Powder or magical music player. While my influences are more similar to a Sony Discman, I see that you've had the same considerations regarding placement and function of buttons in a system where video display is optional.
My favorite design on the 6502 Forum (and in general) is Eris. The reason for this is quite simple. In addition to packaging the circuitry between two sheets of laser cut plastic, it defines a robust filing system and is supplied with a comprehensive suite of software which has been adapted to work with an operating system interface - and this interface is defined as a macro. The default macro allows parameters to be passed via a 16 bit stack which grows downwards from memory address $7FFF. This is flexible but slow. Thankfully, it is as simple as could be expected to substitute faster techniques. I hope this can be used to empirically find the fastest and most flexible operating system interface. Other favorites include Planck for its generality (and idiot proofing), 6502GLD for its ruggedness and ABN6502 for its compactness. However, one complaint that I have with the majority of hobbyist projects is that they lack the Bauhaus School principle of "form follows function". They look indistinct and they don't have a killer application. By contrast, numerous boxed, Commodore inspired designs have been successful. These include the C-One, MEGA65, Foenix and Commander X16. An Atari 800XL clone or successor may follow. Of particular note, the Commander X16 video system has been designed to implement NES style games and it is unsurprising that an unofficial port of MarioLand exists. If a 6502 design is packaged with more than two sheets of plastic then it has a fair chance of succeeding.
I've noticed a pattern of increasing frequency where someone develops an AVR/ARM/ESP handheld console system with buttons and a small screen. In the most ambitious case, thousands are given to tech conference attendees and a little "app store" is run for the duration of the conference. However, none of these arbitrary systems have gained traction. The most recent example I've seen - an announcement in the Arduino section of Reddit - gained three comments and zero users. Educational systems are a greater waste of resources. I hoped that an educational system to teach spelling, read stories and similar might be useful. However, after attending my local toy trade exhibition, I've seen three heavily funded systems disappear over two years. Meanwhile, the Texas Instruments Speak & Spell remains available with a larger screen, larger dictionary and better voice but is otherwise almost unchanged over more than 30 years. Damningly, we haven't advanced beyond PLATO, Speak & Spell and Teddy Ruxpin (a creepy device similar to Chucky and M3GAN). So, your consideration of a clam shell book reader and other forms is essential. One of these will be more successful than the others.
You seek input from the 6502 Forum but you should consider the members as kindred developers rather than customers. Your fellow developers have strongly opposing preferences. Some strongly prefer bare metal access. Some prefer privilege protection. Many prefer assembly. Some prefer BASIC, Forth or C. Preference for 65816 may be equally split - although I am perpetually amused that people who dislike 6502 decimal mode are more likely to prefer 65816 and run Forth on it. The amusing part is that canonical Forth defines a stateful decimal mode. Regardless, I think that you have the most affinity with RadicalBrad and BigDumbDinosaur.
Among many interests, RadicalBrad developed a discrete processor on 30 breadboards and an 80MHz video system on another 30 breadboards before developing a 5 volt, 20MHz 6502 system with 32 I/O strobes. RadicalBrad has also used the SPI output of an 8 pin ATtiny85 microcontroller to output 25.175MHz VGA. This is openly disbelieved by many. You might want to investigate RadicalBrad's philosophy and methodology. RadicalBrad has spent winters off grid and produced phenomenal results. BigDumbDinosaur holds the record uptime for a 6502 SCSI system. BigDumbDinosaur has considerable experience within the logistics industry and is acutely aware of its ad hoc requirements. This is used as a guide to develop a (possibly privileged) 65816 system primarily as a reliable filing system, database and network hub for interpreted languages. BigDumbDinosaur is also one of our many musically talented members and may greatly appreciate 65816 with an electric guitar jack. Actually, I'm mildly surprised that your ideas haven't been savaged by BigDumbDinosaur. Consider this as a compliment.
Unfortunately, other members of the 6502 Forum continue the tradition of attacking newbies before they have "set out their stall". Welcome to the bear pit. While you've received some advice to use specific signals in specific circumstances, I advise:
- Read broadly.
- Gather requirements.
- Make a top-down specification.
- Make a bottom-up implementation.
Unfortunately, you have a combinatorial explosion of choices and this will only increase as more options become apparent to you. Given that you are focused on privilege protection, you will definitely require one bit of state - a privilege bit - external to a discrete processor or unprivileged FPGA core. You might find that discrete 65816 is detrimental because it draws more current and its bank switching is on the wrong side of the privilege bit. This may or may not be acceptable if 65816 is used in conjunction with other discrete 65xx peripheral chips. The reason for this may not be obvious to you. The 65xx peripheral chips typically accept multiple chip select signals. These may be used in multiple configurations. Historically, extra signals were used to reduce decode circuitry and access latency. More recently, extra signals have been used to distinguish idle bus cycles. Extra signals may also be used in conjunction with a privilege bit. Indeed, this may be applied selectively. One privileged 6522 may be reserved for MicroSD, bank switching and system timers while unprivileged chips allow fast I/O.
Historically, privilege violations have been handled within one bus cycle or within one instruction cycle. The major reason for cycle accuracy is virtual memory. However, modern systems deprecate virtual memory because it is too slow. Instead, memory is managed cooperatively. An application may receive an event to flush caches and a background application may be advised to serialize state. In this environment, privilege violation can be checked at the beginning of each system call. In this case, one bit of one register indicates privilege violation and the application is terminated before external state is affected. To elevate privileges, a doorbell to NMI is very fast. Indeed, it is even faster if NMI does not have to preserve flags or registers. NMI also acts as a global lock because no other execution occurs during this interrupt. Unfortunately, this type of singleton is unlikely to scale elegantly.
If downward compatibility with legacy software is not a concern then you could choose an unconventional memory map. Assuming 65816, it is possible to implement 257 banks. The EMU pin would select a legacy 64KB or native 16MB. In this arrangement, it is possible to place all I/O in the legacy or native memory map. Although, for your purposes, I'm not sure that either is an advantage. If the EMU pin is tied to clock speed then all access to legacy I/O is automatically slowed. Whereas, if all I/O is in the native memory map then one legacy application will be isolated from I/O. Unfortunately, if you want more than one legacy application, you may require a bank switching scheme in addition to the 65816's internal scheme. A third option would be placing slow I/O in the legacy memory map, fast I/O in the native memory map and using a separate privilege system. This is the type of combinatorial explosion of choices that you'll encounter.
I've grappled with these choices and I've found it useful to enumerate then reduce cases. I started wanting to support NMOS, CMOS, 65CE02, 65816 and my own extensions. This has reduced to the subset of opcodes which don't have bugs or undesirable features (NMOS decimal arithmetic, 65CE02 decimal subtraction, 65816 JMP (abs) via bank zero) and processor feature detection to allow optimized sections. I have also chosen a widely hated memory map in which 48KB RAM, 8KB I/O and 8KB ROM appear in every 65816 bank. Following RadicalBrad's technique, I/O in every bank allows 2ns address decode (3ns at 3.3V) while allowing each application to optionally make Commodore or Acorn operating system calls. A slower, alternative address decode scheme allows use of Apple I/O at $C000-$CFFF concurrently with Commodore I/O at $D000-$DFFF. Yes, it precludes contiguous arrays larger than 48KB. However, it allows a window manager and menus in one bank, desktop toys in another bank, a filing system in another bank, a text editor in another bank, a compiler and assembler in another bank and an application under development to run in another bank. Indeed, it is suitable for 63 compiled or assembled daemons/applications and a greater number of interpreted ones.
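To illustrate why a map which repeats in every bank is cheap to decode, here is a minimal Python sketch. It assumes RAM at $0000-$BFFF, I/O at $C000-$DFFF and ROM at $E000-$FFFF, which is my reading of the Apple and Commodore I/O ranges mentioned above; the function name `decode` and the region labels are hypothetical. The region depends only on address lines A15, A14 and A13, so the bank byte never enters the decode.

```python
def decode(address):
    """Split a 24 bit 65816 address into (bank, region, offset).

    Assumed layout, repeated in every bank:
      $0000-$BFFF RAM (48KB), $C000-$DFFF I/O (8KB), $E000-$FFFF ROM (8KB).
    """
    if not 0 <= address <= 0xFFFFFF:
        raise ValueError("address must fit in 24 bits")
    bank = address >> 16
    offset = address & 0xFFFF
    top3 = offset >> 13          # A15, A14, A13 select one of eight 8KB pages
    if top3 == 0b110:
        region = "IO"
    elif top3 == 0b111:
        region = "ROM"
    else:
        region = "RAM"           # pages 0 to 5 are the 48KB of RAM
    return bank, region, offset


for example in (0x00BFFF, 0x00C000, 0x3FD000, 0x01E000):
    print("%06X" % example, decode(example))
```

Because only three address lines are examined, the same I/O and ROM appear at the same offsets whichever bank an application runs in.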
Commander X16 is a good example where "best of breed" choices have been combined. Most obviously, SNES joypads are combined with a Commodore operating system. However, this principle can be greatly extended. I discovered that SNES protocol can be used for keyboard and mouse. With minor changes to the Commander X16 memory map, it is possible for Acorn BASIC to work concurrently with SID audio and Apple cards. Likewise, use of the Commander X16's VERA is not mutually exclusive with use of LCD or OLED. For example, visrealm's HBC-56 is a 6502 system with a 56 pin bus connected to LCD and TMS9918A NTSC composite video. (Commander X16's VERA has a separate memory map, similar to TMS9918A.)
I'm very impressed that you found an 800*480 pixel bitmap LCD. However, I don't think that you'll find the perfect controller from existing supplies. I've also had problems with LCD and video output and this has similarly led me to display agnostic design. It is not only hobbyists who are affected. The cost, size and quality of displays has become very commercially sensitive. Given the finite area of display manufacturing capacity, it is easy to tell if Apple is going up-market or down-market with each change of specification. It is also an outsized cost. When version 1 of the Apple Watch was released, several companies produced teardown reports and an estimated bill of materials. I was quite surprised to discover that a hobbyist could make a single unit for less money - if the display is omitted. The result would be a Star Trek: The Next Generation style communicator badge. Understandably, I investigated capacitive touch sensing and trunked audio using cell networking. For a counter example, see Project Pluto where hobbyists replace the electronics of a Casio watch with a 16 bit MSP430 microcontroller and accelerometer. Of particular note, the Casio LCD is typically preserved, although the cost of bespoke LCD is falling while capability is rising. Dave L. Jones of the EEVblog has a short series of videos where 100 units of bespoke passive LCD were designed with Inkscape and ordered from a manufacturer. Cost per unit was less than USD5.
I've sought circular, color OLED of approximately 100mm diameter. While smaller units are readily available, they are typically 300DPI, never less than 200DPI and never less than 8 bit per channel. For my purposes, 300DPI is more than 1 million pixels and more than 24 million bits of data to update the display. Understandably, a shape drawing language is typically used to send updates to the display. Otherwise, the volume of data to send is overwhelming. Unfortunately, most displays use the proprietary MIPI protocol and the cost of the specification is unaffordable to hobbyists. Even if we have access to the raw interface, I require 30 million pixel writes per second to smoothly update my circular display at 30Hz. You require about 40% of this for your rectangular display. (30Hz is a low standard and people now expect 60Hz or even 144Hz.) 65816 at 12MHz performing block copy can update 570000 24 bit pixels per second (19000 sequential pixels per frame at 30Hz) if it does nothing else. In particular, this excludes audio output. This covers almost 24 lines of your 480 line display. Even with a 24 bit blitter writing at 12MHz, 30Hz full screen animation is barely possible at 800*480. To further complicate matters, RadicalBrad recommends that video writes be eight times faster than display. This provides sufficient bandwidth for sprites and polygons to be composited.
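The figures above can be reproduced with a short Python sketch. It assumes 24 bits per pixel, a 100mm circle at 300DPI, and 7 clock cycles per byte for a 65816 block copy (MVN/MVP); the function names are hypothetical and exist only for this illustration.

```python
import math

MM_PER_INCH = 25.4
FRAME_RATE_HZ = 30


def circular_pixels(diameter_mm, dpi):
    """Approximate pixel count of a circular display."""
    radius_px = diameter_mm / MM_PER_INCH * dpi / 2
    return math.pi * radius_px ** 2


def block_copy_pixels_per_second(clock_hz, cycles_per_byte=7, bytes_per_pixel=3):
    """Pixels per second if the processor does nothing but block copy."""
    return clock_hz / cycles_per_byte / bytes_per_pixel


circle_pixels = circular_pixels(100, 300)          # a little over 1 million
circle_bits = circle_pixels * 24                   # more than 24 million bits
circle_rate = circle_pixels * FRAME_RATE_HZ        # roughly 30 million writes/s

rect_pixels = 800 * 480
rect_rate = rect_pixels * FRAME_RATE_HZ            # 11.52 million writes/s

cpu_rate = block_copy_pixels_per_second(12_000_000)
cpu_pixels_per_frame = cpu_rate / FRAME_RATE_HZ
cpu_lines_per_frame = cpu_pixels_per_frame / 800   # just under 24 lines

print(round(circle_pixels), round(circle_rate), rect_rate)
print(round(cpu_rate), round(cpu_pixels_per_frame), round(cpu_lines_per_frame, 1))
```

Under these assumptions, the processor alone covers about 5% of the 800*480 frame at 30Hz, which is why some form of write amplification is needed.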
A scheme is required which amplifies writes. Various techniques have been developed:
- HAM [Hold And Modify] modes are good for flat polygons and horizons.
- For textures, Quake's ModeX hack concurrently writes to four blocks of memory which are arranged on screen as four interleaved sets of pixels. Subsequent passes correct errors. However, this only works with write-back caching, write aggregation and deliberately blocky textures.
- Sprites allow clusters of pixels to be moved or animated with few writes. Surprisingly, some systems, such as Neo Geo, are sprite only. I've suggested that an 80 sprite system may also be an 80 column text system.
- Or perhaps 16*16 pixel characters overlaying a bitmap display. In this case, coarse updates for 256 pixels require far fewer writes while the remaining bandwidth can be used to update fine detail and provide sound.
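As a rough illustration of the last technique, the Python sketch below counts the writes needed if an 800*480 display is covered by 16*16 pixel character cells. It assumes one byte written per cell selects a character, and it reuses the 7 cycles per byte block copy figure for a 12MHz 65816; both are assumptions for this example rather than a description of any specific controller.

```python
def character_cells(width, height, cell=16):
    """Return (columns, rows, total cells) for a character overlay."""
    columns = width // cell
    rows = height // cell
    return columns, rows, columns * rows


columns, rows, total_cells = character_cells(800, 480)
pixels_per_write = 16 * 16                      # one cell write changes 256 pixels

frame_rate_hz = 30
writes_per_second = total_cells * frame_rate_hz  # every cell rewritten each frame

# Assumed budget: 12MHz 65816 block copy at 7 cycles per byte.
copy_bytes_per_second = 12_000_000 / 7
fraction_of_budget = writes_per_second / copy_bytes_per_second

print(columns, rows, total_cells, writes_per_second)
print("fraction of block copy budget: %.3f" % fraction_of_budget)
```

Even rewriting every cell on every frame uses a small fraction of the processor's copy bandwidth, leaving the rest for fine detail and sound.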
Of course, the trick with using a text overlay for video is to find a good set of characters or implement them dynamically. A crude example of this is Bad Apple playing from counters and EEPROM onto the left of a 16*4 character LCD (Part 1 and Part 2). This example does not use a microprocessor or similar but instead uses two tiers of ROM to expand common commands which are understood by the character LCD. It also uses the persistence of passive LCD to reduce apparent error. However, it uses a combination of predefined blocks and programmable blocks to approximate the silhouette of a standard video benchmark using an average data rate of approximately 5 bytes per frame. This type of technique can be used to greatly extend the reach of 8 bit computers displaying video.
A further consideration for implementing your own graphics controller is periodic interrupts. If the primary purpose of your device is to smoothly output video in any form then it is extremely helpful if the operating system or application knows when the next frame of video is required. This can be implemented using a timer in a 6522. However, this technique is subject to video tear if the timer drifts or does not start between frame updates. Furthermore, it is duplicated hardware and is a large, costly overkill when the overflow of the video row counter can be fed to the processor. For this reason alone, I envision you programming an FPGA LCD controller and avoiding CPLD. BigDumbDinosaur recently abandoned CPLD for 65816, and that is before considering the horrendous programming tools or the energy consumption, which generally starts above 100mA for a blank device. Unfortunately, FPGA in production is generally 3.3V (or less). If you aim for 12MHz at 3.3V, this will be as challenging as 18MHz at 5V - something that BigDumbDinosaur achieved with POC 1.1 but not with POC 1.2 or POC 1.3. If you wish to use MicroSD, you will be forced to make some of your system run at 3.3V. It is only a matter of how much runs at 3.3V and for what reason.
There is a subtle problem with accompanying sound. I'm not a fan of FIFO but it would be required for high quality PCM audio. 6502 and 65816 can play audio on interrupt at 100kHz. However, 12MHz 6502 or 65816 playing audio at 48kHz receives an interrupt every 250 cycles. Given that instructions take a variable number of clock cycles, there is jitter between interrupts and this changes the length of each audio sample played. With 7 cycle jitter every 250 cycles, audio quality may be reduced to 6 bits. A further problem is that 65816 interrupts in native mode require one extra cycle to save and one extra cycle to restore the extra 8 bits of interrupted program address. At two extra cycles per interrupt, this requires an extra 96000 clock cycles per second when playing 48kHz audio. That's 0.8% of the processor performance lost in native mode - in addition to 0.4% of performance per cycle of interrupt. Considerable processor performance and audio quality can be gained with an audio FIFO. Indeed, it is the only way to get sound above 10 bit quality.
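The interrupt budget above can be checked with a few lines of Python. This is only arithmetic under the stated figures: a 12MHz clock, 48kHz sample rate, two extra cycles per native mode interrupt (one on entry, one on return) and 7 cycles of jitter. The function name `interrupt_budget` and the dictionary keys are hypothetical.

```python
def interrupt_budget(clock_hz, sample_hz, extra_cycles_per_interrupt=2, jitter_cycles=7):
    """Summarize the cost of playing PCM audio from a periodic interrupt."""
    cycles_per_sample = clock_hz / sample_hz
    extra_cycles_per_second = extra_cycles_per_interrupt * sample_hz
    return {
        # Clock cycles available between consecutive samples.
        "cycles_per_sample": cycles_per_sample,
        # Cycles per second spent on the extra native mode bank byte.
        "extra_cycles_per_second": extra_cycles_per_second,
        # Fraction of the processor lost to those extra cycles.
        "native_overhead": extra_cycles_per_second / clock_hz,
        # Fraction of the processor lost for each cycle of interrupt handler.
        "overhead_per_handler_cycle": sample_hz / clock_hz,
        # Worst case timing error as a fraction of one sample period.
        "jitter_fraction": jitter_cycles / cycles_per_sample,
    }


budget = interrupt_budget(12_000_000, 48_000)
for name, value in budget.items():
    print(name, value)
```

A 20 cycle handler would therefore cost about 8% of the processor before the native mode penalty, which is the performance that an audio FIFO recovers.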
It is maddening to find serial EEPROM is cheap and plentiful when parallel EEPROM is subject to shortages. Regardless, serial ROM is cheaper because 8 pin packages are cheaper. This is especially true for packages which are smaller than DIP. My plan is to make a system which boots from parallel ROM and provides a serial ROM decompressor. This allows a system which is entirely hosted from ROM without the full cost of a fully parallel ROM implementation. While a decompressor has latency, it provides a differing set of features. Most notably, partial materialization of programs which may exceed 16MB. Others have tried various techniques but all have limitations. drogon boots a microcontroller which populates RAM. This is fast but insecure. It also requires a co-processor which is more complex than the 65816 system, although it can be used for other purposes, such as UART and FPU. Some use CPLD to boot from Compact Flash. This is simple and the interface is fast. However, it uses too much energy for portable use, is insecure and the legacy stock of Compact Flash cards has limited storage capacity. Some obsolete 1 inch hard disks have a Compact Flash interface but this is even more restrictive given that fewer than 100 units may be available globally.
It is possible to interface serial ROM to 6502 or 65816 using CPLD or FPGA. The simple arrangement will be too slow because every read requires dozens of clock cycles while the serial read/write pointer is set before each byte of data is retrieved. Given that the circuitry is otherwise stateless, two such reads are required to obtain the vector for each interrupt and interrupt processing would be slowed horrendously. See Bad Apple over SunPlus serial ROM interface for example slowness and note that vectors are not retrieved from such an interface. Thankfully, FPGA offers more advanced techniques. In this arrangement, the programmable logic holds the processor in reset while it reads and permanently caches all vectors from serial ROM. The processor then starts and obtains the reset vector (or any other vector) without delay.
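The vector caching arrangement can be modelled in Python to show where the cycles go. This is a behavioural sketch, not FPGA code: the class names `SerialRom` and `VectorCache`, the 40 cycle cost per serial read and the choice to cache only the 6502 vectors at $FFFA-$FFFF are all assumptions for illustration. A 65816 design would also need to cache its native mode vectors.

```python
VECTOR_BASE = 0xFFFA   # 6502 NMI, RESET and IRQ/BRK vectors occupy $FFFA-$FFFF
VECTOR_END = 0x10000


class SerialRom:
    """Serial ROM where each byte read costs many clock cycles (assumed 40)."""

    def __init__(self, image, cycles_per_read=40):
        self.image = image
        self.cycles_per_read = cycles_per_read
        self.cycles_spent = 0

    def read(self, address):
        self.cycles_spent += self.cycles_per_read
        return self.image[address]


class VectorCache:
    """Holds the processor in reset until every vector byte is cached."""

    def __init__(self, rom):
        self.rom = rom
        self.reset_held = True
        self.cache = {}

    def boot(self):
        for address in range(VECTOR_BASE, VECTOR_END):
            self.cache[address] = self.rom.read(address)
        self.reset_held = False   # processor may now fetch the reset vector

    def read(self, address):
        if self.reset_held:
            raise RuntimeError("processor is still held in reset")
        if address in self.cache:
            return self.cache[address]      # no serial delay for vectors
        return self.rom.read(address)       # everything else pays full latency


image = bytearray(0x10000)
image[0xFFFC] = 0x00    # reset vector low byte
image[0xFFFD] = 0xE0    # reset vector high byte, so reset jumps to $E000
rom = SerialRom(image)
fpga = VectorCache(rom)
fpga.boot()
cycles_after_boot = rom.cycles_spent
reset_vector = fpga.read(0xFFFC) | (fpga.read(0xFFFD) << 8)
print(hex(reset_vector), cycles_after_boot, rom.cycles_spent)
```

The serial cost is paid once during boot; afterwards every interrupt obtains its vector without touching the serial ROM.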
I can see you designing a conventional 65816/6522 system with parallel ROM. This would be very easy to repair. I also see you implementing 6502/6522 with your own bank switching. Or 65816 with FPGA programmed as LCD/VGA/HDMI/audio/MicroSD/serial ROM interface. Or perhaps FPGA only. Or perhaps Raspberry Pi Pico only. It is too early to tell.