In addition to what whartung already figured out:
+ there are two sets of interrupt vector tables, one for emulation mode and one for native mode. Besides the RESET (emulation mode only), BRK, COP, NMI, ABORT, and IRQ vectors (as on the 65C816), there are 8 separate interrupt vectors for the 4 UARTs (one Rx and one Tx each), a separate IRQ vector for each of the 8 timers, one for this ominous parallel interface bus (PIB), and 6 more for edge detection on various pins. These vectors could be beneficial when writing IRQ-driven I/O handlers. The vectors have a fixed priority encoding.
+ each UART Tx-empty interrupt can be programmed to occur either when just the data register is empty or when both the data and shift registers are empty. The latter is useful when driving an RS-485 bus: you can disable the driver after the last character has been transmitted without using an additional timer or something else.
+ the on-chip RAM is divided into two parts: 512 bytes from 00:0000 up to 00:01FF. This range can be disabled (register SSCRx @ 00:DF41, bit 2). The remaining 64 bytes are within the I/O space, ranging from 00:DF80..00:DFBF. These bytes seem to be always available and on-chip only.
+ using CS5B to address a 4MB RAM (00:0000..3F:FFFF) can be overridden by CS3B to select a different 32KB RAM @ 00:0000..00:7FFF, but the lowest two pages are mapped to internal RAM unless it is disabled! So there could be 3 different memories accessible from 00:0000..00:01FF and 2 different ones from 00:0200..00:7FFF. This may be useful as a separate "supervisor" storage that could be swapped in and out.
+ a similar overlay of multiple memories is possible using CS4B and ROM enable (BCRx @ 00:DF40, bit 7). These 24KB + 8KB can be covered by CS5B as well.
+ there is one additional caveat when using CS5B: during restart the ROM code (00:E109..) checks for a PCMCIA card with the "WDC" signature @ 00:8000 by using CS5B first! After that it checks for the signature again (same address) using CS4B. If there is RAM selected by CS5B it is highly unlikely that a signature would be found, but according to Murphy this will surely happen.
+ there are two more memory regions preselected: CS0B (00:DF00..00:DF1F), 32 bytes (just enough for two VIAs), and CS1B (00:DFC0..00:DFFF), 64 bytes.
+ CS2B seems to have no real use other than to indicate access to internal resources. It should be safe to use its pin as a plain output (all of P7 is output only).
+ the SSCRx register (00:DF41) is used to define the access speed of various memory regions as well as to turn on/off a separate FCLK (Fast CLK) and the internal memory (see above). It is a bit unclear how the various possible settings coexist. But I assume setting bit 1 would cause all actions to slow down to CLK (which is usually a 32 kHz clock); otherwise you can individually select the memory access speed for CS4B (00:8000..00:FFFF), CS5B (00:0000..3F:FFFF), CS6B (40:0000..BF:FFFF), and CS7B (C0:0000..FF:FFFF) to run at FCLK or FCLK/4. The remaining memory can be clocked by FCLK or FCLK/4 depending on bit 3 of SSCRx. This might be helpful or necessary with additional I/O selected by e.g. CS0B or CS1B.
+ there are 8 timers, each having a 16 bit counter and a 16 bit latch. The clock sources are dedicated (either FCLK or CLK) except for T4 (FCLK or Port 6 bit 0). Each has a separate IRQ vector. Each but T0 can be turned OFF and ON. T0 serves as a watchdog: once enabled you need to pet it, otherwise it barks.
+ there is an astonishing amount of hardware spent on two tone generators, TG0 and TG1. Each has a separate timer, can be en-/disabled, has an individual pin (not usable otherwise), and provides a 4 bit (16 level) analog output signal. The intended use was generating DTMF tones (requiring low harmonics).
+ NMI can be disabled
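
To make the overlapping chip selects above easier to follow, here is a rough Python sketch of the address decoding as I understand it from the points above. It is only a model of this post, not of the datasheet: the function name `decode`, the keyword flags, the priority order, the "internal I/O" range 00:DF20..00:DF7F, and the internal ROM range 00:E000..00:FFFF (guessed from "24KB + 8KB" and the ROM code at 00:E109) are all my assumptions.

```python
def decode(addr, *, internal_ram_enabled=True, rom_enabled=True,
           cs3_enabled=True, cs4_enabled=True, cs5_enabled=True):
    """Return which resource would answer a 24-bit address (hypothetical model)."""
    if not 0 <= addr <= 0xFFFFFF:
        raise ValueError("address must fit in 24 bits")
    bank, offset = addr >> 16, addr & 0xFFFF

    if bank == 0x00:
        # On-chip RAM: lowest two pages (can be disabled) plus 64 bytes in I/O space.
        if offset <= 0x01FF and internal_ram_enabled:
            return "internal RAM"
        if 0xDF80 <= offset <= 0xDFBF:
            return "internal RAM"
        # Preselected external I/O windows.
        if 0xDF00 <= offset <= 0xDF1F:
            return "CS0B"
        if 0xDFC0 <= offset <= 0xDFFF:
            return "CS1B"
        # Assumption: the rest of the I/O page holds the on-chip registers.
        if 0xDF20 <= offset <= 0xDF7F:
            return "internal I/O"
        # CS3B overrides CS5B in the lower 32KB.
        if offset <= 0x7FFF and cs3_enabled:
            return "CS3B"
        # Assumption: the internal ROM occupies the top 8KB when enabled.
        if offset >= 0xE000 and rom_enabled:
            return "internal ROM"
        if offset >= 0x8000 and cs4_enabled:
            return "CS4B"

    # Whatever is left in banks 00..3F falls through to CS5B.
    if bank <= 0x3F:
        return "CS5B" if cs5_enabled else "unmapped"
    if bank <= 0xBF:
        return "CS6B"
    return "CS7B"


if __name__ == "__main__":
    # Three different memories can sit behind 00:0100, depending on the settings.
    print(decode(0x000100))
    print(decode(0x000100, internal_ram_enabled=False))
    print(decode(0x000100, internal_ram_enabled=False, cs3_enabled=False))
```

Calling it for 00:0100 with the three settings shown gives internal RAM, then CS3B, then CS5B, which is the "3 different memories" case mentioned above.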
- it is explicitly not recommended to enable both ABORT and NMI, as they share the same pin. But both can be disabled.
- there are only two timers (T3, T4) available as clock sources for the 4 UARTs. And if you wish to use T4 for counting or frequency generation, there is only T3 left for all UARTs.
- the capabilities of timer T4 collide with UART0: P6.0 is RxD0 or T(4)IN (pulse counting), P6.1 is TxD0 or T(4)OUT (pulse generation)
- the capabilities of T7 (edge latching / PWM measurement) collide with UART1: P6.2 is RxD1 or PWM (input)
- all edge-sensitive inputs have individual interrupts, but all of them are mapped to ports that are otherwise in use. Only P5.6 (pos. edge) and P5.7 (neg. edge) can be used (assuming the PIB is not used).
- the PIB (peripheral interface bus) is most likely an interface to another computer with 8 bit data (port 5), 3 bit address (P4.5 (PIRS0), P4.6 (PIRS1), P4.7 (PIRS2)), and 3 bit control (P4.4 (PICSB or PIRDB), P4.3 (PIWEB or PIWRB), P4.2 (PIIB)). The documentation is difficult to understand. In the description of the various bits in the enable and flag registers, the other computer is called "Host" and the W65 is called "Processor". Most likely the "idea" is as follows:
-- the W65 appears as an 8 byte wide peripheral within the I/O space of another computer (the "host").
-- the host can read/write all registers by using the select lines (PIRS0..2) and PICSB/PIRDB/PIWEB/PIWRB. (How the latter are used correctly I still haven't figured out.)
-- usually the host would place its data in the upper four registers (which correspond to PIR4..7 (00:DF7C..7F)); writing to PIR7 could cause an IRQ to the W65.
-- usually the W65 would write into PIR2 and PIR3. The latter write could automatically trigger PIIB to interrupt the host.
-- the automatic handshake could be turned on/off through various settings in PIBER
-- I'm not sure what timing specifications are applicable to this "bus". But I assume it was designed to sit on an 80x86 as an I/O port. So with 8 MHz for the W65, an access to these registers from outside within 200 or 250 ns should work.
-- the interface is somehow "passive", which means you cannot connect two PIBs together to exchange data between two W65s. You would need to map the PIB of W65(A) into the memory of W65(B) and vice versa. Pretty ugly, but somehow interesting.
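
To illustrate the guessed PIB "idea" above, here is a small Python toy model of the mailbox behaviour. It only mirrors my reading of the points above and nothing from the datasheet: the class name `PibModel`, the two enable flags (standing in for whatever PIBER really offers), and the two status attributes are made up for this sketch, and clearing of the interrupts is not modelled at all.

```python
class PibModel:
    """Toy model of the eight PIB registers PIR0..PIR7 (hypothetical)."""

    def __init__(self, irq_on_pir7=True, piib_on_pir3=True):
        self.pir = [0] * 8                # PIR0..PIR7, one byte each
        self.irq_on_pir7 = irq_on_pir7    # assumed enable for the host -> W65 IRQ
        self.piib_on_pir3 = piib_on_pir3  # assumed enable for the W65 -> host PIIB
        self.w65_irq_pending = False
        self.piib_asserted = False

    def _store(self, reg, value):
        if not 0 <= reg <= 7:
            raise ValueError("PIB has only registers 0..7")
        self.pir[reg] = value & 0xFF

    def host_write(self, reg, value):
        """Host side: a write to PIR7 may raise an IRQ on the W65."""
        self._store(reg, value)
        if reg == 7 and self.irq_on_pir7:
            self.w65_irq_pending = True

    def w65_write(self, reg, value):
        """W65 side: a write to PIR3 may pull PIIB to interrupt the host."""
        self._store(reg, value)
        if reg == 3 and self.piib_on_pir3:
            self.piib_asserted = True


if __name__ == "__main__":
    pib = PibModel()
    pib.host_write(4, 0x12)   # payload first, no interrupt yet
    pib.host_write(7, 0x34)   # last byte into PIR7 signals the W65
    pib.w65_write(2, 0x56)    # reply payload
    pib.w65_write(3, 0x78)    # last byte into PIR3 signals the host
    print(pib.w65_irq_pending, pib.piib_asserted)
```

The point of the sketch is the ordering: each side writes its payload first and touches the "trigger" register (PIR7 for the host, PIR3 for the W65) last, so the other side is only interrupted once the data is complete.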