cr1901 wrote:
BigDumbDinosaur wrote:
This is where pushing DB to the stack and temporarily loading it with bank $00 (or wherever the I/O hardware is located) helps out. In any 65C816 system with more than 64K, you are going to have to either use long addressing or be prepared to tinker with DB. It's unavoidable.
I probably should've been able to figure out switching DB myself. In any case, I can see uses for both long addressing and swapping DB. The former probably makes the most sense for one-shot reads and writes, such as reading a status register or writing a single value. The latter probably makes more sense for sending blocks of data, e.g., with the ACIA: switch to bank $00, then use one of the complex addressing modes to access the data block wherever it is (bank 0, 1, 2, etc.).
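As a rough sketch of the two approaches (the ACIA registers at $00C010/$00C011 and the direct page pointer BUFPTR are made-up examples, not real hardware addresses):
Code:
;one-shot: long addressing, DB untouched
        lda $00c010        ;read ACIA status register in bank $00
;
;block transfer: swap DB to bank $00 for the duration
        ldy #0
        ldx count          ;byte count, fetched before DB is changed
        phb                ;save caller's DB
        lda #$00
        pha
        plb                ;DB = $00, absolute addresses now reach I/O
loop    lda [bufptr],y     ;long pointer reaches the data block in any bank
        sta $c011          ;ACIA data register, resolved via DB = $00
        iny
        dex
        bne loop
        plb                ;restore caller's DB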
In POC V2, which will have 512KB of RAM, I will have ROM and the I/O hardware visible in bank $00 only. The firmware I wrote for POC V1 uses mostly interrupt-driven I/O and will be largely transferred intact to V2. Hence the fact that I/O is in bank $00 is of little consequence, since any interrupt causes execution to revert to bank $00. However, DB is not changed by an interrupt, which is both a help and a hindrance.
By way of reference, POC V1's IRQ ISR starts thusly:
Code:
;================================================================================
;
;iirq: HARDWARE INTERRUPT REQUEST SERVICE ROUTINE
;
iirq phb ;save DB
phd ;save DP
longr ;16 bit registers
pha
phx
phy
;
;———————————————————————————————
; Stack Frame Definition
;
irq_yrx .= 1 ;.Y
irq_xrx .= irq_yrx+s_word ;.X
irq_arx .= irq_xrx+s_word ;.C
irq_dpx .= irq_arx+s_word ;DP
irq_dbx .= irq_dpx+s_mpudpx ;DB
irq_srx .= irq_dbx+s_mpudbx ;SR
irq_pcx .= irq_srx+s_mpusrx ;PC
irq_pbx .= irq_pcx+s_mpupcx ;PB
;———————————————————————————————
;
jmp (ivirq) ;IRQ indirect vector
;
iirqa longa ;ensure 16 bit accumulator
ldaw kerneldp ;set default...
tcd ;kernel direct page
shortr ;8 bit registers
lda #kerneldb ;set default...
pha ;kernel...
plb ;data bank
;
; —————————————————————————
; IRQ priority: a) SCSI
; b) UART RxD
; c) UART TxD
; d) RTC
; —————————————————————————
;
...etc...
POC V1 only has bank $00, but I wrote the necessary bank switching mumbo-jumbo into the ISR for testing purposes. It, of course, will become necessary in POC V2.
Note the indirect jump through the IRQ vector. The address of that vector must be in bank $00, since plain indirect JMP always fetches its pointer from bank $00. If it isn't, then JMP (IVIRQ,X), which fetches its pointer from the program bank, would have to be used, with .X set to $0000. However, that means .X gets clobbered, which then means that if an extension to the ISR is intercepting the jump (i.e., "wedged" into the ISR) it has to get .X off the stack, which clobbers .C, since only the accumulator can use stack pointer relative addressing...
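A short sketch of the two vectoring choices (labels as in the listing above):
Code:
;plain indirect: the pointer is always fetched from bank $00
        jmp (ivirq)        ;so IVIRQ must live in bank $00
;
;indexed indirect: the pointer is fetched from the program bank
        ldx #$0000         ;index must be zeroed first...
        jmp (ivirq,x)      ;...which clobbers .X for any downstream wedge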
Data flow between the UART (DUART in POC V1) and its buffers is interrupt-driven—only the BIOS directly accesses the FIFOs. SCSI I/O uses interrupts for vectoring the SCSI driver foreground according to bus phase changes, but uses a monkey-rigged quasi-DMA process for actually reading or writing data. Reading or writing the real-time clock is done from the foreground. However, the RTC is responsible for generating the 100 Hz jiffy IRQ, which, when serviced, causes the uptime timer to increment and the programmable time-delay counter to decrement.
Foreground access to the DUART is for register setup following reset, plus incidental accesses for device control. For example, when one of the channels interrupts due to the TxD FIFO being empty, the transmitter has to be shut down if the associated buffer is empty. The ISR takes care of shutdown in such a case. However, the foreground part of the UART driver has to restart the transmitter, which it does by writing a control value into the hardware.
Since the TIA-232 buffers are in kernel space, it makes sense to set DB to the kernel's bank, take care of business and then restore DB to the entry value. Using long addressing would complicate things because the UART channels are all driven by the same piece of code, with indexing used to select the channel being accessed. I can't do that with long addressing, but I can by using the stack as ephemeral workspace. However, stack pointer relative indirect indexed addressing acts on the current data bank only (plain stack-relative accesses always land in bank $00, where the stack lives).
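Something along these lines (KERNELDB, CHANIDX and TXCOUNT are illustrative names, not POC's actual symbols):
Code:
        phb                ;save entry DB
        lda #kerneldb      ;kernel's bank...
        pha
        plb                ;...becomes the data bank
        ldx chanidx        ;channel offset selects channel A or B
        lda txcount,x      ;one code path serves both channels via indexing
        ...                ;take care of business
        plb                ;restore entry DB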
Now, suppose I had a running kernel, loaded into a specific bank (e.g., bank $01), with kernel-specific data structures and TIA-232 buffers in the same bank, and disk buffers in bank $02. During a kernel call or when the kernel has to process an interrupt, it would have to set DB to $01 to access its specific data structures, or use [<dp>],Y addressing to get at them. However, during disk I/O, the kernel has to read and write in bank $02. So there is a natural conflict built into this that doesn't have a 100 percent satisfactory resolution. Either DB is constantly manipulated, 24 bit instructions must be used to reach data that is not in the current data bank, or 24 bit direct page addressing must be used to avoid tinkering with DB. In the case of disk buffer accesses, which are usually in fixed increments (e.g., 512 bytes), MVN and/or MVP can be used to shuffle bytes. However, it still ultimately involves fiddling with the bank in some fashion.
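As a sketch, copying a 512-byte disk block from bank $02 into a kernel buffer in bank $01 (SRC and DST are illustrative offsets; note that assembler syntax for the two bank operands varies):
Code:
        longr              ;16-bit registers
        ldx #src           ;source offset within bank $02
        ldy #dst           ;destination offset within bank $01
        lda #512-1         ;byte count minus one
        mvn $02,$01        ;copy bank $02 -> bank $01, ascending addresses
        ...                ;caution: MVN leaves DB set to the destination bank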
You need to change your thinking to accommodate the 65C816's way of doing things. I make extensive use of stack pointer relative addressing for ephemeral storage, thus avoiding having to dedicate direct page to I/O addressing. The result is a fully reentrant ISR that can handle interrupts nested to almost any depth. It's much different from the "traditional" 6502 way of doing things.
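The ephemeral-workspace idea, in rough form (N is whatever scratch size the routine needs):
Code:
        tsc                ;allocate N bytes of scratch space...
        sec
        sbc #N
        tcs                ;...by dropping the stack pointer
        ...
        lda 1,s            ;scratch is reached with stack-relative addressing,
        sta 3,s            ;so no direct page or fixed RAM is consumed
        ...
        tsc                ;release the scratch space
        clc
        adc #N
        tcs
Because each invocation gets its own frame on the stack, the code stays reentrant no matter how deeply interrupts nest.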
GARTHWILSON wrote:
I suspect that doing 24-bit addressing would be more efficient than constantly changing data banks when transferring data from I/O to memory or vice-versa.
It depends a lot on how I/O access is occurring. In a machine with no centralized operating system like your workbench computer, your application(s) may well be directly accessing the I/O hardware, in which case there's a tradeoff between use of 24 bit instructions and setting DB. I don't know the particulars of how your workbench machine handles I/O, so obviously I can't opine one way or another.
In a general purpose machine with a centralized operating system, it makes sense to let the OS handle the ugly details of working with the hardware, with the application(s) making API calls. In such a case, buffering would be used for much I/O and the transfer of data from the OS buffer to the application space could be made with MVN or MVP (MVN when the destination starts below the source, MVP when it starts above; the distinction matters if the two regions overlap). Those instructions neatly jump the bank chasm, and copy at the rate of one byte per seven clock cycles, much faster than can be accomplished with the traditional looping method.
Another method is to use direct page indirect long addressing, which is like the example I mentioned a few messages ago. It's slower than MVN or MVP, but is better suited to cases where data comes in a byte at a time and must be transferred to the application in that fashion.
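For example, moving received bytes into an application buffer through a 24-bit pointer (UART_DATA and APPBUF are illustrative; APPBUF is a three-byte pointer in direct page):
Code:
        lda uart_data      ;fetch byte from the hardware (current DB)
        sta [appbuf],y     ;store through the long pointer: any bank, DB untouched
        iny                ;next byte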