M65C02A Core
Re: M65C02A Core
It can move <transfer length> bytes; isn't that a block move? Granted, some block move operations also take care of overlap (looking at my minicomputer emulators right now: on one of the architectures a multi-byte copy is called just 'move' and the one which handles overlap is called 'bmove').
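To make the 'move' versus 'bmove' distinction concrete, here is a minimal Python sketch. The function names are borrowed from the post; the memory model (a flat `bytearray`) and the argument order are assumptions for illustration, not taken from any particular minicomputer.

```python
def move(mem, src, dst, length):
    """Naive ascending byte copy; corrupts data if dst starts inside the source."""
    for i in range(length):
        mem[dst + i] = mem[src + i]


def bmove(mem, src, dst, length):
    """Overlap-safe copy: copy descending when dst starts inside the source range."""
    if src < dst < src + length:
        for i in range(length - 1, -1, -1):
            mem[dst + i] = mem[src + i]
    else:
        move(mem, src, dst, length)


naive = bytearray(b"ABCDEFGH")
move(naive, 0, 2, 4)    # overlapping ranges, ascending copy re-reads bytes it just wrote

safe = bytearray(b"ABCDEFGH")
bmove(safe, 0, 2, 4)    # same request, overlap handled by copying from the top down
```

With the plain ascending copy the source bytes are overwritten before they are read, so `naive` ends up as `ABABABGH`, while `safe` holds the intended `ABABCDGH`.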
Re: M65C02A Core
I'm supposing that the non-interruptable block move has higher performance, but of course will hurt interrupt latency.
Re: M65C02A Core
Dr Jefyll wrote:
In one respect I found your choice of terminology odd. What you refer to as an interruptible block move seems to me to be better described as simply a byte move. There's nothing "block" about it, unless I'm missing something. (I realize it updates the source, destination, count and sets the flags.)
Dr Jefyll wrote:
MichaelM wrote:
The IND prefix instruction is used to select the interruptable MOV instruction.
I may have eventually seen that the mode register could be used for the same purpose as I was using the IND flag register, but your observation allows the instruction to be used in a more cycle efficient manner. My solution had the effect of removing dummy cycles, but I was still left with the need to create an interruptable memory cycle. My initial solution met the 7 cycle transfer length objective, but still left me with the problem of trying to interrupt the instruction pipeline of the core. Your suggestion allows eliminating the IND prefix instruction from the loop, and following the MOV instruction with a NOP before the BNE $-5 conditional branch instruction. Thus, your suggested approach solves the remaining problem while maintaining the 7 cycle transfer loop that I was targeting. Many thanks for the suggestion.
Michael A.
Re: M65C02A Core
BigEd wrote:
I'm supposing that the non-interruptable block move has higher performance, but of course will hurt interrupt latency.
Nothing good is ever free.
OT: interrupt latency issues are a system design and implementation issue. IMO, unbuffered I/O devices should not be used unless absolutely necessary. Unbuffered I/O is the primary reason interrupt latency is such a concern in 8-bit systems like 6502-based systems. I tend to use buffered I/O devices in order to avoid the majority of interrupt latency issues. I even buffer event timers in some of my designs.
Michael A.
Re: M65C02A Core
Thanks for the explanation Michael. Just a thought: the early computers tended to have a DMA engine, or several, presumably because tying up the CPU to move data to or from devices was a waste, and an obstacle to time-sharing, and not as fast as a dedicated engine. In that approach, the DMA engine is autonomous and has a little state, but the state isn't part of the CPU state and so doesn't need to be preserved across interrupts or task switches. I wonder if it's worth considering that approach? The CPU needs to be able to tell whether the DMA engine is busy or idle, and in the limit it can spin in a conventional code loop waiting for idle: interrupts will look after themselves in that loop.
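A rough Python sketch of that arrangement, under stated assumptions: the `DmaEngine` class, its `start`/`tick` methods, and the `busy` flag are hypothetical names, and calling `tick()` from inside the polling loop only stands in for hardware that would run concurrently with the CPU.

```python
class DmaEngine:
    """Hypothetical autonomous copy engine; its state lives outside the CPU."""

    def __init__(self, mem):
        self.mem = mem
        self.src = self.dst = self.remaining = 0

    @property
    def busy(self):
        return self.remaining > 0

    def start(self, src, dst, length):
        self.src, self.dst, self.remaining = src, dst, length

    def tick(self):
        """Advance one bus cycle: move one byte if a transfer is pending."""
        if self.busy:
            self.mem[self.dst] = self.mem[self.src]
            self.src += 1
            self.dst += 1
            self.remaining -= 1


mem = bytearray(b"wxyz") + bytearray(4)
dma = DmaEngine(mem)
dma.start(src=0, dst=4, length=4)

polls = 0
while dma.busy:   # the CPU spins in ordinary, interruptible code
    polls += 1
    dma.tick()    # stand-in for the engine running on its own
```

Because the transfer state sits in the engine rather than in CPU registers, an interrupt taken inside the polling loop has nothing extra to save or restore.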
Re: M65C02A Core
MichaelM wrote:
Many thanks for the suggestion. 
    ldx # SrcAddress
    ldy # DestAddress
    lda # CountMinusOne
    mov inc, inc, autorepeat ; <-----
    ;done -- all bytes moved now

           ldx # SrcAddress
           ldy # DestAddress
           lda # CountMinusOne
    Mv_Lp: mov inc, inc, single ; <-----
           bne Mv_Lp
    ;done -- all bytes moved now

All I'm saying is, the designations "interruptable" and "noninterruptable" are not the best, and I believe better terminology can be found -- maybe "REP" or "autorepeat" or something like that. It's possible I'm being swayed by personal preference. However, I believe a newcomer to your design will more quickly understand the mov instruction if its options are carefully and descriptively named.
-- Jeff
[Edit: booboo in second example]
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
Re: M65C02A Core
Dr Jefyll wrote:
All I'm saying is, the designations "interruptable" and "noninterruptable" are not the best, and I believe better terminology can be found -- maybe "REP" or "autorepeat" or something like that. It's possible I'm being swayed by personal preference. However, I believe a newcomer to your design will more quickly understand the mov instruction if its options are carefully and descriptively named.
I do like the REP or autorepeat identifiers. I am somewhat fond of REP, as in the REP prefix applied to the x86 INS and OUTS instructions. In defining the prefix instructions for the M65C02A, I really wanted to include a REP prefix.
One comment about your identifiers: the CountMinusOne should simply be Count. I decrement the transfer length during the data memory read cycle so that the condition code is valid during the following data memory write cycle. In this way the ALU flags are set on the instruction cycle that determines if another transfer is required or if the next instruction should be fetched and decoded.
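A small Python model of the behavior described here, offered as a sketch rather than as the core's actual logic. The register roles (X = source, Y = destination, A = count) follow the examples earlier in the thread; the class and method names are hypothetical, both pointers are assumed to increment, and the count is assumed to be at least 1.

```python
class MovModel:
    """Toy model of one mov transfer; not the M65C02A implementation."""

    def __init__(self, mem, src, dst, count):
        self.mem, self.x, self.y, self.a = mem, src, dst, count
        self.z = False

    def mov_single(self):
        byte = self.mem[self.x]   # data memory read cycle
        self.a -= 1               # count is decremented during the read
        self.z = (self.a == 0)    # so the flag is valid by the write cycle
        self.mem[self.y] = byte   # data memory write cycle
        self.x += 1
        self.y += 1

    def mov_autorepeat(self):
        while True:               # repeat until the count reaches zero
            self.mov_single()
            if self.z:
                break


# Software loop: mov single followed by a branch-if-not-zero.
mem = bytearray(b"abcd____")
cpu = MovModel(mem, src=0, dst=4, count=3)
passes = 0
while True:
    cpu.mov_single()
    passes += 1
    if cpu.z:                     # plays the role of BNE falling through
        break

# Auto-repeating form: one call moves the whole block.
mem2 = bytearray(b"abcd____")
MovModel(mem2, src=0, dst=4, count=3).mov_autorepeat()
```

Loading the register with Count (3 here) moves exactly three bytes in both forms; loading CountMinusOne would move one byte too few under this model.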
Michael A.
Re: M65C02A Core
BigEd wrote:
Thanks for the explanation Michael. Just a thought: the early computers tended to have a DMA engine, or several, presumably because tying up the CPU to move data to or from devices was a waste, and an obstacle to time-sharing, and not as fast as a dedicated engine. In that approach, the DMA engine is autonomous and has a little state, but the state isn't part of the CPU state and so doesn't need to be preserved across interrupts or task switches. I wonder if it's worth considering that approach? The CPU needs to be able to tell whether the DMA engine is busy or idle, and in the limit it can spin in a conventional code loop waiting for idle: interrupts will look after themselves in that loop.
I think that the M65C02A core, using the COP #imm instruction to access coprocessors, would be able to support a DMA engine or I/O channel concept as used in computers such as the Data General Nova. To complete the M65C02A architecture, I will need to include a coprocessor. Rather than a DMA engine, I will probably implement a Booth or DSP48A-based multiplier as my coprocessor example.
Michael A.
Re: M65C02A Core
I did wonder what the story was with memory contention in the early machines. As your machine is very efficient (not many idle cycles) the effect will be worse. For some machines, the answer was multi-banked memory. These days, we'd consider an instruction cache or small instruction prefetch buffer - which would be ideal if indeed the CPU is in a tight polling loop during the DMA. In your case, if your BRAMs are your memory, they are probably dual-ported and can be configured to be quite wide - possibly either of those could help a bit.
Re: M65C02A Core
BigEd wrote:
These days, we'd consider an instruction cache or small instruction prefetch buffer - which would be ideal if indeed the CPU is in a tight polling loop during the DMA.
Doing the same thing with a write-through data cache is probably overkill.
Re: M65C02A Core
For the current implementation of the M65C02A core there are only three instructions with dummy memory cycles: PHR (Push Relative Effective Address); PHW (Push 16-bit Word); and PLW (Pull/Pop 16-bit Word). That means that in the majority of circumstances, there are no free memory cycles.
For 6502-like processors, the best approach IMO to increasing memory bandwidth in an economical manner in order to provide free memory cycles for concurrent DMA operations is by creating a dual-ported memory where instructions are fetched on one bus and data on the other. Since the predominant number of memory cycles issued by 6502-like processors (or most other processors) are instruction memory fetches, the data port into the memory would be relatively free and could then easily support burst mode or cycle stealing DMA.
The M65C02A core does not contain a memory interface, interrupt handler, or peripherals. Those features are provided by the application-specific SOC implementation. Thus, the M65C02A core can support separate address spaces for instructions and data, if that is a desirable feature in a particular application. In its demonstration SOC configuration, the M65C02A uses a single, unified address space for instructions and data. The second port of the BRAMs that implement the SOC demonstrator's memory could be used to support a DMA engine as a coprocessor or a peripheral function.
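The bandwidth argument can be illustrated with a toy cycle count in Python. Everything here is an assumption for illustration: the trace is invented rather than measured, `I` marks an instruction fetch, `D` a data access, and `-` an idle (dummy) cycle.

```python
def free_dma_cycles(trace, dual_ported):
    """Count cycles in which a DMA engine could use memory without stalling the CPU."""
    if dual_ported:
        # Data port is free whenever the CPU is only fetching instructions.
        return sum(1 for c in trace if c == "I")
    # Single port: DMA can only use cycles the CPU leaves idle.
    return sum(1 for c in trace if c == "-")


trace = "IIDIIIDD"   # hypothetical trace with no dummy cycles
single = free_dma_cycles(trace, dual_ported=False)
dual = free_dma_cycles(trace, dual_ported=True)
```

For this made-up trace a single-ported memory leaves the DMA engine no free cycles, while the dual-ported arrangement frees the data port on five of the eight cycles.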
Michael A.
Re: M65C02A Core
Even a small icache might be a big help - but as you suggest, that's not part of the core, it's part of the memory subsystem.
Another thought: in a single-tasking context, or a context where I/O isn't especially slow, a single instruction something like WAI could allow the CPU to quiesce while the DMA engine does its thing.
Re: M65C02A Core
With a 65C02-compatible core like the M65C02A, both the STP and WAI instructions are implemented. Your suggestion has merit, but implementing DMA while in a WAI-induced processor core sleep cycle provides no better performance than what can be achieved with a block MOV instruction. In fact, the MOV instruction makes efficient use of existing resources within the processor core. Using an external DMA engine will add significantly more hardware.
Michael A.
Re: M65C02A Core
Right, yes, of course - your uninterruptable MOV is more or less an in-CPU DMA operation. The tradeoff then would be between the interrupt latency of the MOV and the complexity of the standalone DMA engine. I've no issue with the choice you've made, just mulling over the various possibilities.
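To put rough numbers on that tradeoff, here is a back-of-envelope Python sketch. The 7-cycle figure for the software loop comes from earlier in the thread; the 2 cycles per byte for the auto-repeating form (one read, one write) is an assumption, as are the function names.

```python
def uninterruptible_latency_cycles(count, cycles_per_byte=2):
    """Worst-case extra interrupt latency if an interrupt arrives just as the move starts.

    cycles_per_byte=2 is an assumed figure (one read cycle plus one write cycle).
    """
    return count * cycles_per_byte


def looped_total_cycles(count, cycles_per_pass=7):
    """Total time for the interruptible software loop at 7 cycles per byte moved."""
    return count * cycles_per_pass


latency = uninterruptible_latency_cycles(256)   # 512 cycles with interrupts held off
looped = looped_total_cycles(256)               # 1792 cycles, interruptible between instructions
```

Under these assumptions a 256-byte uninterruptible move finishes in well under a third of the time but can hold off an interrupt for up to 512 cycles.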
- BigDumbDinosaur
Re: M65C02A Core
BigEd wrote:
Another thought: in a single-tasking context, or a context where I/O isn't especially slow, a single instruction something like WAI could allow the CPU to quiesce while the DMA engine does its thing.
x86? We ain't got no x86. We don't NEED no stinking x86!