65VM02

GARTHWILSON · Post by **GARTHWILSON** » Sat May 13, 2017 8:32 am

At the tiny company I started working at in 1985, there was initially one computer in the whole company, an Altair with an 8088 IIRC, running business BASIC, and it serviced the sales people's WYSE terminals. I believe it had one hard disc, probably 10 or 20MB, and it got backed up to streaming tape at the end of every day. I'm sure it had a lot of interrupts; but none of them required service any faster than maybe a significant fraction of a second. (100ms? I don't know. Definitely not 1µs or less!) I've heard a lot more complaints from office people in later years about how slow the computers were when they were Pentiums connected to the network.

BigEd · Post by **BigEd** » Sat May 13, 2017 10:38 am

Acorn's Beeb also has many interrupt sources.

Tor · Post by **Tor** » Sat May 13, 2017 12:50 pm

ARM was never sold to America. At one time there was a group of three with ownership in ARM, and one of them was Apple. When Jobs came back he sold those shares so that he had the cash to go forward; that money "made" the modern Apple company.
After that ARM became ARM Holdings, and stayed in the UK forever after. Well, until they sold out to SoftBank last year, a Japanese company (with strong Chinese backing).

Hugh Aguilar · Post by **Hugh Aguilar** » Sat May 13, 2017 4:16 pm

GARTHWILSON wrote:

Hugh Aguilar wrote:

BTW: There is a bug in CMOVE --- I'm surprised that none of you 6502 have pointed that out yet.

Care to elaborate? I've never seen the bug; but not long into my 6502 Forth experience, I saw that I was virtually never moving a whole page or more, so I wrote a shorter, faster version and called it QCMOVE, the "Q" being for "quick," and I use that for moving less than a page, like for moving strings. (I still left CMOVE there, but seldom if ever use it.)

This is my new-and-improved CMOVE function (notice how SBY speeds things up):

Code:

CMOVE:                  ; this is:  CMOVE ( src dst cnt -- )
    LDY roslo,X
    LDA roshi,X
    STYA src
    LDY soslo,X
    LDA soshi,X
    STYA dst
    LDY toslo,X         ; Y is the count, with the high-byte still in toshi,X
    LDA toshi,X
    TST
    BEQ DROP3           ; if the count is zero, don't even begin
    DEY                 ; because we need an offset to the last element in the array (rather than past the array)
CMOVE_BEGIN:
    LDA (src),Y
    STA (dst),Y
    SBY #1              ; clears C-flag when Y underflows (Y becomes $FF)
    BCS CMOVE_BEGIN     ; loop until Y is $FF
    LDA toshi,X  
    SEC
    SBC #1              ; clears C-flag when high-byte underflows (high-byte becomes $FF)
    BCC DROP3           ; loop until high-byte is $FF
    STA toshi,X
    INC src+1
    INC dst+1
    BRA CMOVE_BEGIN

; On the 65c02, DEC doesn't set the C-flag (one of the many flaws in the 65c02 design).
; Because of this, we have to LDA TOSHI,X then subtract 1 then STA TOSHI,X again.
; I could fix this flaw in the 65VM02, but that might break legacy code.

edit: I still had a bug! My tests of the C-flag were all backwards! Ugh!
I need to write a simulator so I can test this code --- it is very difficult to write code correctly without testing.
This also points out why a byte-code VM is useful --- writing code in assembly-language is error-prone --- it is better to write application programs in a high-level language.
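
Short of a full simulator, the loop above can be checked by modeling it in a few lines of Python. This is only a sketch under assumptions the thread does not confirm: that TST tests the 16-bit YA pair for zero, and that SBY #1 leaves carry clear on a borrow exactly as SEC / SBC #1 would. The name `cmove_model` and the offset log it returns are illustrative, not part of the 65VM02 design. Under those assumptions the model suggests the listing is right for counts below one page. For larger counts the partial page is copied first (offsets 0 to low-1) and the full pages then start at offset $100, so the rest of the first page is skipped. A count whose low byte is zero copies one page too many.

```python
def cmove_model(mem, src, dst, count):
    """Model the CMOVE listing; return the offsets copied, in order."""
    lo, hi = count & 0xFF, (count >> 8) & 0xFF
    copied = []
    if count == 0:                  # TST / BEQ DROP3 (assumed 16-bit test of YA)
        return copied
    y = (lo - 1) & 0xFF             # DEY
    page = 0                        # how many times src+1 / dst+1 were incremented
    while True:
        while True:
            off = page * 256 + y
            mem[dst + off] = mem[src + off]   # LDA (src),Y / STA (dst),Y
            copied.append(off)
            carry = y >= 1          # SBY #1 (assumed: carry clear on borrow)
            y = (y - 1) & 0xFF
            if not carry:
                break               # BCS CMOVE_BEGIN not taken, Y is now $FF
        carry = hi >= 1             # SEC / SBC #1 on the high byte
        hi = (hi - 1) & 0xFF
        if not carry:
            return copied           # BCC DROP3
        page += 1                   # INC src+1 / INC dst+1 / BRA CMOVE_BEGIN


def missing_offsets(count):
    """Offsets in 0..count-1 that the modeled loop never copies."""
    copied = set(cmove_model(bytearray(0x1000), 0x0000, 0x0800, count))
    return sorted(set(range(count)) - copied)
```

Calling `missing_offsets(0x0105)` lists the offsets that a correct copy would have touched but the modeled loop did not, which makes the page-ordering problem easy to see.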

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sat May 13, 2017 10:33 pm

Arlet wrote:

Quote:

That would not describe most 6502 systems. Using the Commodore 128 as an example, it had exactly two possible IRQ sources: the VIC and CIA #1. In C-64 mode, disregarding specific programming to the contrary, only the CIA is an IRQ source. In C-128 mode, only the VIC is an interrupt source, unless other arrangements are made. Hardly a good case for a vectored interrupt system.

On the other hand, the Z80 business systems had a lot more peripherals with interrupts, including multiple floppy disks, serial ports, real time clocks, and even built-in modems.

They did, as did the first IBM PCs. Even there, IBM chose to use separate hardware to organize the interrupt sources (eight of them in the original design). Ultimately, that proved to be a wise engineering choice, and evidently echoed mainframe engineering philosophy. As the need for more interrupts arose with the development of the XT and later machines, the MPU's interrupt capabilities didn't have to change, just the programmable interrupt controller (PIC).

The fundamental problem with wiring-in the interrupt vectoring into the MPU is it is always there, whether wanted or not. Whatever savings may exist in terms of clock cycles not spent looking for IRQ sources are expended in the MPU's latency, which is something not under the control of the system programmer. If only one or two IRQ sources are actually enabled, the MPU is doing useless busy-work when it responds to an IRQ.

Continuing with the 6502 architecture (which is what this forum is all about), the WDC 65C02 and the 65C816 compromise by providing an output (VPB) that can tell external hardware when the MPU is fetching the interrupt vector. The system designer can choose to use VPB or not, with no compromise in the MPU's interrupt latency performance.

Relatively few 65xx systems have enough IRQ sources to justify a priority interrupt encoder along the lines of the x86's PIC. However, thanks to programmable logic, it's not at all difficult to make such a facility available, and at virtually no cost in MPU interrupt latency. My POC V2.1 design is pushing this IRQ thing a bit—each DUART has eight possible interrupt sources, of which five will be continuously enabled. This state of affairs will likely lead to the development of version 2.2 with enough logic to do priority interrupt encoding, complete with an on-the-fly vector setup.
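
The priority encoding described above can be sketched in Python as a model of what the external logic would compute. Everything specific here is an assumption for illustration: the eight request lines, the rule that line 0 has the highest priority, and the `VECTOR_BASE` address are hypothetical and are not taken from the POC design.

```python
from typing import Optional

VECTOR_BASE = 0xFFE0   # hypothetical start of an 8-entry vector table
VECTOR_SIZE = 2        # each vector holds a 16-bit address


def highest_priority(pending: int) -> Optional[int]:
    """Return the lowest-numbered pending line (0 = highest priority)."""
    for line in range(8):
        if pending & (1 << line):
            return line
    return None            # nothing pending


def vector_address(pending: int) -> Optional[int]:
    """Address the logic would substitute while the MPU pulls its vector."""
    line = highest_priority(pending)
    if line is None:
        return None
    return VECTOR_BASE + VECTOR_SIZE * line
```

In hardware this selection would happen while VPB is asserted, so the MPU itself sees an ordinary vector fetch.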

Hugh Aguilar · Post by **Hugh Aguilar** » Sun May 14, 2017 12:53 am

BigDumbDinosaur wrote:

The fundamental problem with wiring-in the interrupt vectoring into the MPU is it is always there, whether wanted or not. Whatever savings may exist in terms of clock cycles not spent looking for IRQ sources are expended in the MPU's latency, which is something not under the control of the system programmer. If only one or two IRQ sources are actually enabled, the MPU is doing useless busy-work when it responds to an IRQ.

My 65VM02 is intended to be fully 65c02 compatible, so it can support legacy code. I added the VIRQ interrupt in this latest document, but I still have IRQ and NMI that are exactly the same as before.

I doubt that VIRQ has any more interrupt latency than IRQ. That busy-work you mentioned (setting A with the index for use in JVM) is done in hardware, so it is instantaneous. If there is only one interrupt source, then the ISR can ignore A and not bother with the JVM. As for saving A to the return-stack, which is done in VIRQ but not in IRQ, almost every ISR is going to do this manually --- Garth mentioned a few cases in which the ISR didn't need to use A --- I would expect most ISRs would use A though.

Mostly, VIRQ has the cost of using more hardware resources than IRQ --- on a modern processor though, this extra cost can be afforded.
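
The VIRQ flow described above (hardware saves A, loads A with the source index, and the ISR dispatches on it) can be modeled roughly as follows. This is a sketch, not the 65VM02 specification: the `Cpu` class, the handler names, the 3-bit index width, and the restoring of A on return are all assumptions made for illustration.

```python
class Cpu:
    """Just enough state to show the dispatch."""
    def __init__(self):
        self.a = 0
        self.return_stack = []
        self.trace = []


def timer_isr(cpu):
    cpu.trace.append("timer")


def uart_isr(cpu):
    cpu.trace.append("uart")


def spurious(cpu):
    cpu.trace.append("spurious")


# Hypothetical jump table indexed by the value the hardware places in A.
JUMP_TABLE = [spurious] * 8
JUMP_TABLE[1] = timer_isr
JUMP_TABLE[2] = uart_isr


def virq(cpu, indicator):
    cpu.return_stack.append(cpu.a)     # hardware pushes A
    cpu.a = indicator & 0x07           # hardware loads A with the index
    JUMP_TABLE[cpu.a](cpu)             # JVM-style dispatch through the table
    cpu.a = cpu.return_stack.pop()     # assumed: A is restored on return
```

With a single interrupt source the table lookup could be skipped entirely, which matches the point that the ISR is free to ignore A.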

It would be possible to discard useless features for the purpose of saving resources, and this might be done in a small version of the 65VM02. The TSX instruction can be discarded because I have LLYA now. TAX TXA PHX and PLX can all go. LDX and STX are needed for the direct addressing-mode, but they can be discarded for all other addressing modes. This small version might also fix problems in the 65c02, such as the INC and DEC not setting the C-flag like ADC (see my work-around in CMOVE above). The full version however is fully 65c02 compatible so it will run legacy programs. At this time, I'm focusing on the full version --- I would only try a small version after the full version is done, and only if this could reduce the cost of the chip (which is dubious).

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sun May 14, 2017 1:41 am

Hugh Aguilar wrote:

I doubt that VIRQ has any more interrupt latency than IRQ. That busy-work you mentioned (setting A with the index for use in JVM) is done in hardware, so it is instantaneous.

Nothing that occurs in an MPU (or any kind of silicon logic) is instantaneous. The internal gates in the device, regardless of how small the geometry to which the device was fabricated, have finite propagation time, which sets a very hard limit on how fast things can occur. In this case, prior to loading register "A" (an accumulator?) with an index, you have to save that register's content somewhere, e.g., on a stack, and that too takes time. These two steps have to occur sequentially, not simultaneously, in step with the clock. So, no, it will not be instantaneous.

Quote:

If there is only one interrupt source, then the ISR can ignore A and not bother with the JVM. As for saving A to the return-stack, which is done in VIRQ but not in IRQ, almost every ISR is going to do this manually --- Garth mentioned a few cases in which the ISR didn't need to use A --- I would expect most ISRs would use A though.

True, but in "traditional" ISR programming it is the programmer's discretion as to how processing might proceed. As soon as you wire-in assumptions into the MPU itself you take that discretion away from the programmer.

Incidentally, I have written ISRs in which only .X and .Y were used and the accumulator never touched.

Quote:

It would be possible to discard useless features for the purpose of saving resources, and this might be done in a small version of the 65VM02...This small version might also fix problems in the 65c02, such as the INC and DEC not setting the C-flag like ADC...

What you are describing is not a "problem." INC and DEC are not arithmetic operations, which is why they, by design, do not affect carry. That also applies to INX, INY, DEX, DEY and on the 65C02 and 65C816, INC A and DEC A. I think you need to dig a little deeper into fully understanding the 65C02 architecture before cavalierly dismissing something as a design problem.

Hugh Aguilar · Post by **Hugh Aguilar** » Sun May 14, 2017 3:40 am

BigDumbDinosaur wrote:

Hugh Aguilar wrote:

I doubt that VIRQ has any more interrupt latency than IRQ. That busy-work you mentioned (setting A with the index for use in JVM) is done in hardware, so it is instantaneous.

Nothing that occurs in an MPU (or any kind of silicon logic) is instantaneous. The internal gates in the device, regardless of how small the geometry to which the device was fabricated, have finite propagation time, which sets a very hard limit on how fast things can occur. In this case, prior to loading register "A" (an accumulator?) with an index, you have to save that register's content somewhere, e.g., on a stack, and that too takes time. These two steps have to occur sequentially, not simultaneously, in step with the clock. So, no, it will not be instantaneous.

My experience on the MiniForth was that quite a lot got parallelized and was effectively instantaneous. The processor could pack up to 5 instructions into a single opcode, all of which would execute in a single clock cycle. My assembler would rearrange the instructions in order to pack them together efficiently (with minimal NOP instructions needing to be inserted), while yet guaranteeing that the program did the same thing as if the instructions were compiled one per opcode in the same order that they appeared in the source-code.

On the MiniForth, all of NEXT got parallelized every time, so it took zero clock cycles.

My 65VM02 design is pretty crude by comparison --- this is really just a toy --- it might be somewhat useful though, for pretty undemanding applications.

The goal in adding new instructions to the 65VM02 is that instructions that would otherwise be done sequentially can be done in parallel.

LDYA and STYA are macros because the two instructions can't be parallelized, so no time is saved. I have several new instructions that do an exchange though, because that can be parallelized and should be much faster than moving data around sequentially.

BigDumbDinosaur wrote:

Quote:

If there is only one interrupt source, then the ISR can ignore A and not bother with the JVM. As for saving A to the return-stack, which is done in VIRQ but not in IRQ, almost every ISR is going to do this manually --- Garth mentioned a few cases in which the ISR didn't need to use A --- I would expect most ISRs would use A though.

True, but in "traditional" ISR programming it is the programmer's discretion as to how processing might proceed. As soon as you wire-in assumptions into the MPU itself you take that discretion away from the programmer.

Incidentally, I have written ISRs in which only .X and .Y were used and the accumulator never touched.

I would expect most 65c02 programmers to consider having A pushed automatically to be a good thing, or at least not a bad thing. You remind me of how the ANS-Forth enthusiasts criticize me for writing a code-library in ANS-Forth, saying that I'm taking the programmer's discretion away from him, and that all application programs have to be written from scratch in order to be super-optimized.

There are several possible designs that are reasonable. For example, we could just have 8 interrupts, each with its own vector in upper-memory. They can be prioritized, so if two or more are ready at the same time, the high-priority IRQ goes first. If you do this, then you need 8 pins though. By comparison, with my design you need 4 pins (the VIRQ line and the 3 pins where the indicator value gets input). You could do it with 3 pins (the 3 pins are the indicator value, but 0 indicates that there is no interrupt, so you have 7 possible interrupt values). The concept of "pins" doesn't really make any sense though, because the peripherals are actually built into the FPGA anyway --- this is not like the old days when you had a 6522 chip sitting on the board next to the 6502 chip.

I don't really know enough about chip design to have a strong opinion on the subject of how interrupts should work. I'm just a programmer --- I have strong opinions on how software should work (high-level languages are easier than assembly-language, which is why the primary feature of the 65VM02 is supporting a byte-code VM) --- the 65VM02 would have been useful in the early 1980s when people wanted to program their Apple-II in Pascal or Forth or BASIC or whatever, but doing so was too slow, so they programmed in assembly-language instead.

GARTHWILSON · Post by **GARTHWILSON** » Sun May 14, 2017 3:45 am

Speaking of INC and DEC (& friends), I've always thought it would be cool to have an automatic, implied compare-to-$FF instruction integrated, and have a flag you can branch on. If you're counting down, you can see when it becomes negative and then branch on the N flag, but that only works if you start with $7F or less.

I've also wished for an STF (STore $FF) instruction, for setting flag variables.

Uh-oh, we're on that slippery slope again!
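
The limitation of branching on N while counting down can be seen in a small Python model of the flag behavior. The sketch assumes the documented 65C02 rules: DEC updates N and Z and leaves C alone, while SEC / SBC #1 clears C only when a borrow occurs. The function names are illustrative.

```python
def dec8(value, carry):
    """Model DEC on an 8-bit value: N and Z change, C passes through."""
    result = (value - 1) & 0xFF
    flags = {"N": bool(result & 0x80), "Z": result == 0, "C": carry}
    return result, flags


def sbc1(value):
    """Model SEC / SBC #1: C is clear only when the value wrapped past zero."""
    result = (value - 1) & 0xFF
    flags = {"N": bool(result & 0x80), "Z": result == 0, "C": value >= 1}
    return result, flags


def decrements_until_negative(start):
    """Count DECs until N is first set; return (count, final value)."""
    value, count = start & 0xFF, 0
    while True:
        value, flags = dec8(value, carry=True)
        count += 1
        if flags["N"]:
            return count, value
```

Starting from $05 the N flag first appears on the wrap to $FF, but starting from $90 it appears after a single decrement, long before the counter reaches zero.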

barrym95838 · Post by **barrym95838** » Sun May 14, 2017 3:54 am

GARTHWILSON wrote:

... I've always thought it would be cool to have an automatic, implied compare-to-$FF instruction integrated, and have a flag you can branch on ...

You're in good company. Woz thought that the idea was important enough to include BM1 and BNM1 in Sweet 16.

Mike B.

GARTHWILSON · Post by **GARTHWILSON** » Sun May 14, 2017 3:55 am

Hugh Aguilar wrote:

Quote:

Incidentally, I have written ISRs in which only .X and .Y were used and the accumulator never touched.

I would expect most 65c02 programmers to consider having A pushed automatically to be a good thing, or at least not a bad thing.

If it only takes one cycle, that would be an asset, since it usually does have to be pushed and pulled. I suppose the reason it wasn't done on the '02 is that exceeding the 7-cycle limit would have presented quite an expense in silicon real estate.

Quote:

You remind me of how the ANS-Forth enthusiasts criticize me for writing a code-library in ANS-Forth, saying that I'm taking the programmer's discretion away from him, and that all application programs have to be written from scratch in order to be super-optimized.

Ahem—that's like saying macros take away control, as if you couldn't still do it however you want if you don't want to use the macro. Or did they forget they can delete those words, or rename them, or modify them??

Hugh Aguilar · Post by **Hugh Aguilar** » Sun May 14, 2017 4:42 am

GARTHWILSON wrote:

Speaking of INC and DEC (& friends), I've always thought it would be cool to have an automatic, implied compare-to-$FF instruction integrated, and have a flag you can branch on. If you're counting down, you can see when it becomes negative and then branch on the N flag, but that only works if you start with $7F or less.

An implied compare-to-$FF instruction so you can have a flag??? That is the carry flag --- when ADC or SBC roll over, the carry reflects this, and this would also be true for subtracting 1 or adding 1 as done in INC and DEC --- that is why I said that INC and DEC should affect the carry flag like ADC and SBC --- I don't claim to be an expert on the 65c02 though, so maybe I need to:
"dig a little deeper into fully understanding the 65C02 architecture before cavalierly dismissing something as a design problem"

The only reason you wouldn't want INC and DEC to affect the carry flag, is that you already have something in the carry flag and you don't want the carry flag to get clobbered by the INC and DEC. I can't think of any case in which this would be true however. That is why I say that it is a design flaw in the 65C02 that INC and DEC don't affect the carry flag.

GARTHWILSON wrote:

I've also wished for an STF (STore $FF) instruction, for setting flag variables.

Uh-oh, we're on that slippery slope again!

Well, there are a lot of new instructions that we could wish for. I don't want to get over-enthusiastic with the 65VM02 --- I took away some of my new instructions in this most recent design!

My rule with the 65VM02 is that I don't invent a new instruction if it is only replacing two old instructions. It has to replace at least three old instructions. Your function to store $FF only replaces STZ and DEC which is two. Also, some people (C programmers) use 1 for a true flag rather than -1, so your instruction wouldn't be used anyway --- I have always considered it to be a design flaw in Forth that -1 is used as a true flag --- I like 1 as true.

It was good that the 65c02 got STZ though. It replaced two instructions, which were the load of a register with 0 and the store of that register --- the advantage is that a register wasn't required, so if X Y and A are all in use, you are still good.

Hugh Aguilar · Post by **Hugh Aguilar** » Sun May 14, 2017 4:56 am

BTW, I noticed Garth's tagline:
_________________
http://WilsonMinesCo.com/ lots of 6502 resources

I glanced over that website and saw that he has an interest in the HP41 calculator. This is off-topic, but in Denver there is the "6502 Club" --- they originally worked with the 6502, which is where they got their name, but they have since upgraded to the PIC18. They have the code for the HP41 and they wrote an emulator in PIC18 assembly-language for it, and they built an HP41 calculator!

Arlet · Post by **Arlet** » Sun May 14, 2017 5:17 am

BigDumbDinosaur wrote:

Nothing that occurs in an MPU (or any kind of silicon logic) is instantaneous. The internal gates in the device, regardless of how small the geometry to which the device was fabricated, have finite propagation time, which sets a very hard limit on how fast things can occur. In this case, prior to loading register "A" (an accumulator?) with an index, you have to save that register's content somewhere, e.g., on a stack, and that too takes time. These two steps have to occur sequentially, not simultaneously, in step with the clock. So, no, it will not be instantaneous.

Saving 'A' would take 1 extra cycle, but manually doing a PHA takes 3, and in most cases you'd want to do that anyway.

Loading 'A' with an index takes no time, because it can happen in parallel with jumping to the vector.

Another option would be to add a few more vectors, which would cost no extra time at all. Or automatically switch to a shadow register bank on interrupt, which would also be instantaneous (but can make nested interrupts trickier). You could even put the PC and PSW in the shadow registers, and reduce interrupt latency by 3 cycles.
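
The cycle counts in this post can be tallied in a short Python sketch. The 7-cycle interrupt sequence and the 3-cycle PHA are standard 65C02 figures; the 1-cycle automatic push and the 3-cycle saving from shadowing PC and PSW are the estimates given above, not measurements. The scheme names are made up for the example.

```python
IRQ_SEQUENCE = 7     # cycles for the 65C02 interrupt sequence
PHA = 3              # cycles for a manual PHA at the top of the ISR
AUTO_PUSH_A = 1      # estimated extra cycle if hardware pushes A
SHADOW_SAVING = 3    # estimated cycles saved by shadowing PC and PSW


def entry_cycles(scheme):
    """Cycles from interrupt recognition until A is safe to overwrite."""
    if scheme == "irq+pha":
        return IRQ_SEQUENCE + PHA
    if scheme == "auto-push":
        return IRQ_SEQUENCE + AUTO_PUSH_A
    if scheme == "shadow":
        return IRQ_SEQUENCE - SHADOW_SAVING
    raise ValueError(scheme)


for name in ("irq+pha", "auto-push", "shadow"):
    print(name, entry_cycles(name))
```

The shadow-bank figure assumes A lives in the shadow bank too, so no push is needed at all.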

GARTHWILSON · Post by **GARTHWILSON** » Sun May 14, 2017 5:25 am

What I mean by the implied, automatic compare-to-$FF instruction and its flag is that only $FF would qualify, not $FE, or any other result. The C flag cannot be depended on for this.

Forth words that input a flag cell generally consider any non-0 number to be true. It facilitates a lot of things, and in the innards, the 65's can just branch on the Z flag. Early Forths used a 1 as a true flag (I think this was true of Forth 79), but they saw the value in changing it.

Something I've done in assembly language to set a flag variable is to just decrement it. If later tests will be with the BIT instruction which does not need or affect A, X, or Y, the high bit gets transferred to the N bit for conditional branching. The decrement then assumes it started out as 0 or at least not below $81, meaning you have to know you won't be decrementing it so many times that it drops to $7F (or $7FFF on the '816 with 16-bit A operating on 2-byte cells, which makes for a lot more decrementing before getting into trouble). Every second decrement will flip the 1 bit.
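
The decrement-to-set trick can be modeled in Python to see how much headroom it gives. This sketch assumes the flag variable starts at 0 and that BIT copies the top bit of the operand into N, as described above; `set_flag`, `bit_test_n`, and `safe_sets` are illustrative names.

```python
def set_flag(value, bits=8):
    """Model DEC on the flag variable (8-bit, or 16-bit on the '816)."""
    return (value - 1) & ((1 << bits) - 1)


def bit_test_n(value, bits=8):
    """Model the N flag after BIT: the operand's top bit."""
    return bool(value & (1 << (bits - 1)))


def safe_sets(bits=8):
    """How many decrements from 0 still leave the flag reading as set."""
    value, count = 0, 0
    while True:
        value = set_flag(value, bits)
        if not bit_test_n(value, bits):
            return count
        count += 1
```

Comparing `safe_sets(8)` with `safe_sets(16)` shows how much more room the 16-bit cell gives before the flag stops reading as set.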

Quote:

they have since upgraded to the PIC18

I would say that in many senses, that's just an update, not an upgrade. It does have nice I/O and processor-support features on board, and it has a lot of kludges to address deficiencies in the PIC16's, but it still has problems. At least it's more suitable for HLL compilers.
