6502.org Forum

All times are UTC




PostPosted: Mon Jan 20, 2020 1:32 pm 

Joined: Mon Jan 20, 2020 11:22 am
Posts: 9
Location: Wrocław, Poland
Hi.

I recently stumbled on a curious phenomenon. I'm running some experiments on the Atari Lynx handheld console, which is equipped with a 65C02. What I'm trying to accomplish is to synchronize to its display so that any color register changes appear at stable positions on the screen (classic beam-tracing). Sadly, the console has no simple means of doing this. Register polling is inherently flawed, as the change in the register can occur at any offset relative to the CPU instructions, and the VBI can be delayed by 0 to 6 cycles depending on which instruction is executing at the time. The CPU is not a WDC part, so it has no WAI. I therefore thought it might be clever to run, at the right moment (when the interrupt is about to fire), a sequence of one-byte, one-cycle NOP instructions ($X3 or $XB) to get deterministic interrupt latency. I did just that and... surprise...

The stream of one-cycle NOPs is uninterruptible! It behaves just as if it were one big multibyte instruction!

Has anyone spotted this behavior before? Does anyone have any idea why it is so? Is one cycle too little for the interrupt machinery to kick in?


PostPosted: Mon Jan 20, 2020 2:03 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
(Thanks for the new thread! It's an interesting finding, and a question I don't think I can answer, but hopefully someone else can.)

There's an idea for using such NOPs in this thread, which asks the question about interruptibility, but doesn't answer it:
Ultra-fast output port using 65C02 illegal instructions

Edit: worth pulling in the snippet from Jeff's earlier post:

Quote:
I've now located the reference that eluded my memory earlier. It is WDC's own data sheet for the W65C02S, and the second row of Table 7-1 describes Execution of invalid OpCodes.

listed as 2 byte, 2 cycle are 02,22,42,62,82,C2,E2
listed as 1 byte, 1 cycle are X3,0B-BB,EB,FB
listed as 2 byte, 3 cycle is 44
listed as 2 byte, 4 cycle are 54,D4,F4
listed as 3 byte, 8 cycle is 5C
listed as 3 byte, 4 cycle are DC,FC

Folks who find such minutiae interesting may wish to visit this page of my site. I've documented not just the bytes and cycles, but also what the undefined NOP's actually do (on the Rockwell 'C02). I can't vouch for what they do on the W65C02S. But, whatever it is, it takes the same number of bytes and cycles as the Rockwell part.

Something I'd like to know is whether the one byte, one cycle NOP's delay interrupt acceptance by one cycle -- which would be a drawback in some situations. If I do any further testing I'll be sure to report back.


(Also corrected the link to Jeff's site.)
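
For anyone who wants the quoted table in machine-readable form, here is a minimal Python sketch. The function name `undefined_nop_timing` and the table layout are made up for illustration, and the values are only those listed in the quote above (W65C02S data sheet, Table 7-1). Other 65C02 variants, including the one in the Lynx, may differ.

```python
# (bytes, cycles) for the undefined opcodes, transcribed from the quoted table.
_EXPLICIT = {
    (2, 2): (0x02, 0x22, 0x42, 0x62, 0x82, 0xC2, 0xE2),
    (2, 3): (0x44,),
    (2, 4): (0x54, 0xD4, 0xF4),
    (3, 8): (0x5C,),
    (3, 4): (0xDC, 0xFC),
}


def undefined_nop_timing(opcode):
    """Return (bytes, cycles) if opcode is an undefined NOP in the table, else None."""
    low, high = opcode & 0x0F, opcode >> 4
    # "X3": every opcode ending in 3.
    # "0B-BB, EB, FB": column B except $CB and $DB.
    if low == 0x3 or (low == 0xB and high not in (0xC, 0xD)):
        return (1, 1)
    for timing, opcodes in _EXPLICIT.items():
        if opcode in opcodes:
            return timing
    return None


one_cycle = [op for op in range(256) if undefined_nop_timing(op) == (1, 1)]
print(len(one_cycle))  # number of one-byte, one-cycle NOPs in the table
```

The two gaps in column B, $CB and $DB, are where the W65C02S has WAI and STP.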


PostPosted: Mon Jan 20, 2020 2:15 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
(Just to note, the 65C02 in Atari's Lynx is a custom version in a VTI chip. So it's not clear whether it would be expected to behave like a Rockwell C02, a WDC C02, or something else.)

But I think the hypothesis that a one-cycle instruction doesn't have the appropriate states to process an interrupt is a good one: in the original 6502 the interrupt detection is in a specific cycle, with a special case for branches.


PostPosted: Mon Jan 20, 2020 3:22 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Quote:
The stream of one-cycle NOPs is uninterruptible! It behaves just as if it were one big multibyte instruction!

Has anyone spotted this behavior before?
Yes, and I should update my post which you quoted. (Oh, and I see Ed, too, searched and found it -- thanks, Ed!)

Quote:
Does anyone have any idea why it is so? Is one cycle too little for the interrupt machinery to kick in?
I have a theory that to me seems satisfactory. It's maybe best to start by looking at another question. Back when the CMOS '02 (ie, 65C02) was being developed, one of the goals was to render all undefined opcodes into NOP's. Why that was a goal is a question in itself.

It's certainly true that, for NMOS '02, users had figured out what the undefined aka illegal opcodes did and then, despite the weirdness of those ops' behavior, proceeded to use them in programs. Evidently WDC (developer of the new, CMOS CPU) perceived this as undesirable, probably because there was a threat that the wonky, NMOS undefined op's could become a de facto standard -- and such a standard would prevent the undefined opcodes from being more gainfully deployed in future as new, approved (and non-wonky!) instructions. The 'C02 did introduce some new instructions, but not enough to fill the entire opcode map. And somebody in management apparently decided the remaining undefined opcodes must be turned into NOP's in order that they would have zero appeal for hacking. (They failed to completely eliminate the appeal. I and others hacked the new, so-called NOP's anyway. But that's another story! :P )

With management's decision made, it then became someone's job to alter the new CPU's logic in a way that rendered all undefined opcodes as NOP's. And that someone said, "I'm lazy. What's the easiest way to approach this?" (Of course I'm theorizing now.)

One easy thing is to detect opcodes in the $_3 and $_B columns of the map. Late in the cycle during which an opcode is fetched (ie, SYNC is high), the value fetched is examined and a determination is made during the few remaining nanoseconds. If the opcode fetched is from column $_3 or $_B then the added logic evidently says, "OK, what we have to do in the next cycle is fetch a new opcode!"... even though the present cycle has already fetched a new opcode.

To be clear, this is different than other instructions. Even the official NOP ($EA) needs to enter the pipeline and get executed, and the pipeline requires two cycles minimum. But opcodes in the $_3 and $_B columns seemingly do not enter the pipeline. They just generate a hiccup which causes the chip to forget that an opcode has already been fetched. It's a clever solution, given that it requires minimal logic and yet it fixes all 32 of the undefined opcodes in columns $_3 and $_B. (The first C02's didn't have STP and WAI.) I'm guessing that the remaining undefined opcodes (those not in columns $_3 and $_B) were quite a lot more trouble to neuter. But those in columns $_3 and $_B were low-hanging fruit.
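
To illustrate how little decoding such a test would need: columns $_3 and $_B are exactly the opcodes whose low three bits are 011, and columns $_3, $_7, $_B and $_F (see the eta note at the end of this post) are those whose low two bits are 11. The Python check below only demonstrates that arithmetic. It is not a claim about how any actual chip decodes these opcodes, and the function names are invented for the example.

```python
def is_col_3_or_b(opcode):
    # Low nibble 0x3 (0011) or 0xB (1011): the low three bits are 011.
    return (opcode & 0x07) == 0x03


def is_col_3_7_b_f(opcode):
    # Low nibble 0x3, 0x7, 0xB or 0xF: the low two bits are 11.
    return (opcode & 0x03) == 0x03


two_columns = [op for op in range(256) if is_col_3_or_b(op)]
four_columns = [op for op in range(256) if is_col_3_7_b_f(op)]
print(len(two_columns), len(four_columns))  # 32 64
```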

One side effect of the induced hiccup is that an interrupt sequence can't commence. This means, as you say, laoo, that the $_3/$_B instruction(s) and the normal instruction which follows become "atomic" -- a single, uninterruptible entity. I confirmed this by experiment, as did you.
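
The "atomic" effect can be shown with a deliberately crude Python model. It assumes one rule only: an interrupt sequence may begin at the boundary after an instruction of two or more cycles, and never at the boundary after a one-cycle NOP. Real parts sample the interrupt lines before the final cycle of an instruction, so the absolute cycle numbers here are not accurate; only the qualitative difference between the two programs is the point. `interrupt_start` and both test programs are hypothetical.

```python
def interrupt_start(program, irq_cycle):
    """Return the cycle at which the interrupt sequence begins, or None.

    program: list of instruction lengths in cycles, executed in order.
    irq_cycle: cycle at which IRQ is asserted (0 = start of the program).
    Toy rule: only the boundary after an instruction of two or more
    cycles can start an interrupt; a one-cycle NOP offers no boundary.
    """
    cycle = 0
    for length in program:
        cycle += length
        if length >= 2 and irq_cycle <= cycle:
            return cycle
    return None  # program ended before the interrupt could be taken


ordinary = [2] * 8       # eight official NOPs ($EA), two cycles each
fused = [1] * 14 + [2]   # fourteen one-cycle NOPs, then one official NOP

print([interrupt_start(ordinary, t) for t in range(5)])  # [2, 2, 2, 4, 4]
print([interrupt_start(fused, t) for t in range(5)])     # [16, 16, 16, 16, 16]
```

In the second program the interrupt always waits for the end of the whole run, wherever in the run IRQ arrives, which is the behavior reported in this thread.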

For me it's good news, because I have a history of using hardware to hack undefined opcodes, and one very powerful tactic is to use a $_3/$_B opcode as a prefix to alter the following instruction. But things get complicated if an interrupt causes the prefix to become separated from the op it's intended to affect! I'm pleased that that threat is eliminated by the induced hiccup I just described.

For your purposes, laoo, perhaps this very simple Clock Stretching circuit would offer a solution.

-- Jeff

eta: my bad, the first 'C02 cpus had 64 single-cycle NOP's, not 32. Opcodes in columns $_3, $_7, $_B and $_F all were single-cycle NOP's. I tend to overlook this as my first 'C02 was a Rockwell device which, like later WDC C02's, has bit-oriented instructions in columns $_7 and $_F.

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Tue Jan 21, 2020 1:42 am 

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
It should actually be possible to implement WAI (with the WDC semantics) externally to the CPU, by intercepting the $CB opcode on a SYNC cycle, and pulling RDY low unless either /IRQ or /NMI is low. RDY is valid at the end of the Phi2 phase, when the opcode is on the data bus. In fact this doesn't even take the 3-cycle setup penalty that the real WAI instruction does, but by the sound of it, if the I bit is not set, the interrupt won't actually be taken until the first normal instruction following the $CB opcode has completed.
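
As a rough illustration, the RDY logic described above reduces to a single combinational expression. The Python sketch below models it with booleans. The function and parameter names are invented, and treating /NMI as a plain level follows the wording of the post rather than the edge-triggered behavior of a real NMI input.

```python
WAI_OPCODE = 0xCB


def rdy(sync, data_bus, irq_n, nmi_n):
    """Level to drive on RDY (True = high, CPU runs; False = low, CPU held).

    sync: True during an opcode fetch cycle.
    data_bus: byte on the data bus at the end of Phi2.
    irq_n, nmi_n: the active-low interrupt inputs (True = inactive/high).
    """
    fetching_wai = sync and data_bus == WAI_OPCODE
    interrupt_pending = (not irq_n) or (not nmi_n)
    # Hold the CPU only while it is fetching $CB and no interrupt is pending.
    return not (fetching_wai and not interrupt_pending)


print(rdy(True, 0xCB, True, True))   # CPU held on the $CB fetch
print(rdy(True, 0xCB, False, True))  # /IRQ low releases the CPU
```

While RDY is held low the CPU stays on the same fetch cycle, so the expression keeps being re-evaluated until one of the interrupt lines goes low.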


PostPosted: Tue Jan 21, 2020 8:54 am 

Joined: Mon Jan 20, 2020 11:22 am
Posts: 9
Location: Wrocław, Poland
Thank you for the elaborate explanation embellished with such colorful hypotheses.
The problem I was trying to solve is on the Atari Lynx, where the CPU is embedded in a VTI chip, so no hardware modification is possible. I'll try to solve it with some fancy modular arithmetic to find the right spot relative to the exact moment the counter changes.
Nevertheless it's good to mention this behavior as I couldn't find any information about it.

