
All times are UTC




PostPosted: Fri Jun 17, 2016 10:44 pm 

Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
I've been using the Visual 6502 to validate the behaviour of various undocumented opcodes for my TTL 6502 project. As Dr. Jefyl predicted (http://forum.6502.org/viewtopic.php?f=4&t=3493&p=44641), implementing these slippery devils without an authoritative guide is indeed proving ambitious. That said, the Visual 6502 has been very helpful in this regard. For example, LAX zp (opcode $A7) is supposed to load both the A and X registers from memory. A quick test on the Visual 6502 confirms this - http://www.visual6502.org/JSSim/expert.html?graphics=f&steps=17&a=0000&d=a9fe8520a72000. In the final step we can see $FE is loaded both into the A and X registers exactly as it should be. I have run all undocumented opcodes I am aware of through the Visual 6502 in this way, and most work as expected, but not all ...
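For anyone who wants to follow the test program by hand, here is a minimal Python sketch of the three opcodes involved. It is only an illustration: the `run` helper and its register dictionary are hypothetical names, flags are ignored, and LAX zp is modelled per the documented behaviour (A and X both loaded from zero page) rather than derived from the Visual 6502 netlist.

```python
def run(program, origin=0x0000):
    """Execute a tiny subset of 6502 opcodes until BRK ($00) is fetched."""
    mem = bytearray(65536)
    mem[origin:origin + len(program)] = program
    regs = {"A": 0, "X": 0, "PC": origin}

    def fetch():
        # Read the byte at PC and advance PC (wrapping at 64K).
        value = mem[regs["PC"]]
        regs["PC"] = (regs["PC"] + 1) & 0xFFFF
        return value

    while True:
        opcode = fetch()
        if opcode == 0x00:            # BRK: stop the sketch here
            break
        elif opcode == 0xA9:          # LDA #imm
            regs["A"] = fetch()
        elif opcode == 0x85:          # STA zp
            mem[fetch()] = regs["A"]
        elif opcode == 0xA7:          # LAX zp (undocumented): A and X <- mem
            regs["A"] = regs["X"] = mem[fetch()]
        else:
            raise ValueError(f"opcode ${opcode:02X} not modelled")
    return regs, mem

# Same bytes as the d= parameter in the link: LDA #$FE, STA $20, LAX $20, BRK
regs, mem = run(bytes.fromhex("a9fe8520a72000"))
print(f"A=${regs['A']:02X} X=${regs['X']:02X}")
```

Both registers end up holding $FE, matching the final step of the Visual 6502 trace.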

For example, opcode $4B did not perform as advertised. $4B is supposed to take an immediate argument, AND it with the accumulator and then shift the result right. The opcode is referred to as either ALR or ASR in various docs (http://nesdev.com/undocumented_opcodes.txt or http://www.oxyron.de/html/opcodes02.html). A test on my VIC 20 confirms this behaviour: AND followed by LSR. But the Visual 6502 shows something different. Here, the argument seems to be read but ignored. See: http://www.visual6502.org/JSSim/expert.html?graphics=f&steps=11&a=0000&d=a9804b000000. LDA #$80, ALR #$00 should leave 0 in the accumulator, not $40. I ran the same test on my VIC 20 and sure enough, I get a zero for this sequence.
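The two outcomes can be put side by side in a short Python sketch. The `alr` function and its `ignore_operand` flag are hypothetical, introduced only to contrast the documented behaviour with what the Visual 6502 trace appears to show; only A and the carry are modelled.

```python
def alr(a, imm, ignore_operand=False):
    """Return (new_a, carry) for opcode $4B.

    ignore_operand=False: documented behaviour, AND #imm then LSR.
    ignore_operand=True:  the operand is read but has no effect (plain LSR).
    """
    masked = a if ignore_operand else (a & imm)
    carry = masked & 1          # bit 0 is shifted out into the carry
    return masked >> 1, carry

print(alr(0x80, 0x00))                       # (0, 0): VIC 20 result
print(alr(0x80, 0x00, ignore_operand=True))  # (64, 0): $40, as in the Visual 6502
```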

I also encountered problems with LAS ($BB) and ARR ($6B) on the Visual 6502. For ARR (A:=(A&#{imm})/2), once again the argument seems to be ignored (as with ALR above) so the result is equivalent to a ROR. LAS (A,X,S:={adr}&S) is interesting in that the right result is generated and transferred to A,X and SP as expected, but then in the next half cycle X and SP are restored and the high nibble of A is incremented. Puzzling - I did not test this thoroughly so I can't be definitive about what is happening there.

Interestingly, $AB seems to exhibit an ambiguity similar to ALR's around the AND #imm operation. It is documented as ATX (A,X := A AND #imm) on http://www.nesdev.com but as LAX (A,X := #imm) on http://www.oxyron.de and labelled as "highly unstable". In this case, the Visual6502 respects the AND #imm operation, yielding the ATX variant - http://www.visual6502.org/JSSim/expert.html?graphics=f&steps=11&a=0000&d=a980Ab000000. Perhaps the "highly unstable" characterization has something to do with the AND #imm.

Now I know all bets are off with these undocumented opcodes, but this has me puzzled (and interested). I'm wondering if anyone has run into these quirks in the Visual 6502 before and can comment. Any insights would be much appreciated.

_________________
C74-6502 Website: https://c74project.com


PostPosted: Sat Jun 18, 2016 3:54 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10971
Location: England
The undocumented opcodes are of interest to only a subset of 6502 users, but they do illuminate the inner workings of the CPU, and in this case may illuminate the inner workings of visual6502, so I think this is interesting to look into.

We don't necessarily expect the visual6502 to be accurate in these cases, but if it isn't, that is interesting and worth understanding.

We should note that the opcodes regarded as "unstable" may be especially slippery - if they behave differently on different CPU chips then we can at best see visual6502 agree with only a subset of CPUs.

I could do experiments on my BBC micro, and also in visual6502. If we look at this different experiment:
http://www.visual6502.org/JSSim/expert. ... loglevel=5
then we do see the value 05 on the special bus in cycle 4a, also as the two ALU inputs, and yet the value 62 lands in A.

You may be aware of perfect6502 - it's a reimplementation of the visual6502 algorithm, so it runs faster and for some purposes may be easier to work with. It's on github. The author, Michael Steil, did investigate all 256 opcodes, with results tabulated at
http://visual6502.org/wiki/index.php?ti ... 56_Opcodes

Perfect6502 could also be useful to validate that any tweaks to the algorithm haven't broken anything else. (It would be good to run Klaus' test suite in this case, but even better if we had a way to run Wolfgang's exhaustive test suite.)

Also of interest will be Michael's explanation of the reasons behind the undocumented opcodes and their behaviour: http://www.pagetable.com/?p=39

I'm on the road at present, so will have another look when I get back.


PostPosted: Sat Jun 18, 2016 4:25 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10971
Location: England
Oh, and also have a look at this discussion of opcode $8B, which gives a hint of how complicated things can be.
http://visual6502.org/wiki/index.php?ti ... AA,_ANE%29


PostPosted: Sat Jun 18, 2016 10:41 am 

Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
Thanks for these references, BigEd. The discussion on $8B is particularly revealing and a vivid reminder of the complexities involved. It makes sense that analogue phenomena are at the root of instability with specific opcodes, and in that case, that the power supply, temperature and other factors might determine the behaviour. Thankfully, the extensive tests in this instance show some consistency for values to use in emulating specific machines ... In my case, I'm interested in both the VIC20 and the C64 so I will need to test these opcodes on those two platforms to confirm the behaviour. It will be interesting to see how much variation I discover between those two.

Regarding the Visual 6502, one can assume that the behaviour we see above comes from having to fix analogue values for the emulation. That might trigger certain AND #imm operations to be equivalent to an AND $FF as we see with $8B, and to present the appearance that the argument is ignored when in fact something more complex is going on. Certainly, the fact that partial results show up in specific busses is an indication that something more complex is at work. Very interesting.

I'll report back once I get a chance to carry out further experiments.

_________________
C74-6502 Website: https://c74project.com


PostPosted: Sat Jun 18, 2016 11:15 pm 

Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
I ran some quick tests on the VIC20 for $AB, $4B and $6B with some interesting results. (I say quick, meaning I used only a handful of input values in each test).

For $AB, recall it's either LAX (A,X <- #Imm) or ATX (A,X <- A & #Imm). My tests show an LAX on the VIC20 and an ATX on the Visual 6502. If we assume that the same kind of "analogue" behavior is at work on this opcode as we find in $8B, then perhaps we may more succinctly express its function as:

$AB: A <- (A | CONST) & #Imm, where CONST = $00 on the Visual 6502 and $FF on the VIC20

with the effect that the VIC20 ignores the AND A operation. Of course, there is no evidence that anything like this is actually going on internally, but I find it a useful notation nevertheless. We can do the same for the other two opcodes in question as follows:

$4B ALR: A <- LSR (A & (CONST | #Imm)), where CONST = $FF for the Visual 6502 and $00 for the VIC20
$6B ARR: A <- ROR(A & (CONST | #Imm)), where CONST = $FF for the Visual 6502 and $00 for the VIC20

For these two, it's the Visual 6502 that ignores the AND operations.
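The notation translates directly into code. Here is a minimal Python sketch of the three expressions, with CONST passed in as a parameter; the function names and the two lookup dictionaries are hypothetical, flags are not modelled, and the CONST values are just the ones from my quick tests, not anything established about the silicon.

```python
def op_ab(a, imm, const):
    """$AB: A,X <- (A | CONST) & #imm."""
    return (a | const) & imm

def op_4b(a, imm, const):
    """$4B ALR: A <- LSR(A & (CONST | #imm))."""
    return (a & (const | imm)) >> 1

def op_6b(a, imm, const, carry_in=0):
    """$6B ARR: A <- ROR(A & (CONST | #imm)); carry_in rotates into bit 7."""
    return ((a & (const | imm)) >> 1) | (carry_in << 7)

# CONST per opcode, as observed in the quick tests above.
VISUAL6502 = {"ab": 0x00, "4b": 0xFF, "6b": 0xFF}
VIC20 = {"ab": 0xFF, "4b": 0x00, "6b": 0x00}

for name, consts in (("Visual 6502", VISUAL6502), ("VIC20", VIC20)):
    print(name,
          f"$AB -> ${op_ab(0x80, 0x0F, consts['ab']):02X}",
          f"$4B -> ${op_4b(0x80, 0x00, consts['4b']):02X}",
          f"$6B -> ${op_6b(0x80, 0x00, consts['6b'], carry_in=1):02X}")
```

With A = $80, CONST = $FF makes $AB behave as LAX (the result is just #imm), while CONST = $FF in the other two makes the immediate operand irrelevant, so $4B and $6B degenerate to plain LSR and ROR.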

The descriptions of $8B indicate that different values for CONST are possible on different systems (values other than $00 and $FF), and even on the same system under different conditions. It's at least plausible that there is some common behavior here since these are all $xB opcodes and presumably shared PLA lines are firing. It would be interesting to see how these opcodes behave on a variety of systems.

_________________
C74-6502 Website: https://c74project.com


PostPosted: Mon Jun 27, 2016 6:30 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10971
Location: England
I've been looking through old emails, and I have permission to share a few quotes from Barry Silverman, one of the visual6502 authors. These might help illuminate to what degree we can expect the visual6502 to be a faithful model of the 6502 behaviour. Bear in mind that the visual6502 is a digital simulation - a two-state simulation. (Some digital chip simulators offer four states in eight strengths, whereas circuit simulators model a continuous voltage variation.)

Quote:
Our experience thus far on two significant projects has been that the simple-minded digital solution has been adequate.

[But recent discussions indicate] that we don't actually know why the simplification works, or why it doesn't fail for certain issues that we know occur in the analog world (e.g. the switching of VCC to VSS in the node below the S register).

It is relatively easy to instrument either the Javascript version (if you want visual feedback), or the 'C' version and log the intermediate states during signal propagation.

We can follow to a fine degree of granularity how the existing simulation algorithm processes the signals, and we can (if necessary), do experiments on changing the ordering. The breadth-first algorithm approximates the parallelism of the transistor switching to a first-degree approximation.


Quote:
[Segher] pointed out a case with a SINGLE register loop that is determined by a race:
IE - if # and X are both 1 and A is 0, then the ultimate result is determined by whether the output side of A is evaluated by chipsim first, or the input side.

I guess our breadth first evaluation is totally consistent (e.g. the output side always gets evaluated first), as I otherwise would have expected different bits of A to have different race winners.

This instruction is doing something in hardware that is considered a design error that leaves an indeterminate result. For this 8B Instruction - even repeated execution on the same machine results in different values.

I will try to describe the scenario in software terms. Imagine a concurrent programming model with the following specs:

Two shared variables:
SB with initial value 1
A with initial value 0

There are two asynchronous threads that are started at the same time:
1) A<-SB
2) SB<-SB & A

The system will settle to a stable result immediately, but depends totally on which thread runs first:
If (1) runs first- then the system will settle to A:1 and SB:1
If (2) runs first - then the system will settle to A:0 and SB:0

Asking to determine the final value of A is impossible unless you had an insight into the processes that governed the scheduling of the threads. This model is equally incorrect in hardware as in software.

The nature of this "system" implemented on a 6502 is that the order of execution of the threads is related to all sorts of physical phenomena that are inherently unpredictable in general, but may be very stable on a particular instance of hardware.

On some pieces of hardware, (1) may run faster, and have different relative speeds depending on which A bit it is. (The X'EE' may be accounted for because the Decimal Adjust hardware is not the same for each bit - and so may impact the physical phenomena that determine the relative speeds of the two processes.).

In JSSim's algorithm (2) will ALWAYS run first. Thus JSSim will always return A:0 for this circuit. (JSSim first calculates the result of wiring together all transistor outputs - process (2) - and then applies those outputs to transistor inputs, which may then switch - process (1).)


I'm not certain what situation(s) Barry's analysis applies to, but I hope it helps in understanding the visual6502 ("chipsim" aka "JSSim") model.
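Barry's two-thread scenario is small enough to run directly. The Python sketch below is only an illustration of the ordering dependence he describes; `settle` is a hypothetical helper, and applying each update once in a fixed order is a simplification of real concurrent scheduling (and of JSSim itself).

```python
def settle(order):
    """Apply the two updates once each, in the given order; return (A, SB)."""
    state = {"A": 0, "SB": 1}                        # initial values from the quote
    steps = {
        1: lambda s: s.update(A=s["SB"]),            # thread 1: A <- SB
        2: lambda s: s.update(SB=s["SB"] & s["A"]),  # thread 2: SB <- SB & A
    }
    for thread in order:
        steps[thread](state)
    return state["A"], state["SB"]

print(settle((1, 2)))  # (1, 1): thread 1 wins the race
print(settle((2, 1)))  # (0, 0): thread 2 wins, the order JSSim is said to use
```

Both end states are stable, in the sense that running either update again changes nothing, which is why the outcome depends entirely on which side is evaluated first.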


PostPosted: Fri Jul 01, 2016 12:59 pm 

Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
Thanks for digging up these emails, BigEd! It's fascinating to get a sense of some of the issues involved in validating the Visual 6502.

The explanation above makes it very clear that race conditions exist in certain undocumented operations, and it stands to reason that the behavior of the circuit is then determined by physical factors which are not part of the model - and indeed, factors which may change across executions on the same system!

It's interesting to note that the Visual 6502 uses a consistent strategy (output side first). That makes sense, and as you suggested will lead to a deterministic outcome which at best will match a subset of systems out there. The only reasonable objective for emulators then is to aim for a given set of systems, rather than the 6502 in general. In my case, I have run tests on my VIC20 and thankfully have seen consistent results across executions. I finally did get a chance to test LAS ($BB) as well and it seems to behave "correctly" and consistently. Dieter reports the same results on his C64. (That is, A,X,SP <- MEM(adr,Y) & SP.)

Of course, that does not rule out that some other C64 or VIC20 on another day may produce different results, but so be it. It seems a good bet to go with the "documented" behavior as confirmed by a few tests. I must say, it does feel rather unusual to have to make a bet, whatever the odds may be, on the outcome of a piece of code :)

Thanks again for looking into this.

_________________
C74-6502 Website: https://c74project.com


PostPosted: Fri Jul 01, 2016 1:14 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10971
Location: England
There's another email or two where Michael Steil explains that in a C64, with the VIC sharing the bus, some results will happen deterministically but very rarely, depending on the relative timing of the instruction and the VIC bus activity. So beware concluding that you see a consistent result, unless you've measured at least thousands of times!

Quote:
The CPU and the "VIC" video controller in the C64 share the same RAM. RAM accesses from the two devices are interleaved, but every 8 rasterlines (i.e. at the beginning of a character line, 25 times per screen), this system is not enough for the VIC, because it needs to fetch 40 more bytes for the character indexes. So the VIC stalls the CPU for 40 cycles by pulling on the "RDY" pin.

Rasterlines where this happens are called badlines.

Oh, btw, the VIC also steals cycles when there are sprites on the screen. 4 cycles on the sprite's first rasterline, 3 cycles on every subsequent one. We should test whether this has an influence.


Quote:
I tested this 8500

MOS
8500R3
5185

in my old-PCB C64 (old PCB and new 6510 are a rare combination), and I got this with the screen turned on:

A = MAGIC & X & imm, with MAGIC being 0xEE, 0xEF, 0xFE or 0xFF, depending on A and chance.

Code:
  A & 0x10  |  A & 0x01  |   0xEE   |   0xEF   |   0xFE   |   0xFF
------------+------------+----------+----------+----------+----------
      0     |      0     |  0.0732% |    0     | 98.3642% |  1.5628%
      0     |      1     |  0.0732% |  0.0152% |    0     | 99.9116%
      1     |      0     |  0.0732% |    0     | 98.3642% |  1.5628%
      1     |      1     |  0.0732% |    0     |    0     | 99.9268%

In words: If A has bit 0 set, it's mostly FF, otherwise it's mostly FE. EE happens very rarely, independent of A, and EF happens only on bit 4 clear && bit 0 set, but also very rarely.

Peddle indeed throws dice on this chip.

With the screen turned off, i.e. the VIC makes no memory accesses and the 6502 is the only one ever accessing memory, I get different results. I'll send them in a different email.

Method:
For every possible A, I executed 8B with X=0xFF and imm=0xFF 65536 times. The "0.0732%" in the table above means I got between 5 and 15 EEs in the 65536 runs, averaging at 10, so it's 10/65536 = 0.0732%.


PostPosted: Fri Jul 01, 2016 2:09 pm 

Joined: Tue Jun 07, 2016 4:34 pm
Posts: 53
Wow, interesting stuff! There is a nice paper at csdb (http://csdb.dk/release/?id=143981) about the 6510 unintended opcodes, written by one of the VICE guys.
It explains what each instruction does, describes the different corner cases, and references test cases available in the VICE repository as well as visual6502 simulation links.

By the way, there is a thing I don't get in the last mentioned email:
in the rows of the table where the lowest bit of A is 1, how can he conclude that MAGIC is 0xFF and not 0xFE?
The resulting value of A is (A | magic), as both X and imm are 0xFF, right? So the 1 in the LSB can come from A and not necessarily from MAGIC.
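To make the question concrete, here is a small Python sketch, assuming the result really is (A | MAGIC) & X & imm. The `consistent_magics` helper is a hypothetical name; it simply lists which of the four reported MAGIC values would reproduce a given observation.

```python
CANDIDATES = (0xEE, 0xEF, 0xFE, 0xFF)   # the four MAGIC values from the email

def consistent_magics(a, observed, x=0xFF, imm=0xFF):
    """List candidate MAGIC values that reproduce `observed` for this A."""
    return [m for m in CANDIDATES if ((a | m) & x & imm) == observed]

print([hex(m) for m in consistent_magics(0x00, 0xFE)])  # ['0xfe']
print([hex(m) for m in consistent_magics(0x01, 0xFF)])  # ['0xfe', '0xff']
```

With bit 0 of A clear, an observed $FE pins MAGIC down to $FE. With bit 0 set, an observed $FF fits both $FE and $FF, so under this formula the measurement alone cannot tell them apart.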


PostPosted: Fri Jul 01, 2016 10:03 pm 

Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
Thanks for the link drfiemost. Very interesting reading!

According to the pdf, there are 7 unstable opcodes in two groups:

1) The (& H + 1) group - SHX ($9E), SHY ($9C), TAS ($9B), AHX ($9F, $93). Two problems are noted as occurring "sometimes". The first is that the (& H + 1) portion of the operation seems to be dropped, and the other that a page crossing will corrupt the target address such that the high-byte is equal to the value being stored.

2) The (A | MAGIC) group - XAA/ANE ($8B), LAX/LXA/ATX($AB), where MAGIC can have a variety of values as described in the above posts.

Interestingly, missing from the list of "unstables" are ALR($4B) and LAS($BB) which for me showed different results on the Visual 6502 and my VIC20. More food for thought.

Just for kicks, I ran some "stability" tests on my VIC20. As in the tests above, I ran $8B for all values of A with X and #imm set to $FF, repeating 65536 times. The value for MAGIC was $FF for every iteration. I then ran a similar test for $AB and found MAGIC was $EE every time (my earlier test of $AB was too casual and proved flawed). Despite these particular "consistencies", though, it's sure tricky business! The VICE pdf offers the advice below for $8B and $AB, which is probably as definitive as one can get:

Quote:
Do not use [...] with any other immediate value than 0, or when the accumulator value
is $ff (both takes the magic constant out of the equation)! (Or, more generalized, these are safe if
all bits that could be 0 in A are 0 in the immediate value.)
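That advice can be checked by brute force. The Python sketch below assumes the $8B model discussed above, A <- (A | MAGIC) & X & #imm, and treats MAGIC as any 8-bit value; `ane` and `is_stable` are hypothetical helper names, and this is a check of the formula only, not of real hardware.

```python
def ane(a, x, imm, magic):
    """Assumed model of $8B: A <- (A | MAGIC) & X & #imm."""
    return (a | magic) & x & imm

def is_stable(a, x, imm):
    """True if the result is the same for every possible 8-bit MAGIC."""
    return len({ane(a, x, imm, m) for m in range(256)}) == 1

print(is_stable(0x12, 0xFF, 0x00))  # True: immediate of 0 masks everything
print(is_stable(0xFF, 0x34, 0x56))  # True: A = $FF makes (A | MAGIC) = $FF
print(is_stable(0xF0, 0xFF, 0x0F))  # False: low nibble comes from MAGIC alone
```

The generalized rule from the quote corresponds to `(~a & imm) == 0`: MAGIC can only leak through bits that are 0 in A and 1 in the immediate value.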

_________________
C74-6502 Website: https://c74project.com


PostPosted: Sun Jul 03, 2016 7:32 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10971
Location: England
(The paper "No More Secrets" by Groepaz is very good indeed - thanks drfiemost for linking it!)

