6502.org

Posted: **Tue Dec 29, 2020 2:04 pm**

BigEd wrote:

(One of the things which stands out when you look at the 65816 die shot is that the datapath is really long and the 'PLA' really wide. For the datapath to be so long, without empty space, there could be a few extra bits in there which make a big difference - such as, more than one incrementer... and indeed, it turns out there are details in the now-expired patents, see the thread.)

Thanks for the link BigEd. That patent application is fascinating, especially its focus on chip layout, capacitance and the associated impact on min-terms -- a real window into design tradeoffs. Very interesting, and very much on point.

To wit, it's possible in this design to improve the efficiency of the 8-bit adder further by using a carry-select mux for the upper nibble. This would provide a comfortable margin for a carry-select incrementer in the ALU to manage 16-bit addresses in a single cycle. However, it comes at the cost of extra ICs and a more involved layout. The trade-off may be well worth it, but we will need to see how feasible the layout turns out to be in the end. (Kicking that layout can down the road yet again!).
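For readers unfamiliar with the technique, here is a minimal Python sketch of a carry-select adder of the kind described above. It is only a behavioural model under assumed names (`add4`, `carry_select_add8`), not the actual TTL implementation: both candidates for the upper nibble are computed while the lower nibble is still adding, and the lower nibble's carry then selects one of them.

```python
def add4(a, b, carry_in):
    """4-bit adder: returns (sum, carry_out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def carry_select_add8(a, b, carry_in=0):
    """8-bit add with carry-select on the upper nibble (hypothetical model)."""
    lo, c4 = add4(a, b, carry_in)
    # Both upper-nibble candidates are formed in parallel with the low nibble.
    hi0, cout0 = add4(a >> 4, b >> 4, 0)
    hi1, cout1 = add4(a >> 4, b >> 4, 1)
    # The low nibble's carry drives a mux that picks the right candidate.
    hi, cout = (hi1, cout1) if c4 else (hi0, cout0)
    return (hi << 4) | lo, cout
```

The point of the arrangement is that the upper nibble no longer waits for the carry to ripple through the lower one; it only waits for a mux.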

Getting back to cycle accuracy, I did confirm that PC wraps on a bank boundary as does MVP/MVN. That said, I'm seeing some inconsistencies regarding cycle-counts in other areas that are puzzling. I'll try to post about it shortly.

Posted: **Tue Dec 29, 2020 8:17 pm**

As mentioned in the previous post, I can't quite make sense of some 65816 cycle counts. An example is the Direct Indirect Long Indexed ("[dir],Y") addressing mode.

Various descriptions suggest that the bank byte in this addressing mode is adjusted if the index causes a bank boundary to be crossed (see WDC 65816 Programming Manual p 305, and Bruce Clark's 65816 Opcodes tutorial Section 5.13). However, there appear to be too few cycles available in the reported count to do this. The cycle-by-cycle execution is shown on the WDC 65816 Datasheet, p. 39, Table 5-7, row 14 as follows:

Code:

Cycle   Address Bus     Data Bus
-----   -----------     --------
1.      PBR, PC         Opcode
2.      PBR, PC+1       DO
2a.     PBR, PC+1       Internal Operation
3.      D+DO            AAL
4.      D+DO+1          AAH
5.      D+DO+2          AAB
6.      AAB,AA+Y        Data Low
6a.     AAB,AA+Y+1      Data High

(where cycle 2a is executed if DL != 0 and cycle 6a if bit m = 0). The difficulty is that there is no time for the CPU to adjust the bank byte between cycle 5 when it is read, and cycle 6 when it is used as part of the effective address. Interestingly, cycle 6 and 6a are shown here to use "AAB,AA+Y", which suggests the index might be applied to the 16-bit base address only, before being appended to the unadjusted bank byte to form a 24-bit effective address.

That somehow seems unlikely, but otherwise an extra cycle must be lurking somewhere (or perhaps the 65816 is adjusting the high-byte on the fly?). I wonder if someone more familiar with the 65816 can comment. Does [dir],Y wrap on a bank boundary? If not, does a bank-crossing trigger an extra cycle?

Thanks in advance for any thoughts,
Drass

Posted: **Wed Dec 30, 2020 11:39 am**

Drass wrote:

Does [dir],Y wrap on a bank boundary? If not, does a bank-crossing trigger an extra cycle?

Rather surprisingly, seems the answers are no and no.

I did a small test on Beeb816, captured a trace using a cheap logic analyzer, and fed this through my 65816 Bus Cycle Decoder.

Here is the test program:

[Attachment: capture9.png]

Apologies for not using a proper 65816 assembler:
- 8F cc bb aa is STA aabbcc
- B7 aa is LDA [aa], Y

Here's it running:

[Attachment: capture8.png]

And here's the cycle by cycle output of the 65816 Bus Cycle Decoder on the captured logic analyzer trace:

Code:

0 a9 O 1 1
1 55 P 1 1
002000 : A9 55       : LDA #55        : 2 : A=??55 X=0000 Y=0000 SP=01FD N=0 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 8f O 1 1
1 00 P 1 1
2 00 P 1 1
3 f8 P 1 1
4 55 D 0 1
002002 : 8F 00 00 F8 : STA F80000     : 5 : A=??55 X=0000 Y=0000 SP=01FD N=0 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 a9 O 1 1
1 aa P 1 1
002006 : A9 AA       : LDA #AA        : 2 : A=??AA X=0000 Y=0000 SP=01FD N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 8f O 1 1
1 00 P 1 1
2 00 P 1 1
3 f9 P 1 1
4 aa D 0 1
002008 : 8F 00 00 F9 : STA F90000     : 5 : A=??AA X=0000 Y=0000 SP=01FD N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 a0 O 1 1
1 80 P 1 1
00200C : A0 80       : LDY #80        : 2 : A=??AA X=0000 Y=0080 SP=01FD N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 84 O 1 1
1 80 P 1 1
2 80 D 0 1
00200E : 84 80       : STY 80         : 3 : A=??AA X=0000 Y=0080 SP=01FD N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 a9 O 1 1
1 ff P 1 1
002010 : A9 FF       : LDA #FF        : 2 : A=??FF X=0000 Y=0080 SP=01FD N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 85 O 1 1
1 81 P 1 1
2 ff D 0 1
002012 : 85 81       : STA 81         : 3 : A=??FF X=0000 Y=0080 SP=01FD N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 a9 O 1 1
1 f8 P 1 1
002014 : A9 F8       : LDA #F8        : 2 : A=??F8 X=0000 Y=0080 SP=01FD N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 85 O 1 1
1 82 P 1 1
2 f8 D 0 1
002016 : 85 82       : STA 82         : 3 : A=??F8 X=0000 Y=0080 SP=01FD N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 b7 O 1 1
1 80 P 1 1
2 80 D 1 1
3 ff D 1 1
4 f8 D 1 1
5 aa D 1 1
002018 : B7 80       : LDA [80],Y     : 6 : A=??AA X=0000 Y=0080 SP=01FD N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 85 O 1 1
1 83 P 1 1
2 aa D 0 1
00201A : 85 83       : STA 83         : 3 : A=??AA X=0000 Y=0080 SP=01FD N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 60 O 1 1
1 00 I 1 1
2 00 I 1 1
3 16 D 1 1
4 8f D 1 1
5 00 I 1 1
00201C : 60          : RTS            : 6 : A=??AA X=0000 Y=0080 SP=01FF N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000

LDA [80],Y really does only take 6 cycles, even when crossing bank boundaries.

The 65816 must be handling the increment of the bank address "on the fly" in the data setup time window at the end of cycle 5.
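As a cross-check of the trace above, here is a small Python sketch of the two candidate address calculations for [dir],Y. The function names and the dictionary-based memory are illustrative assumptions; the pointer bytes, Y value and stored data are taken from the trace.

```python
def dir_indirect_long_y(memory, d, operand, y):
    """Effective address with the index carry propagating into the bank byte."""
    ptr = (d + operand) & 0xFFFF            # the pointer itself sits in bank 0
    aal = memory[ptr]
    aah = memory[(ptr + 1) & 0xFFFF]
    aab = memory[(ptr + 2) & 0xFFFF]
    base = (aab << 16) | (aah << 8) | aal
    return (base + y) & 0xFFFFFF


def dir_indirect_long_y_wrapped(memory, d, operand, y):
    """Alternative reading of the datasheet: index wraps within the bank."""
    ptr = (d + operand) & 0xFFFF
    aal = memory[ptr]
    aah = memory[(ptr + 1) & 0xFFFF]
    aab = memory[(ptr + 2) & 0xFFFF]
    return (aab << 16) | ((((aah << 8) | aal) + y) & 0xFFFF)


# Values from the trace: pointer $F8FF80 stored at $80..$82, Y = $80,
# $55 stored at $F80000 and $AA stored at $F90000.
memory = {0x80: 0x80, 0x81: 0xFF, 0x82: 0xF8, 0xF80000: 0x55, 0xF90000: 0xAA}
carried = dir_indirect_long_y(memory, 0x0000, 0x80, 0x80)
wrapped = dir_indirect_long_y_wrapped(memory, 0x0000, 0x80, 0x80)
```

With these inputs the carrying calculation lands on $F90000 (which holds $AA) and the wrapping one on $F80000 (which holds $55). The trace shows $AA being loaded, which is what rules out the wrapping interpretation.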

Dave

Posted: **Wed Dec 30, 2020 12:03 pm**

Here's the same test, in BBC Basic 816, which has a full 65816 Assembler built in and runs in native mode:

[Attachment: capture11.png]

[Attachment: capture12.png]

Same result:

Code:

0 a9 O 1 1
1 aa P 1 1
002000 : A9 AA       : LDA #AA        : 2 : A=00AA X=4ECB Y=681D SP=01C9 N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 8f O 1 1
1 00 P 1 1
2 00 P 1 1
3 fd P 1 1
4 aa D 0 1
002002 : 8F 00 00 FD : STA FD0000     : 5 : A=00AA X=4ECB Y=681D SP=01C9 N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 a9 O 1 1
1 55 P 1 1
002006 : A9 55       : LDA #55        : 2 : A=0055 X=4ECB Y=681D SP=01C9 N=0 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 8f O 1 1
1 00 P 1 1
2 00 P 1 1
3 fc P 1 1
4 55 D 0 1
002008 : 8F 00 00 FC : STA FC0000     : 5 : A=0055 X=4ECB Y=681D SP=01C9 N=0 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 a0 O 1 1
1 80 P 1 1
00200C : A0 80       : LDY #80        : 2 : A=0055 X=4ECB Y=0080 SP=01C9 N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 84 O 1 1
1 80 P 1 1
2 80 D 0 1
00200E : 84 80       : STY 80         : 3 : A=0055 X=4ECB Y=0080 SP=01C9 N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 a9 O 1 1
1 ff P 1 1
002010 : A9 FF       : LDA #FF        : 2 : A=00FF X=4ECB Y=0080 SP=01C9 N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 85 O 1 1
1 81 P 1 1
2 ff D 0 1
002012 : 85 81       : STA 81         : 3 : A=00FF X=4ECB Y=0080 SP=01C9 N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 a9 O 1 1
1 fc P 1 1
002014 : A9 FC       : LDA #FC        : 2 : A=00FC X=4ECB Y=0080 SP=01C9 N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 85 O 1 1
1 82 P 1 1
2 fc D 0 1
002016 : 85 82       : STA 82         : 3 : A=00FC X=4ECB Y=0080 SP=01C9 N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 b7 O 1 1
1 80 P 1 1
2 80 D 1 1
3 ff D 1 1
4 fc D 1 1
5 aa D 1 1
002018 : B7 80       : LDA [80],Y     : 6 : A=00AA X=4ECB Y=0080 SP=01C9 N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 85 O 1 1
1 83 P 1 1
2 aa D 0 1
00201A : 85 83       : STA 83         : 3 : A=00AA X=4ECB Y=0080 SP=01C9 N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 6b O 1 1
1 00 I 1 1
2 00 I 1 1
3 ba D 1 1
4 30 D 1 1
5 f8 D 1 1
00201C : 6B          : RTL            : 6 : A=00AA X=4ECB Y=0080 SP=01CC N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=F8 DB=00 DP=1900

Dave

Posted: **Wed Dec 30, 2020 2:47 pm**

hoglet wrote:

Drass wrote:

Does [dir],Y wrap on a bank boundary? If not, does a bank-crossing trigger an extra cycle?

Rather surprisingly, seems the answers are no and no. [...]

The 65816 must be handling the increment of the bank address "on the fly" in the data setup time window at the end of cycle 5.

Awesome! -- thanks for your response, Dave. And yes it is quite surprising.

Drass and I both noticed what looked very much like an error in the datasheet. (All it would take is the accidental omission of a footnote.) Thanks again for your assistance in resolving this.

-- Jeff

Posted: **Wed Dec 30, 2020 2:48 pm**

Quote:

The 65816 must be handling the increment of the bank address "on the fly" in the data setup time window at the end of cycle 5.

Thank you so much for running this test! And wow, yes, not what we were expecting ...

Thinking about it, it's possible that the incrementer is inserted at the start of the cycle and applied to the bank byte on the way out. It could then also be used to increment DBR on the fly as needed. Looking at the datasheet, the CPU produces the bank address in 33ns at 14MHz (tBAS at 5V), so just less than half the ~71ns cycle at that clock-rate. Considering that the 65816 can fit a full 16-bit Adder (plus Decimal Mode adjust) in a single cycle, it seems quite reasonable that it can manage an 8-bit increment in half the time. And since the 6502 assumes that memory responds in a half cycle, the time is available anyway!
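The timing argument can be spelled out with a few lines of Python. The 14MHz clock and the 33ns tBAS figure are the ones quoted above; the function name is just for illustration.

```python
def bank_address_margin(clock_hz=14e6, t_bas_ns=33.0):
    """Compare the bank-address setup time against the cycle and half-cycle."""
    cycle_ns = 1e9 / clock_hz
    half_cycle_ns = cycle_ns / 2
    return {
        "cycle_ns": cycle_ns,
        "half_cycle_ns": half_cycle_ns,
        "fraction_of_cycle": t_bas_ns / cycle_ns,
        "fits_in_half_cycle": t_bas_ns < half_cycle_ns,
    }


margin = bank_address_margin()
```

At 14MHz the cycle is about 71.4ns and the half-cycle about 35.7ns, so a 33ns bank address does fit inside the first half-cycle, with only a couple of nanoseconds to spare.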

Unfortunately, the same is not possible in this design where memory is pipelined. Here the address is clocked at the start of the cycle, so the bank byte needs to get to memory *before* the fall of PHI2. There is no time available. Similarly, the cycle is just long enough for memory to respond, so an incrementer on the way in would impact the critical path equally. Rather than slow down the whole CPU for what is likely a relatively infrequent operation, it makes much more sense to adjust the bank byte in an extra cycle in this case.

Thanks again Hoglet. This was very helpful.

Cheers for now,
Drass

Posted: **Tue Mar 02, 2021 12:51 am**

It took a while, but I now have a first pass of the 65816 in Logisim! (pic below)

It's a beast, that's for sure, but it looks like it will still make 100MHz!

At least that's the theory -- the reality of laying all this out on impedance-controlled PCBs is sobering to say the least. But be that as it may, I'll just press on for now and we’ll see how it all works out in time.

To recap, the new architecture required the following additions:

- A 16-bit adder/incrementer in the ALU (this is an 8-bit adder followed by an 8-bit carry-select incrementer for the high-byte),
- A 24-bit carry-select incrementer on the address bus,
- 16-bit WriteBack and flag evaluation stages (Z flag evaluation is pipelined across two-stages),
- 16-bit C, X, Y, S, and D registers,
- Selectable 16 and 8-bit busing and muxing to shuttle values around,
- Mode-dependent register-widths and interrupt vectors,
- Initialization of various registers to appropriate values on reset,
- A "Loop-on-Carry" function to support the MVP and MVN opcodes,
- Tighter encoding of microinstructions to control all this (with associated decode logic).
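To illustrate the first item, here is a hedged Python model of an "8-bit adder followed by an 8-bit carry-select incrementer". The function name is hypothetical and the model only covers adding an 8-bit quantity to a 16-bit base, which is all this structure can do in one pass.

```python
def add8_to_16(base, value8):
    """Add an 8-bit value to a 16-bit base: 8-bit adder + high-byte incrementer."""
    lo_sum = (base & 0xFF) + (value8 & 0xFF)
    carry = lo_sum >> 8
    hi = (base >> 8) & 0xFF
    # The incremented high byte is prepared in parallel with the low-byte add...
    hi_plus_1 = (hi + 1) & 0xFF
    # ...and the adder's carry simply selects between the two candidates.
    hi_out = hi_plus_1 if carry else hi
    return (hi_out << 8) | (lo_sum & 0xFF)
```

Note what this cannot do: add two full 16-bit values in one pass. That limitation is consistent with some indexed modes needing an extra cycle in this design.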

With that, I'm happy to report that the CPU now supports cycle-accurate 65816 operation with just a few exceptions:

- The (dir,x) and dir,x addressing modes take an extra cycle if DH is non-zero,
- The long,x and [dir],y addressing modes take an extra cycle if a bank boundary is crossed or if a write is performed,
- Decimal Mode takes an extra cycle.

The extra indexing cycles could be overcome with a true 16-bit adder, but at the expense of more hardware and a longer cycle. The same is true for Decimal Mode, where a longer cycle could accommodate a separate decimal adjust circuit in series with the binary adder. In both cases the extra cycles occur relatively infrequently, so it makes more sense instead to forgo compatibility and keep to the faster clock-rate. The CPU will both perform better and require less hardware as a result.
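The exceptions listed above can be summarised as a small lookup, sketched here in Python. The function name, the mode strings and the `decimal` flag are illustrative assumptions, and the sketch simply sums the penalties; whether they can actually stack on a single instruction is not stated here.

```python
def extra_cycles(mode, dh=0, bank_crossed=False, is_write=False, decimal=False):
    """Cycles this design adds on top of the stock 65816 count (assumed model)."""
    extra = 0
    # Direct-page indexed modes pay when the high byte of D is non-zero.
    if mode in ("(dir,x)", "dir,x") and dh != 0:
        extra += 1
    # Long indexed modes pay on a bank crossing or on any write.
    if mode in ("long,x", "[dir],y") and (bank_crossed or is_write):
        extra += 1
    # Decimal Mode arithmetic pays for the separate adjust step.
    if decimal:
        extra += 1
    return extra
```

For example, `extra_cycles("[dir],y", bank_crossed=True)` returns 1, whereas the stock 65816 was shown earlier in the thread to need no extra cycle in that case.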

Overall, I'm quite happy with where this design is at the moment. I have completed "unit testing" on the current model (by which I mean that every instruction has tested correctly at least once). The next step is to run more comprehensive testing using some sort of suite. I'm not planning anything like the Dormann suite, but investing a little time on some test code is sure to pay dividends down the line. If anyone has some 65816 code that can be reasonably adapted to run as part of such a test, that would be very helpful. Complete programs with lots of I/O are probably not great candidates for this. But a compute-intensive code fragment whose result can be tested against a pre-computed value would work very well.

Many thanks to Dr Jefyll, ttlworks, BigEd and Hoglet for their help in getting to this point. As always, all suggestions welcome.

Cheers for now,
Drass

P.S. I thought it might be interesting to mention a few words about the MVP/MVN opcodes. As one might guess, these opcodes move one byte when executed, and are then "re-executed" for each remaining byte to be moved. This can be done efficiently by inhibiting FetchOpcode for as long as the move is in progress. Meanwhile, the X and Y registers are incremented (or decremented for MVP) and the C accumulator is decremented on every pass. The end of the loop is triggered when C wraps from $0000 to $FFFF, which conveniently fails the “loop on carry” test. The final subtlety here is that PC must be restored on every iteration so the operand bytes can be re-read as normal. (Yes, this is done redundantly even though the operands don't change. The FetchOpcode cycle too is executed redundantly, even when the operation is inhibited. The net result is that the opcode executes its full contingent of seven cycles for every byte moved.) The rationale for this rather awkward implementation is simple: it requires a minimum of specialized hardware. I briefly contemplated an enhancement to skip the re-reading of operands and save two cycles per byte. However, that’s quite messy for the pipeline, so I decided to leave well enough alone and stick with the stock implementation.
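A behavioural Python sketch of this "re-execute per byte" loop follows. It assumes 16-bit index registers, models memory as a dictionary, and forms the loop-on-carry test by decrementing C as an addition of $FFFF; those details, and the function name, are assumptions for illustration. Per the WDC documentation, MVN steps X and Y upward and MVP steps them downward.

```python
CYCLES_PER_BYTE = 7  # the full opcode runs once for every byte moved


def block_move(memory, opcode, c, x, y, src_bank, dst_bank):
    """Model MVN/MVP as one opcode re-executed until the C decrement stops carrying."""
    step = 1 if opcode == "MVN" else -1      # MVN increments, MVP decrements
    cycles = 0
    looping = True
    while looping:
        memory[(dst_bank << 16) | y] = memory.get((src_bank << 16) | x, 0)
        x = (x + step) & 0xFFFF
        y = (y + step) & 0xFFFF
        total = c + 0xFFFF                   # decrement expressed as an add
        c = total & 0xFFFF
        looping = (total >> 16) == 1         # carry is lost only when C was 0
        cycles += CYCLES_PER_BYTE
    return c, x, y, cycles


# Move three bytes from bank 1 to bank 2 (C holds the count minus one).
memory = {0x010000: 0xAA, 0x010001: 0xBB, 0x010002: 0xCC}
result = block_move(memory, "MVN", c=2, x=0x0000, y=0x2000, src_bank=1, dst_bank=2)
```

Here `result` is `(0xFFFF, 3, 0x2003, 21)`: C ends at $FFFF, both index registers have advanced by three, and the move cost 3 x 7 = 21 cycles.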

Posted: **Tue Mar 02, 2021 9:07 am**

Excellent and marvellous! (But possibly worth a new thread?)

Very interesting that you landed in the same place as the original, with MVN and MVP.

Posted: **Tue Mar 02, 2021 10:52 pm**

BigEd wrote:

Excellent and marvellous! (But possibly worth a new thread?)

Thanks BigEd!

This is the same 100MHz TTL CPU project as before, with the same pipeline and general architecture; it just happens to have a 65816 under the hood!

Quote:

Very interesting that you landed in the same place as the original, with MVN and MVP.

Yes, I was surprised at how easily that solution fell into place. I think it shows a clear bias on the part of WDC toward a "simple as possible" solution for MVP and MVN.

I found it interesting that the implementation of the TRB and TSB instructions leaned in the opposite direction. These two opcodes use the same number of cycles as any other RMW instruction despite carrying an extra bit of processing. The portion that sets and resets bits in the operand is handled in the usual way: read the operand, OR with Accumulator to set the appropriate bits (or AND with the inverse of the Accumulator to reset bits), and write the result back to memory. The "test" portion of the instructions, on the other hand, does not use the result of the RMW operation as normal. Instead, the status flags are updated by ANDing the operand with the Accumulator (much like the BIT instruction does). To accomplish both in one cycle requires a dedicated "test" circuit and a separate path through the ALU. The same could have been accomplished without additional hardware with one more cycle. But in this case, WDC opted for the faster solution. (Perhaps there was an expectation that TRB and TSB would be used in timing sensitive I/O operations and needed to be fast?).
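The two parallel paths can be seen in a short Python sketch of the instruction semantics. The function names and the `width` parameter (8 or 16 bits, depending on the M flag) are illustrative; the operations themselves are the ones described above.

```python
def tsb(a, operand, width=8):
    """Test and Set Bits: returns (value written back, Z flag)."""
    mask = (1 << width) - 1
    z = (a & operand & mask) == 0         # "test" path: AND, as BIT does
    return (operand | a) & mask, z        # RMW path: OR sets the bits


def trb(a, operand, width=8):
    """Test and Reset Bits: returns (value written back, Z flag)."""
    mask = (1 << width) - 1
    z = (a & operand & mask) == 0         # same "test" path
    return (operand & ~a) & mask, z       # RMW path: AND with inverse clears
```

In both cases Z comes from `a & operand` rather than from the value written back, which is why a second path through the ALU is needed to finish in the same cycle.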

Posted: **Sat Mar 06, 2021 4:01 am**

I'm excited to share below the first snippet of 65816 code the model has run.

The CPU boots in "NMOS Compatibility Mode". The $02 NMOS opcode (normally a KIL or JAM undocumented opcode) has been replaced with the XCE instruction. The "CLC, .BYTE $02" sequence therefore switches the CPU to Native Mode before it executes "REP #$30" to enable 16-bit index registers and the 16-bit accumulator.

The sequence starts with a test of the 16-bit N and Z flags. The ORA (dir, X) that follows has a 16-bit operand and uses a 16-bit index register to reference an address on the 16-bit Direct Page. Arriving at the final JMP instruction means the code executed successfully. Hurray!

(Incidentally, I'm using the Kowalski IDE for this. Many thanks 8bit for the 65816 mods!).

Code:

	.ORG $0200
	.OPT proc65816
	

dptr	= $10
target	= $1000

start	SEI		; Disable interrupts
	LDX #$FF	; Initialize Stack Register
	TXS
	CLC
	.BYTE $02	; NMOS XCE -- Switch to Native Mode
	REP #$30	; 16-bit regs -- set X and M flags to 0
	
; Test the 16-bit Z and N flags

	CLC
	LDA !#$8000
	PHP
	PHP
	PLA
	CMP #$8484	; NV00 DIZC = 1000 0100
l1	BNE l1

	CLC
	LDA !#$0000
	PHP
	PHP
	PLA
	CMP !#$0606	; NV00 DIZC = 0000 0110
l2	BNE l2
			
; ORA (dir,X) basic test
	
	CLC
	LDA #target
	STA dptr+target+3
	LDA !#$00ff
	STA target
	LDX #target+3
	LDA #$ff00
	ORA (dptr,X)
	PHP
	PHP	
	CMP #$FFFF
l3	BNE l3
	PLA
	CMP #$8484	; NV00 DIZC = 1000 0100
l4	BNE l4
	
; ORA (dir,X) page cross

l5	JMP l5

	.ORG $FFE0

	.BYTE $00, $00, $00, $00, $00, $00, $00, $00
	.BYTE $00, $00, $00, $00, $00, $00, $00, $00
	.BYTE $00, $00, $00, $00, $00, $00, $00, $00
	.BYTE $00, $00, $00, $00, $00, $02, $00, $00
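The expected values in the CMP instructions of the listing above can be reproduced with a few lines of Python. The helper names are hypothetical; the bit layout is the standard native-mode status register (N V M X D I Z C), with M and X both 0 after REP #$30 and I set by the SEI at the start.

```python
FLAG_BITS = {"N": 7, "V": 6, "M": 5, "X": 4, "D": 3, "I": 2, "Z": 1, "C": 0}


def pack_status(**flags):
    """Build the 8-bit P register from named flags (unnamed flags are 0)."""
    p = 0
    for name, bit in FLAG_BITS.items():
        if flags.get(name, 0):
            p |= 1 << bit
    return p


def pulled_after_two_php(p):
    """PHP, PHP, then a 16-bit PLA pulls the same P byte into both halves of A."""
    return (p << 8) | p


negative_case = pulled_after_two_php(pack_status(N=1, I=1))   # after LDA #$8000
zero_case = pulled_after_two_php(pack_status(Z=1, I=1))       # after LDA #$0000
```

These evaluate to $8484 and $0606 respectively, matching the constants the test code compares against.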

Posted: **Sat Mar 06, 2021 11:04 am**

Hi Drass,

I'm confused. I thought opcode xce was $FB and opcode cop was $02.

Cheers,
Andy

Posted: **Sat Mar 06, 2021 2:57 pm**

Thanks for dropping by Andy. I should have been more clear in the explanation of how all this is working ...

This CPU will allow you to select the instruction-set you want for Emulation Mode. The choices are NMOS 6502 or 65816. The default selection is made via a dip-switch, which is currently fixed as the NMOS instruction-set. Hence the CPU will emulate an NMOS 6502 at boot up, including undocumented opcodes.

Now the NMOS instruction-set does not have the XCE instruction of course. The $fb opcode on NMOS chips is the ISC abs,Y undocumented instruction. It performs both an INC abs,Y and an SBC abs,Y. While we might debate how useful it is, it's important to retain it nevertheless for full compatibility with existing NMOS code.

So, then, we need a different opcode for XCE. The $02 opcode is a good candidate. On NMOS chips, this opcode performs a JAM operation (also documented as KIL or HLT), which essentially will freeze the CPU until reset. That makes it a fairly safe choice for repurposing since it's very unlikely that someone would use it on purpose. I've therefore taken the liberty of replacing the $02 NMOS JAM instruction with an XCE instruction which is functionally equivalent to the traditional 65816 XCE. $02 can therefore be used to switch to Native Mode from NMOS Emulation Mode.

Once the $02 opcode is executed, the CPU will switch to Native Mode, but then also switch to the full 65816 instruction-set. At that point, $02 will revert to the normal COP instruction. From Native Mode, switching back to Emulation Mode can be done as normal with the regular $fb XCE instruction.
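The scheme can be sketched in Python as a decode table that depends on the active instruction-set. This is only a model with hypothetical names. In particular, which instruction-set applies after switching back to Emulation Mode is not specified here, so the sketch leaves it unchanged in that case.

```python
def decode(opcode, instruction_set):
    """Decode the two opcodes whose meaning depends on the instruction-set."""
    if instruction_set == "NMOS":
        table = {0x02: "XCE", 0xFB: "ISC abs,Y"}   # $02 repurposed from JAM
    else:
        table = {0x02: "COP", 0xFB: "XCE"}         # stock 65816 assignments
    return table.get(opcode, "other")


def execute_xce(carry, emulation, instruction_set):
    """XCE swaps the C and E flags; entering Native Mode selects the 65816 set."""
    new_carry, new_emulation = emulation, carry
    if not new_emulation:
        instruction_set = "65816"
    # Returning to Emulation Mode: left unchanged here (behaviour not specified).
    return new_carry, new_emulation, instruction_set


# Boot state: Emulation Mode, NMOS set. "CLC, .BYTE $02" then switches modes.
state = execute_xce(carry=False, emulation=True, instruction_set="NMOS")
```

After that call `state` is `(True, False, "65816")`, and `decode(0x02, state[2])` gives "COP" again.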

I admit this is all rather confusing. The objective of all this awkwardness is to enable the CPU to run existing NMOS code on a C64 at boot up while allowing new programs to switch to Native Mode for 65816 functionality. It's a lot of gymnastics but I think it will be worth it in the end.

Hope that helps,
Drass

Edit:

P.S. I'll just add a note to say that switching instruction-sets on the fly in a pipelined CPU took some careful thought. Writing to the E flag occurs during the WB stage, and there are other microinstructions in the pipeline by that point. Thankfully it turns out that the switch occurs just in time so it all works smoothly.

Posted: **Sun Mar 07, 2021 2:29 pm**

Hi Drass,

Thanks for the thorough explanation; the logic is sound and I applaud your efforts as I'm a 65816 fan.

Cheers,
Andy

Posted: **Mon Mar 08, 2021 2:22 am**

Thanks Andy. I’m fast becoming a fan of the 65816 myself. Picking up a 16-bit value in the accumulator and shifting it left will make a convert out of anyone; or how about indexing across 64k with a single register. I have a sentimental attachment to the NMOS 6502, but I have to say that programming the 65816 is a pleasure.

Posted: **Sat Mar 13, 2021 3:52 pm**

Drass wrote:

I have completed "unit testing" on the current model (by which I mean that every instruction has tested correctly at least once).

Did I really say this? Hmmm ... based on the bugs my test suite is uncovering, I can categorically say this was NOT accurate.

On the other hand, the kinds of bugs I'm running into are far from surprising:

1) Incorrect encoding of microinstructions (assembly of microinstructions is partially a manual process),
2) Incorrect decoding of microinstructions (done with 74LVC138 1-of-8 decoders followed by error-prone PAL-like gate logic),
3) Unforeseen side-effects of decoded control signals (mostly due to tricky encoding required to pack more functions into fewer control bits).

The good news is that the overall pipeline and busing architecture is holding up well. The test harness now includes opcodes up to $3F, which means the model runs through various addressing modes with OR, AND, ASL and ROL, as well as exercising BRK, COP, RTI, JSR, JSL, RTS, RTL, TSB, TRB. I have also been careful to include page and bank crossings as we go. For example:

Code:

; 17 ORA [dir],Y (Crossing banks)

	LDY !#$ffff
	LDA !#$2187	; long pointer address
	STA dpage
	LDA !#$00	; long pointer bank
	STA dpage+2
	LDA !#$0018	; place marker in bank 1
	STA $002187+$ffff
	LDA !#$ff00
	REP #$CB	; clear the flags
	ORA [$00],Y
	PHP
	PHP
	CMP !#$ff18
l17.5	BNE l17.5
	PLA
	CMP !#$8484	; NV00 DIZC = 1000 0100	
l17.6	BNE l17.6

I continue to find bugs at a good rate so it's important to keep going. (I should also mention that current testing assumes the E, M and X flags are all zero and that both DL and DH are non-zero; additional testing will be required to cover off other combinations. The notion of exhaustively testing this CPU is probably too ambitious, but every bit of validation counts!)

Cheers for now,
Drass

100MHz TTL 6502: Here we go!
