PostPosted: Tue Dec 29, 2020 2:04 pm 
Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
BigEd wrote:
(One of the things which stands out when you look at the 65816 die shot is that the datapath is really long and the 'PLA' really wide. For the datapath to be so long, without empty space, there could be a few extra bits in there which make a big difference - such as, more than one incrementer... and indeed, it turns out there are details in the now-expired patents, see the thread.)
Thanks for the link BigEd. That patent application is fascinating, especially its focus on chip layout, capacitance and the associated impact on min-terms -- a real window into design tradeoffs. Very interesting, and very much on point.

To wit, it's possible in this design to improve the efficiency of the 8-bit adder further by using a carry-select mux for the upper nibble. This would provide a comfortable margin for a carry-select incrementer in the ALU to manage 16-bit addresses in a single cycle. However, it comes at the cost of extra ICs and a more involved layout. The trade-off may be well worth it, but we will need to see how feasible the layout turns out to be in the end. (Kicking that layout can down the road yet again!) :roll: :)
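
As a rough illustration of the carry-select idea mentioned above, here is a small Python model. It is only a sketch: the function names `add4` and `carry_select_add8` are made up for the example, and it models the logic, not the actual ICs or timing of this design.

```python
def add4(a, b, carry_in):
    """4-bit adder model: returns (sum nibble, carry out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def carry_select_add8(a, b, carry_in=0):
    """8-bit add where the upper nibble is chosen by a carry-select mux."""
    low, low_carry = add4(a, b, carry_in)
    # Both upper-nibble candidates can be computed while the low nibble
    # is still settling, one assuming carry-in 0 and one assuming 1.
    high0, cout0 = add4(a >> 4, b >> 4, 0)
    high1, cout1 = add4(a >> 4, b >> 4, 1)
    # The low nibble's carry only has to drive a mux select line.
    high, carry_out = (high1, cout1) if low_carry else (high0, cout0)
    return (high << 4) | low, carry_out


if __name__ == "__main__":
    result, carry = carry_select_add8(0x9F, 0x61)
    print(f"{result:02X} carry={carry}")
```

The point of the arrangement is that the carry out of the low nibble no longer ripples through the upper nibble; it just selects between two results that are already waiting.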

Getting back to cycle accuracy, I did confirm that PC wraps on a bank boundary as does MVP/MVN. That said, I'm seeing some inconsistencies regarding cycle-counts in other areas that are puzzling. I'll try to post about it shortly.

_________________
C74-6502 Website: https://c74project.com


PostPosted: Tue Dec 29, 2020 8:17 pm 
Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
As mentioned in the previous post, I can't quite make sense of some 65816 cycle counts. An example is the Direct Indirect Long Indexed ("[dir],Y") addressing mode.

Various descriptions suggest that the bank byte in this addressing mode is adjusted if the index causes a bank boundary to be crossed (see the WDC 65816 Programming Manual, p. 305, and Bruce Clark's 65816 Opcodes tutorial, Section 5.13). However, there appear to be too few cycles available in the reported count to do this. The cycle-by-cycle execution is shown in the WDC 65816 Datasheet, p. 39, Table 5-7, row 14, as follows:
Code:
Cycle   Address Bus     Data Bus
-----   -----------     --------
1.      PBR, PC         Opcode
2.      PBR, PC+1       DO
2a.     PBR, PC+1       Internal Operation
3.      D+DO            AAL
4.      D+DO+1          AAH
5.      D+DO+2          AAB
6.      AAB,AA+Y        Data Low
6a.     AAB,AA+Y+1      Data High
(where cycle 2a is executed if DL != 0 and cycle 6a if bit m = 0). The difficulty is that there is no time for the CPU to adjust the bank byte between cycle 5 when it is read, and cycle 6 when it is used as part of the effective address. Interestingly, cycle 6 and 6a are shown here to use "AAB,AA+Y", which suggests the index might be applied to the 16-bit base address only, before being appended to the unadjusted bank byte to form a 24-bit effective address.

That somehow seems unlikely, but otherwise an extra cycle must be lurking somewhere (or perhaps the 65816 is adjusting the high-byte on the fly?). I wonder if someone more familiar with the 65816 can comment. Does [dir],Y wrap on a bank boundary? If not, does a bank-crossing trigger an extra cycle?

Thanks in advance for any thoughts,
Drass

PostPosted: Wed Dec 30, 2020 11:39 am 
Joined: Sun Jun 29, 2014 5:42 am
Posts: 352
Drass wrote:
Does [dir],Y wrap on a bank boundary? If not, does a bank-crossing trigger an extra cycle?

Rather surprisingly, seems the answers are no and no.

I did a small test on Beeb816, captured a trace using a cheap logic analyzer, and fed this through my 65816 Bus Cycle Decoder.

Here is the test program:
Attachment: capture9.png

Apologies for not using a proper 65816 assembler:
- 8F cc bb aa is STA aabbcc
- B7 aa is LDA [aa], Y

Here's it running:
Attachment: capture8.png


And here's the cycle by cycle output of the 65816 Bus Cycle Decoder on the captured logic analyzer trace:
Code:
0 a9 O 1 1
1 55 P 1 1
002000 : A9 55       : LDA #55        : 2 : A=??55 X=0000 Y=0000 SP=01FD N=0 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 8f O 1 1
1 00 P 1 1
2 00 P 1 1
3 f8 P 1 1
4 55 D 0 1
002002 : 8F 00 00 F8 : STA F80000     : 5 : A=??55 X=0000 Y=0000 SP=01FD N=0 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 a9 O 1 1
1 aa P 1 1
002006 : A9 AA       : LDA #AA        : 2 : A=??AA X=0000 Y=0000 SP=01FD N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 8f O 1 1
1 00 P 1 1
2 00 P 1 1
3 f9 P 1 1
4 aa D 0 1
002008 : 8F 00 00 F9 : STA F90000     : 5 : A=??AA X=0000 Y=0000 SP=01FD N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 a0 O 1 1
1 80 P 1 1
00200C : A0 80       : LDY #80        : 2 : A=??AA X=0000 Y=0080 SP=01FD N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 84 O 1 1
1 80 P 1 1
2 80 D 0 1
00200E : 84 80       : STY 80         : 3 : A=??AA X=0000 Y=0080 SP=01FD N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 a9 O 1 1
1 ff P 1 1
002010 : A9 FF       : LDA #FF        : 2 : A=??FF X=0000 Y=0080 SP=01FD N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 85 O 1 1
1 81 P 1 1
2 ff D 0 1
002012 : 85 81       : STA 81         : 3 : A=??FF X=0000 Y=0080 SP=01FD N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 a9 O 1 1
1 f8 P 1 1
002014 : A9 F8       : LDA #F8        : 2 : A=??F8 X=0000 Y=0080 SP=01FD N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 85 O 1 1
1 82 P 1 1
2 f8 D 0 1
002016 : 85 82       : STA 82         : 3 : A=??F8 X=0000 Y=0080 SP=01FD N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 b7 O 1 1
1 80 P 1 1
2 80 D 1 1
3 ff D 1 1
4 f8 D 1 1
5 aa D 1 1
002018 : B7 80       : LDA [80],Y     : 6 : A=??AA X=0000 Y=0080 SP=01FD N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 85 O 1 1
1 83 P 1 1
2 aa D 0 1
00201A : 85 83       : STA 83         : 3 : A=??AA X=0000 Y=0080 SP=01FD N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000
0 60 O 1 1
1 00 I 1 1
2 00 I 1 1
3 16 D 1 1
4 8f D 1 1
5 00 I 1 1
00201C : 60          : RTS            : 6 : A=??AA X=0000 Y=0080 SP=01FF N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=1 PB=00 DB=00 DP=0000

LDA [80],Y really does only take 6 cycles, even when crossing bank boundaries.

The 65816 must be handling the increment of the bank address "on the fly" in the data setup time window at the end of cycle 5.
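
To spell out what the trace shows, here is a small Python sketch comparing the two possible interpretations of [dir],Y using the values from the test above (pointer $F8FF80, Y=$80, $55 stored at $F80000 and $AA at $F90000). The function names are just labels for this example.

```python
def long_indexed_wrap(base24, y):
    """Index applied to the low 16 bits only; the bank byte stays unadjusted."""
    return (base24 & 0xFF0000) | ((base24 + y) & 0xFFFF)


def long_indexed_carry(base24, y):
    """Index applied to the full 24-bit address; a carry bumps the bank byte."""
    return (base24 + y) & 0xFFFFFF


# Values taken from the captured trace above.
memory = {0xF80000: 0x55, 0xF90000: 0xAA}
pointer = 0xF8FF80   # bytes 80 FF F8 read from the direct page
y = 0x80
observed = 0xAA      # data byte seen on cycle 5 of LDA [80],Y

wrap_addr = long_indexed_wrap(pointer, y)
carry_addr = long_indexed_carry(pointer, y)

if __name__ == "__main__":
    print(f"wrap : {wrap_addr:06X} -> {memory[wrap_addr]:02X}")
    print(f"carry: {carry_addr:06X} -> {memory[carry_addr]:02X}")
```

Only the second interpretation lands on the $AA marker, so the carry out of the 16-bit index addition does reach the bank byte, and it does so within the 6 cycles.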

Dave


PostPosted: Wed Dec 30, 2020 12:03 pm 
Joined: Sun Jun 29, 2014 5:42 am
Posts: 352
Here's the same test, in BBC Basic 816, which has a full 65816 Assembler built in and runs in native mode:
Attachment: capture11.png

Attachment: capture12.png


Same result:
Code:
0 a9 O 1 1
1 aa P 1 1
002000 : A9 AA       : LDA #AA        : 2 : A=00AA X=4ECB Y=681D SP=01C9 N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 8f O 1 1
1 00 P 1 1
2 00 P 1 1
3 fd P 1 1
4 aa D 0 1
002002 : 8F 00 00 FD : STA FD0000     : 5 : A=00AA X=4ECB Y=681D SP=01C9 N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 a9 O 1 1
1 55 P 1 1
002006 : A9 55       : LDA #55        : 2 : A=0055 X=4ECB Y=681D SP=01C9 N=0 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 8f O 1 1
1 00 P 1 1
2 00 P 1 1
3 fc P 1 1
4 55 D 0 1
002008 : 8F 00 00 FC : STA FC0000     : 5 : A=0055 X=4ECB Y=681D SP=01C9 N=0 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 a0 O 1 1
1 80 P 1 1
00200C : A0 80       : LDY #80        : 2 : A=0055 X=4ECB Y=0080 SP=01C9 N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 84 O 1 1
1 80 P 1 1
2 80 D 0 1
00200E : 84 80       : STY 80         : 3 : A=0055 X=4ECB Y=0080 SP=01C9 N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 a9 O 1 1
1 ff P 1 1
002010 : A9 FF       : LDA #FF        : 2 : A=00FF X=4ECB Y=0080 SP=01C9 N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 85 O 1 1
1 81 P 1 1
2 ff D 0 1
002012 : 85 81       : STA 81         : 3 : A=00FF X=4ECB Y=0080 SP=01C9 N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 a9 O 1 1
1 fc P 1 1
002014 : A9 FC       : LDA #FC        : 2 : A=00FC X=4ECB Y=0080 SP=01C9 N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 85 O 1 1
1 82 P 1 1
2 fc D 0 1
002016 : 85 82       : STA 82         : 3 : A=00FC X=4ECB Y=0080 SP=01C9 N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 b7 O 1 1
1 80 P 1 1
2 80 D 1 1
3 ff D 1 1
4 fc D 1 1
5 aa D 1 1
002018 : B7 80       : LDA [80],Y     : 6 : A=00AA X=4ECB Y=0080 SP=01C9 N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 85 O 1 1
1 83 P 1 1
2 aa D 0 1
00201A : 85 83       : STA 83         : 3 : A=00AA X=4ECB Y=0080 SP=01C9 N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=00 DB=00 DP=1900
0 6b O 1 1
1 00 I 1 1
2 00 I 1 1
3 ba D 1 1
4 30 D 1 1
5 f8 D 1 1
00201C : 6B          : RTL            : 6 : A=00AA X=4ECB Y=0080 SP=01CC N=1 V=0 M=1 X=1 D=0 I=0 Z=0 C=0 E=0 PB=F8 DB=00 DP=1900

Dave


PostPosted: Wed Dec 30, 2020 2:47 pm 
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
hoglet wrote:
Drass wrote:
Does [dir],Y wrap on a bank boundary? If not, does a bank-crossing trigger an extra cycle?

Rather surprisingly, seems the answers are no and no. [...]

The 65816 must be handling the increment of the bank address "on the fly" in the data setup time window at the end of cycle 5.
Awesome! -- thanks for your response, Dave. And yes it is quite surprising.

Drass and I both noticed what looked very much like an error in the datasheet. (All it would take is the accidental omission of a footnote.) Thanks again for your assistance in resolving this.

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Wed Dec 30, 2020 2:48 pm 
Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
Quote:
The 65816 must be handling the increment of the bank address "on the fly" in the data setup time window at the end of cycle 5.
Thank you so much for running this test! And wow, yes, not what we were expecting ...

Thinking about it, it's possible that the incrementer is inserted at the start of the cycle and applied to the bank byte on the way out. It could then also be used to increment DBR on the fly as needed. Looking at the datasheet, the CPU produces the bank address in 33ns at 14MHz (tBAS at 5V), so just less than half the ~71ns cycle at that clock-rate. Considering that the 65816 can fit a full 16-bit Adder (plus Decimal Mode adjust) in a single cycle, it seems quite reasonable that it can manage an 8-bit increment in half the time. And since the 6502 assumes that memory responds in a half cycle, the time is available anyway! :shock:
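
For reference, the arithmetic behind those figures can be sketched as follows. The 14MHz clock and the 33ns tBAS value are the ones quoted above; the variable names are just for this example.

```python
clock_hz = 14e6                  # clock rate quoted above
t_bas_ns = 33.0                  # bank address figure quoted above (tBAS at 5V)

cycle_ns = 1e9 / clock_hz        # full cycle, roughly 71.4 ns
half_cycle_ns = cycle_ns / 2     # roughly 35.7 ns
slack_ns = half_cycle_ns - t_bas_ns

if __name__ == "__main__":
    print(f"cycle      = {cycle_ns:.1f} ns")
    print(f"half cycle = {half_cycle_ns:.1f} ns")
    print(f"slack      = {slack_ns:.1f} ns")
```

So the bank address is out a little under 3ns before the half-cycle point, which is what "just less than half" amounts to.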

Unfortunately, the same is not possible in this design where memory is pipelined. Here the address is clocked at the start of the cycle, so the bank byte needs to get to memory *before* the fall of PHI2. There is no time available. Similarly, the cycle is just long enough for memory to respond, so an incrementer on the way in would impact the critical path equally. Rather than slow down the whole CPU for what is likely a relatively infrequent operation, it makes much more sense to adjust the bank byte in an extra cycle in this case.

Thanks again Hoglet. This was very helpful.

Cheers for now,
Drass

PostPosted: Tue Mar 02, 2021 12:51 am 
Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
It took a while, but I now have a first pass of the 65816 in Logisim! (pic below)

It's a beast, that's for sure, but it looks like it will still make 100MHz! :shock: At least that's the theory -- the reality of laying all this out on impedance-controlled PCBs is sobering to say the least. But be that as it may, I'll just press on for now and we’ll see how it all works out in time.

To recap, the new architecture required the following additions:
  • A 16-bit adder/incrementer in the ALU (this is an 8-bit adder followed by an 8-bit carry-select incrementer for the high-byte),
  • A 24-bit carry-select incrementer on the address bus,
  • 16-bit WriteBack and flag evaluation stages (Z flag evaluation is pipelined across two stages),
  • 16-bit C, X, Y, S, and D registers,
  • Selectable 16 and 8-bit busing and muxing to shuttle values around,
  • Mode-dependent register-widths and interrupt vectors,
  • Initialization of various registers to appropriate values on reset,
  • A "Loop-on-Carry" function to support the MVP and MVN opcodes,
  • Tighter encoding of microinstructions to control all this (with associated decode logic).
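
A couple of the items above can be sketched in a few lines of Python. This is a loose behavioural model only: the function names are invented for the example, the offset is assumed to be an unsigned 8-bit value, and nothing here reflects the actual microcode or stage timing.

```python
def add_index16(address, offset8):
    """8-bit adder on the low byte, carry-select incrementer on the high byte."""
    low_sum = (address & 0xFF) + (offset8 & 0xFF)
    carry = low_sum >> 8
    high = (address >> 8) & 0xFF
    high_plus_one = (high + 1) & 0xFF    # candidate prepared in advance
    selected = high_plus_one if carry else high
    return (selected << 8) | (low_sum & 0xFF)


def zero_flag16(value):
    """16-bit Z built from two 8-bit tests, one per stage."""
    z_low = (value & 0xFF) == 0          # first stage: low byte
    z_high = ((value >> 8) & 0xFF) == 0  # second stage: high byte
    return z_low and z_high


if __name__ == "__main__":
    print(f"{add_index16(0x12F0, 0x20):04X}")
    print(zero_flag16(0x0100), zero_flag16(0x0000))
```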

With that, I'm happy to report that the CPU now supports cycle-accurate 65816 operation with just a few exceptions:
  • The (dir,x) and dir,x addressing modes take an extra cycle if DH is non-zero,
  • The long,x and [dir],y addressing modes take an extra cycle if a bank boundary is crossed or if a write is performed,
  • Decimal Mode takes an extra cycle.

The extra indexing cycles could be overcome with a true 16-bit adder, but at the expense of more hardware and a longer cycle. The same is true for Decimal Mode, where a longer cycle could accommodate a separate decimal adjust circuit in series with the binary adder. In both cases the extra cycles occur relatively infrequently, so it makes more sense instead to forgo compatibility and keep to the faster clock-rate. The CPU will both perform better and require less hardware as a result.

Overall, I'm quite happy with where this design is at the moment. I have completed "unit testing" on the current model (by which I mean that every instruction has tested correctly at least once). The next step is to run more comprehensive testing using some sort of suite. I'm not planning anything like the Dormann suite, but investing a little time on some test code is sure to pay dividends down the line. If anyone has some 65816 code that can be reasonably adapted to run as part of such a test, that would be very helpful. Complete programs with lots of I/O are probably not great candidates for this. But a compute-intensive code fragment whose result can be tested against a pre-computed value would work very well.

Many thanks to Dr Jefyll, ttlworks, BigEd and Hoglet for their help in getting to this point. As always, all suggestions welcome.

Cheers for now,
Drass

P.S. I thought it might be interesting to mention a few words about the MVP/MVN opcodes. As one might guess, these opcodes move one byte when executed, and are then "re-executed" for each remaining byte to be moved. This can be done efficiently by inhibiting FetchOpcode for as long as the move is in progress. Meanwhile, the X and Y registers are incremented (or decremented for MVP) and the C accumulator is decremented on every pass. The end of the loop is triggered when C wraps from $0000 to $FFFF, which conveniently fails the “loop on carry” test. The final subtlety here is that PC must be restored on every iteration so the operand bytes can be re-read as normal. (Yes, this is done redundantly even though the operands don't change. The FetchOpcode cycle too is executed redundantly, even when the operation is inhibited. The net result is that the opcode executes its full contingent of seven cycles for every byte moved). The rationale for this rather awkward implementation is simple: it requires a minimum of specialized hardware. I briefly contemplated an enhancement to skip the re-reading of operands and save two cycles per byte. However, that’s quite messy for the pipeline, so I decided to leave well enough alone and stick with the stock implementation. :)
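
The block-move loop described above can be modelled roughly like this in Python. It is a behavioural sketch, not the microcode: `block_move` and its parameters are names made up for the example, memory is a plain dict keyed by 24-bit address, and 16-bit index registers are assumed.

```python
def block_move(mem, c, x, y, src_bank, dst_bank, increment):
    """One MVN (increment=True) or MVP (increment=False) until C wraps.

    Returns the final (C, X, Y, cycles).
    """
    step = 1 if increment else -1
    cycles = 0
    while True:
        # Move one byte from src_bank:X to dst_bank:Y.
        mem[(dst_bank << 16) | y] = mem.get((src_bank << 16) | x, 0)
        x = (x + step) & 0xFFFF
        y = (y + step) & 0xFFFF
        # Decrementing C yields a carry unless it wraps from $0000 to $FFFF.
        carry = c != 0
        c = (c - 1) & 0xFFFF
        cycles += 7              # the whole opcode is re-executed per byte
        if not carry:            # "loop on carry" test fails, so the move ends
            break
    return c, x, y, cycles


if __name__ == "__main__":
    mem = {0x010000: 0x11, 0x010001: 0x22, 0x010002: 0x33}
    print(block_move(mem, 2, 0x0000, 0x0100, 0x01, 0x02, True))
```

With C=2 on entry, three bytes are moved (C+1), which is why the loop test is on the wrap rather than on C reaching zero.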


Attachment: C74-100 Logisim (V3.1).png

PostPosted: Tue Mar 02, 2021 9:07 am 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Excellent and marvellous! (But possibly worth a new thread?)

Very interesting that you landed in the same place as the original, with MVN and MVP.


PostPosted: Tue Mar 02, 2021 10:52 pm 
Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
BigEd wrote:
Excellent and marvellous! (But possibly worth a new thread?)
Thanks BigEd!

This is the same 100MHz TTL CPU project as before, with the same pipeline and general architecture; it just happens to have a 65816 under the hood! :twisted: :)

Quote:
Very interesting that you landed in the same place as the original, with MVN and MVP.
Yes, I was surprised at how easily that solution fell into place. I think it shows a clear bias on the part of WDC toward a "simple as possible" solution for MVP and MVN.

I found it interesting that the implementation of the TRB and TSB instructions leaned in the opposite direction. These two opcodes use the same number of cycles as any other RMW instruction despite carrying an extra bit of processing. The portion that sets and resets bits in the operand is handled in the usual way: read the operand, OR with the Accumulator to set the appropriate bits (or AND with the inverse of the Accumulator to reset bits), and write the result back to memory. The "test" portion of the instructions, on the other hand, does not use the result of the RMW operation as normal. Instead, the status flags are updated by ANDing the operand with the Accumulator (much like the BIT instruction does). To accomplish both in one cycle requires a dedicated "test" circuit and a separate path through the ALU. The same could have been accomplished without additional hardware with one more cycle. But in this case, WDC opted for the faster solution. (Perhaps there was an expectation that TRB and TSB would be used in timing sensitive I/O operations and needed to be fast?).
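
In other words, each of these opcodes produces two independent results from the same operand. A minimal Python sketch of that behaviour follows, assuming an 8-bit accumulator (M=1) and modelling only the Z flag, which is the flag these instructions affect; the function names are just for this example.

```python
def tsb(accumulator, operand):
    """Test and Set Bits: returns (value written back, Z flag)."""
    z = (accumulator & operand) == 0           # "test" path, like BIT
    written = (operand | accumulator) & 0xFF   # RMW path: set bits
    return written, z


def trb(accumulator, operand):
    """Test and Reset Bits: returns (value written back, Z flag)."""
    z = (accumulator & operand) == 0           # "test" path, like BIT
    written = operand & ~accumulator & 0xFF    # RMW path: clear bits
    return written, z


if __name__ == "__main__":
    print(tsb(0x0F, 0xF0))
    print(trb(0x0F, 0x3C))
```

Note that Z comes from the AND of the two inputs, not from the value written back, which is why a second path through the ALU is needed to do both at once.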

PostPosted: Sat Mar 06, 2021 4:01 am 
Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
I'm excited to share below the first snippet of 65816 code the model has run.

The CPU boots in "NMOS Compatibility Mode". The $02 NMOS opcode (normally a KIL or JAM undocumented opcode) has been replaced with the XCE instruction. The "CLC, .BYTE $02" sequence therefore switches the CPU to Native Mode before it executes "REP #$30" to enable 16-bit index registers and the 16-bit accumulator.

The sequence starts with a test of the 16-bit N and Z flags. The ORA (dir, X) that follows has a 16-bit operand and uses a 16-bit index register to reference an address on the 16-bit Direct Page. Arriving at the final JMP instruction means the code executed successfully. Hurray! :D

(Incidentally, I'm using the Kowalski IDE for this. Many thanks 8bit for the 65816 mods!).
Code:
   .ORG $0200
   .OPT proc65816
   

dptr   = $10
target   = $1000

start   SEI      ; Disable interrupts
   LDX #$FF   ; Initialize Stack Register
   TXS
   CLC
   .BYTE $02   ; NMOS XCE -- Switch to Native Mode
   REP #$30   ; 16-bit regs -- set X and M flags to 0
   
; Test the 16-bit Z and N flags

   CLC
   LDA !#$8000
   PHP
   PHP
   PLA
   CMP #$8484   ; NV00 DIZC = 1000 0100
l1   BNE l1

   CLC
   LDA !#$0000
   PHP
   PHP
   PLA
   CMP !#$0606   ; NV00 DIZC = 0000 0110
l2   BNE l2
         
; ORA (dir,X) basic test
   
   CLC
   LDA #target
   STA dptr+target+3
   LDA !#$00ff
   STA target
   LDX #target+3
   LDA #$ff00
   ORA (dptr,X)
   PHP
   PHP   
   CMP #$FFFF
l3   BNE l3
   PLA
   CMP #$8484   ; NV00 DIZC = 1000 0100
l4   BNE l4
   
; ORA (dir,X) page cross

l5   JMP l5

   .ORG $FFE0

   .BYTE $00, $00, $00, $00, $00, $00, $00, $00
   .BYTE $00, $00, $00, $00, $00, $00, $00, $00
   .BYTE $00, $00, $00, $00, $00, $00, $00, $00
   .BYTE $00, $00, $00, $00, $00, $02, $00, $00
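
The expected values in the CMP comments above can be double-checked by packing the status bits by hand. Here is a small Python sketch of that; the helper names are invented for the example, and it assumes the state the test sets up (Native Mode, M and X cleared by REP #$30, I set by SEI, D clear).

```python
# Native-mode status register layout: N V M X D I Z C (bit 7 down to bit 0).
FLAG_BITS = {"N": 7, "V": 6, "M": 5, "X": 4, "D": 3, "I": 2, "Z": 1, "C": 0}


def pack_status(**flags):
    """Build the P register byte from named flag values (missing flags are 0)."""
    value = 0
    for name, bit in FLAG_BITS.items():
        if flags.get(name, 0):
            value |= 1 << bit
    return value


def php_php_pla16(status):
    """Two PHPs push P twice; a 16-bit PLA then pulls both copies into A."""
    return (status << 8) | status


after_lda_8000 = php_php_pla16(pack_status(N=1, I=1))  # LDA #$8000 sets N
after_lda_0000 = php_php_pla16(pack_status(Z=1, I=1))  # LDA #$0000 sets Z

if __name__ == "__main__":
    print(f"{after_lda_8000:04X} {after_lda_0000:04X}")
```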

PostPosted: Sat Mar 06, 2021 11:04 am 
Joined: Mon Sep 14, 2015 8:50 pm
Posts: 112
Location: Virginia USA
Hi Drass,

I'm confused. I thought opcode xce was $FB and opcode cop was $02.

Cheers,
Andy


PostPosted: Sat Mar 06, 2021 2:57 pm 
Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
Thanks for dropping by Andy. I should have been more clear in the explanation of how all this is working ...

This CPU will allow you to select the instruction-set you want for Emulation Mode. The choices are NMOS 6502 or 65816. The default selection is made via a dip-switch, which is currently fixed as the NMOS instruction-set. Hence the CPU will emulate an NMOS 6502 at boot up, including undocumented opcodes.

Now the NMOS instruction-set does not have the XCE instruction of course. The $fb opcode on NMOS chips is the ISC abs,Y undocumented instruction. It performs both an INC abs,Y and an SBC abs,Y. While we might debate how useful it is, it's important to retain it nevertheless for full compatibility with existing NMOS code.

So, then, we need a different opcode for XCE. The $02 opcode is a good candidate. On NMOS chips, this opcode performs a JAM operation (also documented as KIL or HLT), which essentially will freeze the CPU until reset. That makes it a fairly safe choice for repurposing since it's very unlikely that someone would use it on purpose. I've therefore taken the liberty of replacing the $02 NMOS JAM instruction with an XCE instruction which is functionally equivalent to the traditional 65816 XCE. $02 can therefore be used to switch to Native Mode from NMOS Emulation Mode.

Once the $02 opcode is executed, the CPU will switch to Native Mode, but then also switch to the full 65816 instruction-set. At that point, $02 will revert to the normal COP instruction. From Native Mode, switching back to Emulation Mode can be done as normal with the regular $fb XCE instruction.
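
Put as a sketch, the scheme above behaves something like the following Python model. Only the two opcodes under discussion are decoded, all names are invented for the example, and what the instruction-set selection does on a later return to Emulation Mode is left out since it depends on the dip-switch setting.

```python
NMOS, FULL_65816 = "nmos", "65816"


def decode(opcode, instruction_set):
    """Mnemonic for the two opcodes discussed here; everything else is 'other'."""
    if instruction_set == NMOS:
        table = {0x02: "XCE", 0xFB: "ISC abs,Y"}   # $02 JAM repurposed as XCE
    else:
        table = {0x02: "COP", 0xFB: "XCE"}         # stock 65816 assignments
    return table.get(opcode, "other")


def execute_xce(carry, emulation, instruction_set):
    """XCE swaps C and E; entering Native Mode selects the full 65816 set."""
    carry, emulation = emulation, carry
    if not emulation:
        instruction_set = FULL_65816
    # Assumption: the selection is left untouched when returning to Emulation Mode.
    return carry, emulation, instruction_set


if __name__ == "__main__":
    state = (False, True, NMOS)          # after CLC at boot: C=0, E=1
    print(decode(0x02, state[2]))        # the .BYTE $02 in the boot code
    state = execute_xce(*state)
    print(state, decode(0x02, state[2]), decode(0xFB, state[2]))
```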

I admit this is all rather confusing. The objective of all this awkwardness is to enable the CPU to run existing NMOS code on a C64 at boot up while allowing new programs to switch to Native Mode for 65816 functionality. It's a lot of gymnastics but I think it will be worth it in the end.

Hope that helps,
Drass

Edit:

P.S. I'll just add a note to say that switching instruction-sets on the fly in a pipelined CPU took some careful thought. Writing to the E flag occurs in the WB stage, and there are other microinstructions in the pipeline by that point. Thankfully it turns out that the switch occurs just in time so it all works smoothly. :)

PostPosted: Sun Mar 07, 2021 2:29 pm 
Joined: Mon Sep 14, 2015 8:50 pm
Posts: 112
Location: Virginia USA
Hi Drass,

Thanks for the thorough explanation; the logic is sound and I applaud your efforts as I'm a 65816 fan.

Cheers,
Andy


PostPosted: Mon Mar 08, 2021 2:22 am 
Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
Thanks Andy. I’m fast becoming a fan of the 65816 myself. Picking up a 16-bit value in the accumulator and shifting it left will make a convert out of anyone; or how about indexing across 64k with a single register. I have a sentimental attachment to the NMOS 6502, but I have to say that programming the 65816 is a pleasure.

PostPosted: Sat Mar 13, 2021 3:52 pm 
Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
Drass wrote:
I have completed "unit testing" on the current model (by which I mean that every instruction has tested correctly at least once).
Did I really say this? Hmmm ... based on the bugs my test suite is uncovering, I can categorically say this was NOT accurate.

On the other hand, the kinds of bugs I'm running into are far from surprising:

1) Incorrect encoding of microinstructions (assembly of microinstructions is partially a manual process),
2) Incorrect decoding of microinstructions (done with 74LVC138 1-of-8 decoders followed by error-prone PAL-like gate logic),
3) Unforeseen side-effects of decoded control signals (mostly due to tricky encoding required to pack more functions into fewer control bits).

The good news is that the overall pipeline and busing architecture is holding up well. The test harness now includes opcodes up to $3F, which means the model runs through various addressing modes with OR, AND, ASL and ROL, as well as exercising BRK, COP, RTI, JSR, JSL, RTS, RTL, TSB, TRB. I have also been careful to include page and bank crossings as we go. For example:
Code:
; 17 ORA [dir],Y (Crossing banks)

   LDY !#$ffff
   LDA !#$2187   ; long pointer address
   STA dpage
   LDA !#$00   ; long pointer bank
   STA dpage+2
   LDA !#$0018   ; place marker in bank 1
   STA $002187+$ffff
   LDA !#$ff00
   REP #$CB   ; clear the flags
   ORA [$00],Y
   PHP
   PHP
   CMP !#$ff18
l17.5   BNE l17.5
   PLA
   CMP !#$8484   ; NV00 DIZC = 1000 0100   
l17.6   BNE l17.6


I continue to find bugs at a good rate so it's important to keep going. (I should also mention that current testing assumes the E, M and X flags are all zero and that both DL and DH are non-zero; additional testing will be required to cover off other combinations. The notion of exhaustively testing this CPU is probably too ambitious, but every bit of validation counts! :)).

Cheers for now,
Drass
