
All times are UTC




[ 11 posts ]
Posted: Thu Jun 02, 2022 6:56 am

Joined: Sun Oct 03, 2021 2:17 am
Posts: 114
I'm confused about whether to think of the 65816's A accumulator as changing width along with the M flag (because LDA is how you load both 8- and 16-bit values into the accumulator) or as always being 8 bits wide (because of the XBA instruction). How do you think of A's width?

I'd probably appreciate an assembler that understands LDC and STC as mnemonics so I could make this a little clearer in my mind; the assembler would complain if you tried to use them in emulation mode (Rice's theorem notwithstanding, the assembler would have to rely on whether it thinks native or emulation mode is active). I don't know of a way to implement LDB and STB without clobbering the neighboring 8-bit A or the adjacent byte in memory. EDIT: thanks to oziphanto, who just pointed out that XBA, LDA, XBA implements LDB.
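Here is a minimal Python model of the XBA / LDA / XBA idea. It is an illustration, not emulator code. It treats C as a 16-bit register whose low byte is A and whose high byte is B, and it assumes M=1 so LDA and STA touch only one byte. `Accumulator`, `ldb`, `stb` and the `mem` dictionary are hypothetical names. One side effect worth noticing: after the final XBA, N and Z reflect the restored A, not the byte just loaded into B.

```python
class Accumulator:
    """Simplified 65816 accumulator: C is 16 bits, A = low byte, B = high byte."""

    def __init__(self, c: int = 0) -> None:
        self.c = c & 0xFFFF
        self.n = False  # negative flag
        self.z = False  # zero flag

    @property
    def a(self) -> int:
        return self.c & 0xFF

    @property
    def b(self) -> int:
        return self.c >> 8

    def _set_nz8(self, value: int) -> None:
        self.n = bool(value & 0x80)
        self.z = value == 0

    def xba(self) -> None:
        # Swap B and A; N and Z are set from the new 8-bit A.
        self.c = ((self.c & 0xFF) << 8) | (self.c >> 8)
        self._set_nz8(self.a)

    def lda8(self, value: int) -> None:
        # LDA with M=1: only A changes, B is left alone.
        self.c = (self.c & 0xFF00) | (value & 0xFF)
        self._set_nz8(self.a)


def ldb(acc: Accumulator, value: int) -> None:
    """Hypothetical LDB macro: XBA / LDA / XBA."""
    acc.xba()
    acc.lda8(value)
    acc.xba()


def stb(acc: Accumulator, mem: dict, addr: int) -> None:
    """Hypothetical STB macro: XBA / STA / XBA (STA with M=1 writes one byte)."""
    acc.xba()
    mem[addr] = acc.a
    acc.xba()


acc = Accumulator(0x1234)
ldb(acc, 0xAB)
print(hex(acc.c))  # 0xab34: B replaced, A preserved
```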


Posted: Thu Jun 02, 2022 7:01 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10800
Location: England
I think of A as operating either as 8 or 16 bits depending on M.

B can be thought of as the top half of the wide A, if you like. XBA is a special case! B isn't nearly as useful as I first thought it would be - it falls far, far short of being a second accumulator.


Posted: Thu Jun 02, 2022 7:54 am

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1411
Location: Scotland
The M flag switches "memory" mode between 8 and 16 bits. The A register is treated as "memory", so for me it's variable width and matches memory loads/stores (which makes sense).

One thing to watch out for: if you are in 8-bit M mode, load an 8-bit value into A, and then switch to 16-bit mode, the A register is extended to 16 bits (as you might expect) by tacking on the B register, with whatever that contains. This is in contrast with the X flag: setting the index registers to 8 bits forces the top 8 bits of X and Y to zero, so when you switch back to 16 bits their high bytes are zero.
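The asymmetry above can be shown with a small Python model. Its semantics are assumed and simplified: only C and X are modelled, and Y behaves like X. REP/SEP on the M bit ($20) only exposes or hides B. Setting the X bit ($10) clears the index registers' high bytes. The `Regs` class and its method names are hypothetical.

```python
class Regs:
    """Toy 65816 register state: accumulator C (B:A) and index register X."""

    M_BIT = 0x20  # accumulator/memory width flag in P
    X_BIT = 0x10  # index register width flag in P

    def __init__(self) -> None:
        self.c = 0        # 16-bit accumulator, B = high byte, A = low byte
        self.x = 0        # 16-bit X (Y would behave the same way)
        self.m8 = False   # True when M=1 (8-bit accumulator)
        self.x8 = False   # True when X=1 (8-bit index registers)

    def sep(self, mask: int) -> None:
        if mask & self.M_BIT:
            self.m8 = True            # B keeps its value, it is just hidden
        if mask & self.X_BIT:
            self.x8 = True
            self.x &= 0x00FF          # high byte of X (and Y) forced to zero

    def rep(self, mask: int) -> None:
        if mask & self.M_BIT:
            self.m8 = False           # C is B:A again, B holds whatever it held
        if mask & self.X_BIT:
            self.x8 = False           # high byte is already zero from SEP

    def lda_imm(self, value: int) -> None:
        if self.m8:
            self.c = (self.c & 0xFF00) | (value & 0xFF)
        else:
            self.c = value & 0xFFFF

    def ldx_imm(self, value: int) -> None:
        self.x = value & (0x00FF if self.x8 else 0xFFFF)


r = Regs()
r.lda_imm(0xBEEF)
r.ldx_imm(0xBEEF)
r.sep(0x30)        # SEP #$30: 8-bit A and index registers
r.lda_imm(0x42)
r.ldx_imm(0x42)
r.rep(0x30)        # REP #$30: back to 16 bits
print(hex(r.c), hex(r.x))  # 0xbe42 0x42
```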

My bugbear here is that I wish it were possible to do an 8-bit load while in 16-bit M mode that forces the top 8 bits to zero, just like every other 16/32/64-bit CPU I've used... But maybe that's just me and bytecode interpreters...
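The usual workaround in 16-bit M mode is a 16-bit load followed by AND #$00FF. This Python model shows why the mask is needed. A 16-bit LDA reads two bytes, little-endian, so the neighbouring byte lands in the high half. Dropping to 8-bit mode instead leaves whatever B held in the top half. The function names and the `code` byte string are hypothetical.

```python
def lda16(mem: bytes, addr: int) -> int:
    """16-bit load (M=0): low byte from addr, high byte from addr+1."""
    return mem[addr] | (mem[addr + 1] << 8)


def fetch_byte_16bit(mem: bytes, addr: int) -> int:
    """LDA addr / AND #$00FF: zero-extended byte, at the cost of an extra AND."""
    return lda16(mem, addr) & 0x00FF


def fetch_byte_8bit(c: int, mem: bytes, addr: int) -> int:
    """SEP #$20 / LDA addr / REP #$20: the low byte is loaded, B is left stale."""
    return (c & 0xFF00) | mem[addr]


code = bytes([0x10, 0x20, 0x30])
print(hex(lda16(code, 0)))                    # 0x2010
print(hex(fetch_byte_16bit(code, 0)))         # 0x10
print(hex(fetch_byte_8bit(0xAB00, code, 0)))  # 0xab10
```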

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Posted: Thu Jun 02, 2022 8:37 am

Joined: Sat Feb 19, 2022 10:14 pm
Posts: 147
I also think of A as having a variable width. For what I'm working on now, I figured out early on that it's easiest just to stay in 16-bit mode, switch to 8-bit mode only when needed, and switch back immediately afterwards.


Posted: Thu Jun 02, 2022 9:32 am

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1411
Location: Scotland
tmr4 wrote:
I also think of A as having a variable width. For what I'm working on now, I figured out early on that it's easiest just to stay in 16-bit mode, switch to 8-bit mode only when needed, and switch back immediately afterwards.


That was my strategy for my (BCPL) bytecode interpreter, but that 16-bit fetch and subsequent AND #$00FF adds so many cycles to each bytecode fetch it measurably slows it down )-: (and switching to 8-bit mode here doesn't help either unless I could guarantee zero in the upper 8-bits)

-Gordon

Posted: Thu Jun 02, 2022 2:50 pm

Joined: Sat Feb 19, 2022 10:14 pm
Posts: 147
drogon wrote:
That was my strategy for my (BCPL) bytecode interpreter, but that 16-bit fetch and subsequent AND #$00FF adds so many cycles to each bytecode fetch it measurably slows it down )-: (and switching to 8-bit mode here doesn't help either unless I could guarantee zero in the upper 8-bits)

Yes, I've had similar situations. This is where a few dedicated byte-specific instructions would be helpful. I'm always looking for ways to handle byte-level data 2 bytes at a time in 16-bit mode. For example, I use the output_2ch routine below to output 2 characters at a time:
Code:
output_ch:      jmp put_c
output_2ch:     jsr output_ch
                xba
                jmp put_c

I use this when outputting null-terminated ASCII strings. That way I can stay in 16-bit mode and process the string a bit quicker than handling the characters one at a time. The only modification is that the string needs to be terminated with a null word to handle both odd- and even-length strings.
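Here is a Python sketch of the two-characters-per-word idea. It assumes the string is read as 16-bit little-endian words, so the low byte is the first character and XBA brings the second into A. The post doesn't show the calling loop, so the per-byte terminator check here is an assumption, and `print_2ch` and `words_le` are hypothetical names. Appending a null word keeps the last 16-bit read inside the string's own storage for both odd and even lengths.

```python
def words_le(data: bytes):
    """Yield 16-bit little-endian words, as a 16-bit LDA would read them."""
    for i in range(0, len(data) - 1, 2):
        yield data[i] | (data[i + 1] << 8)


def print_2ch(text: str) -> str:
    """Model of printing a string two characters per 16-bit word."""
    data = text.encode("ascii") + b"\x00\x00"  # terminate with a null word
    out = []
    for word in words_le(data):
        lo = word & 0xFF   # first character: low byte, already in A
        hi = word >> 8     # second character: reached with XBA
        if lo == 0:
            break
        out.append(chr(lo))
        if hi == 0:
            break
        out.append(chr(hi))
    return "".join(out)


print(print_2ch("ABC"), print_2ch("ABCD"))  # ABC ABCD
```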

Of course, a dedicated low-level routine to handle string output would be faster, as the put_c routine above calls a low-level ACIA routine that switches to 8-bit mode and back to 16-bit mode for each character. I might do this someday, but my LCD can't keep up with the output as it is, so it's not a high priority. On a side note: unfortunately you can't write a 16-bit value to the 65C51 transmit register, because the high byte is written to the next register address (the status register), and a write there causes a programmed reset of the chip.
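This Python sketch shows why the 16-bit store fails. It assumes the standard 6551 write-register map: base+0 is transmit data, base+1 is programmed reset, base+2 is command and base+3 is control. It also assumes a 16-bit STA writes the low byte to the address and the high byte to address+1. `ACIA_BASE` is a hypothetical address.

```python
# Write-side register map of a 6551/65C51 ACIA, by offset from its base address.
ACIA_WRITE_REGS = {
    0: "transmit data",
    1: "programmed reset",  # writing the status register address resets the chip
    2: "command",
    3: "control",
}

ACIA_BASE = 0x7F00  # hypothetical base address for this example


def sta16(addr: int, value: int) -> list:
    """Byte writes performed by STA with M=0: low byte first, then high byte."""
    return [(addr, value & 0xFF), (addr + 1, (value >> 8) & 0xFF)]


def acia_effects(addr: int, value: int) -> list:
    """Which ACIA registers a 16-bit store to addr would hit."""
    hits = []
    for a, _byte in sta16(addr, value):
        offset = a - ACIA_BASE
        if 0 <= offset < len(ACIA_WRITE_REGS):
            hits.append(ACIA_WRITE_REGS[offset])
    return hits


print(acia_effects(ACIA_BASE, 0x4241))
# ['transmit data', 'programmed reset']
```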


Posted: Fri Jun 03, 2022 3:00 am

Joined: Sat Dec 30, 2017 3:19 pm
Posts: 116
Location: Detroit, Michigan, USA
tmr4 wrote:
Of course, faster would be a dedicated low-level routine to handle string output as the put_c routine above calls a low-level ACIA routine that switches to 8-bit mode and back to 16-bit mode for each character.


That is what I did with my BIOS: I have a specific call for printing null-terminated strings, and it stays in 8-bit mode while doing it. My BIOS dispatches are via COP, so it saves a lot of syscall overhead. The only part I don't like is that it's arguably not something the BIOS should be doing. Probably what I should do is replace this with multi-byte read/write calls that take a buffer and a byte count, then reimplement my current call as a library call on top of the multi-byte write.
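The layering described above can be sketched in Python under some assumptions. The BIOS exposes only a multi-byte write that takes a buffer and a count. The null-terminated print becomes a library routine that finds the terminator and makes a single call. `Bios`, `write`, `lib_puts` and the `calls` counter are hypothetical; the counter stands in for the per-call COP dispatch overhead.

```python
class Bios:
    """Hypothetical BIOS that only offers a multi-byte write call."""

    def __init__(self) -> None:
        self.output = bytearray()
        self.calls = 0  # stands in for the cost of each COP dispatch

    def write(self, buf: bytes, count: int) -> int:
        self.calls += 1
        self.output += buf[:count]
        return count


def lib_puts(bios: Bios, buf: bytes) -> int:
    """Library-level string print: find the null, then one BIOS write."""
    length = buf.index(0)  # raises ValueError if there is no terminator
    return bios.write(buf, length)


bios = Bios()
lib_puts(bios, b"hello\x00junk")
print(bytes(bios.output), bios.calls)  # b'hello' 1
```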


Posted: Sat Jun 04, 2022 9:46 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8178
Location: Midwestern USA
BigEd wrote:
B can be thought of as the top half of the wide A, if you like. XBA is a special case! B isn't nearly as useful as I first thought it would be - it falls far far short of being a second accumulator.

Using XBA, .B is useful as a temporary store when working with byte-sized data, e.g., as in this DUART driver code fragment:

Code:
        lda (siofif,x)        ;no, get datum from RHR  <---
        xba                   ;protect it for now  <---
        lda sioputrx,x        ;get queue 'put' pointer  <---
        inc                   ;bump it
        cmp siogetrx,x        ;any room in queue?
        beq .0000020          ;no, discard datum
;
        xba                   ;yes, recover datum &...  <---
        sta (sioputrx,x)      ;store in queue  <---

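Here is a Python model of the queue logic in that fragment. It assumes a 256-byte queue addressed by 8-bit 'put' and 'get' indices, so INC wraps. The queue counts as full when the bumped put index would equal the get index, which leaves one slot unused. In the assembly the datum waits in B during this test; here it is just a parameter. `RxQueue` and its methods are hypothetical. The put-pointer update isn't shown in the fragment, so it is assumed here.

```python
QUEUE_SIZE = 256  # assumption: one page per queue, 8-bit indices


class RxQueue:
    """Toy receive queue with the same full/empty rules as the DUART fragment."""

    def __init__(self) -> None:
        self.buf = bytearray(QUEUE_SIZE)
        self.put = 0  # next free slot
        self.get = 0  # next byte to hand to the reader

    def enqueue(self, datum: int) -> bool:
        # (The assembly parks the datum in B with XBA while A does this test.)
        next_put = (self.put + 1) & 0xFF   # LDA sioputrx,x / INC wraps at 256
        if next_put == self.get:           # CMP siogetrx,x / BEQ: full, discard
            return False
        self.buf[self.put] = datum & 0xFF  # XBA / STA (sioputrx,x)
        self.put = next_put                # assumed: pointer update not shown
        return True

    def dequeue(self):
        if self.get == self.put:           # empty
            return None
        datum = self.buf[self.get]
        self.get = (self.get + 1) & 0xFF
        return datum


q = RxQueue()
accepted = sum(q.enqueue(i & 0xFF) for i in range(300))
print(accepted)  # 255: one slot always stays free
```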
XBA is also handy for reversing the endianness of a 16-bit value. Beyond that, you’re right: .B doesn't do much for you.
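For the endianness case, here is a Python sketch of what XBA does to a 16-bit value. It assumes the value came from a big-endian field read with a (little-endian) 16-bit LDA. `xba16` and `be_word_to_native` are hypothetical helpers.

```python
def xba16(value: int) -> int:
    """Swap the two bytes of a 16-bit value, as XBA does to B and A."""
    return ((value & 0x00FF) << 8) | ((value >> 8) & 0x00FF)


def be_word_to_native(data: bytes, offset: int) -> int:
    """Read a big-endian 16-bit field the way LDA (M=0) would, then XBA it."""
    loaded = data[offset] | (data[offset + 1] << 8)  # little-endian load
    return xba16(loaded)


print(hex(xba16(0x1234)))                      # 0x3412
print(hex(be_word_to_native(b"\x12\x34", 0)))  # 0x1234
```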

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Mon Jun 06, 2022 5:44 pm

Joined: Sat Dec 12, 2015 7:48 pm
Posts: 123
Location: Lake Tahoe
drogon wrote:
tmr4 wrote:
I also think of A as having a variable width. For what I'm working on now, I figured out early on that it's easiest just to stay in 16-bit mode, switch to 8-bit mode only when needed, and switch back immediately afterwards.


That was my strategy for my (BCPL) bytecode interpreter, but that 16-bit fetch and subsequent AND #$00FF adds so many cycles to each bytecode fetch it measurably slows it down )-: (and switching to 8-bit mode here doesn't help either unless I could guarantee zero in the upper 8-bits)

-Gordon


I found it useful to keep the M flag as 16 bit and the index registers in 8 bit mode for PLASMA. My byte code fetch/dispatch is:

Code:
        INY                     ; NEXTOP @ $F0
        LDX     $FFFF,Y         ; FETCHOP @ $F1, IP MAPS OVER $FFFF @ $F2
        JMP     (OPTBL,X)       ; OPIDX AND OPPAGE MAP OVER OPTBL


and runs out of zero page. Y is the interpreter's instruction pointer (IP) offset. The IP+Y value gets renormalized during certain instructions' execution to keep it from overflowing. Also, the byte codes are even, so there is no need to shift the value, but it does limit the number of byte codes to 128 with an 8-bit X register.
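Here is a Python sketch of the fetch/dispatch scheme described above, under simplifying assumptions. `ip_base` stands in for the operand patched over $FFFF, and `y` is the 8-bit offset. Opcodes are even, so they serve directly as byte offsets into a table of 2-byte entries (modelled as `table[x // 2]`). Renormalization is left out. All names are hypothetical.

```python
MAX_OPCODES = 256 // 2  # even opcodes 0, 2, ..., 254 with an 8-bit X


def fetch_dispatch(code: bytes, ip_base: int, y: int, table: list):
    """One NEXTOP/FETCHOP/dispatch step; returns (handler, new_y)."""
    y = (y + 1) & 0xFF           # INY: an 8-bit Y wraps, hence renormalization
    x = code[ip_base + y]        # LDX $FFFF,Y with the operand patched to ip_base
    if x & 1:
        raise ValueError("odd opcode: would index between table entries")
    return table[x // 2], y      # JMP (OPTBL,X): X is a byte offset


table = ["NOP", "ADD", "HALT"]   # stand-ins for handler addresses
code = bytes([0x00, 0x02, 0x04])
y = 0xFF                         # so the first INY lands on offset 0
for _ in range(3):
    handler, y = fetch_dispatch(code, 0, y, table)
    print(handler)
# NOP
# ADD
# HALT
```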

On occasion I do have to switch A to 8-bit mode, but at least it isn't on every byte code fetch. With a little creativity, mixing accumulator/memory and index widths can limit the amount of width-flag thrashing.


Posted: Mon Jun 06, 2022 7:17 pm

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1411
Location: Scotland
resman wrote:
I found it useful to keep the M flag as 16 bit and the index registers in 8 bit mode for PLASMA. My byte code fetch/dispatch is:

Code:
        INY                     ; NEXTOP @ $F0
        LDX     $FFFF,Y         ; FETCHOP @ $F1, IP MAPS OVER $FFFF @ $F2
        JMP     (OPTBL,X)       ; OPIDX AND OPPAGE MAP OVER OPTBL


and runs out of zero page. Y is the interpreter's instruction pointer (IP) offset. The IP+Y value gets renormalized during certain instructions' execution to keep it from overflowing. Also, the byte codes are even, so there is no need to shift the value, but it does limit the number of byte codes to 128 with an 8-bit X register.

On occasion I do have to switch A to 8-bit mode, but at least it isn't on every byte code fetch. With a little creativity, mixing accumulator/memory and index widths can limit the amount of width-flag thrashing.


That's nice and simple. Hm.

I have a full complement of 256 opcodes to deal with... I did try to work through scenarios where I could use an index register as the program counter, and also keeping it in zero page and self-modifying the "PC", but didn't come up with anything useful. It was also handy to have the VM's PC kept as a 32-bit register in zero page for various arithmetic operations, like jumps and so on.

This is the current dispatcher:

Code:
; Loop of the interpreter

        lda     [regPC]         ; Load 16-bit value                             (7)
        and     #$00FF          ; We only want 8-bits...                        (3)
        asl                     ; Double for indexing in 16-bit wide jump table (2)
        tax                     ;                                               (2)

; Increment the PC

        inc     regPC+0         ; Low word                                      (7)
        beq     incH1           ; 2 cycles + 1 when branch taken                (2)
        jmp     (opcodesLo,x)   ;                                               (6) = 29

incH1:  inc     regPC+2         ; Top word                                      (7)
        jmp     (opcodesLo,x)   ;                                               (6) = 37


This is a macro that's inlined into every opcode the VM handles. It adds space, but the cycles it saves are worth it. So that's 29 cycles, or rarely 37, of overhead for every VM opcode executed.
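Here is a check of that arithmetic in Python, using the cycle counts from the comments in the dispatcher. The slow path runs only when the low word of regPC wraps. Assuming the PC simply advances (no VM jumps), that happens once per 65,536 increments, so the average overhead stays essentially at 29 cycles.

```python
from fractions import Fraction

# Cycle counts taken from the comments in the dispatcher above.
COMMON = [
    ("lda [regPC]", 7),
    ("and #$00FF", 3),
    ("asl", 2),
    ("tax", 2),
    ("inc regPC+0", 7),
]
FAST = COMMON + [("beq incH1 (not taken)", 2), ("jmp (opcodesLo,x)", 6)]
SLOW = COMMON + [("beq incH1 (taken)", 3), ("inc regPC+2", 7), ("jmp (opcodesLo,x)", 6)]

fast = sum(cycles for _, cycles in FAST)
slow = sum(cycles for _, cycles in SLOW)

# Assumption: the low word wraps once every 65536 sequential fetches.
average = Fraction(fast * 65535 + slow, 65536)

print(fast, slow, float(average))  # 29 37 29.0001220703125
```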

Cheers,

-Gordon

Posted: Tue Jun 14, 2022 11:53 am

Joined: Tue Aug 11, 2020 3:45 am
Posts: 311
Location: A magnetic field
tmr4 on Thu 2 Jun 2022 wrote:
Code:
output_ch:      jmp put_c
output_2ch:     jsr output_ch
                xba
                jmp put_c


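It is sometimes useful for a subroutine to call into its own body. Here output_2ch calls output_ch (which is just put_c) for the low byte, swaps in the high byte with XBA, and then falls through into put_c, whose RTS returns directly to output_2ch's caller. This does not violate the structured-programming rule of a single return statement: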

Code:
output_2ch:     jsr output_ch
                xba
output_ch:
put_c:


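The control flow can be shown with a toy Python interpreter, which is hypothetical and far simpler than a real 65816. `output_ch` and `put_c` share one address, and put_c's body is reduced to a single `putc` step that prints the low byte of C. One return stack serves both the JSR and the final RTS.

```python
# Each entry: (label or None, operation, operand or None)
PROGRAM = [
    ("output_2ch", "jsr", "output_ch"),
    (None, "xba", None),
    ("output_ch", "putc", None),  # output_ch and put_c share this address
    (None, "rts", None),
]


def run(entry: str, c: int) -> str:
    """Run from a label until the outermost RTS; return what was printed."""
    labels = {lab: i for i, (lab, _, _) in enumerate(PROGRAM) if lab}
    stack = [-1]  # -1 stands for "return to whoever called entry"
    pc = labels[entry]
    out = []
    while pc != -1:
        _, op, arg = PROGRAM[pc]
        if op == "jsr":
            stack.append(pc + 1)
            pc = labels[arg]
            continue
        if op == "rts":
            pc = stack.pop()
            continue
        if op == "xba":
            c = ((c & 0xFF) << 8) | (c >> 8)
        elif op == "putc":
            out.append(chr(c & 0xFF))  # put_c prints the low byte (A)
        pc += 1
    return "".join(out)


print(run("output_2ch", 0x4241))  # AB
```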
This is probably well known but I discovered it when trying to implement 8/16 bit types in SWEET16 on Z80 for CollapseOS. For the second time, I remembered why 8080 derivatives are awful and I abandoned it.

_________________
Modules | Processors | Boards | Boxes | Beep, Beep! I'm a sheep!

