6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Thu Nov 21, 2024 1:31 pm

All times are UTC




Post new topic Reply to topic  [ 266 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6 ... 18  Next
Author Message
PostPosted: Fri Apr 25, 2014 6:17 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
chitselb wrote:
I was a little bummed when I got CREATE working and couldn't do something like
: {spadecharacter} ;
because the graphics have the bit7 set and that makes them look like the final character of the word name.

I'm not forgetting that you said Tali Forth was STC, but I pulled this from my ITC '02 Forth in case it helps any:
Code:
\ L>NAME below is a replacement for the standard L>NAME .  This one works with
\ words that have special characters in them with values above 7F.  Certain
\ combinations of character values and name lengths could still fool it, but most
\ will work.  It is made to only loop enough times to do a maximum-length
\ (31-character) name.  Depending on whether the number of characters in what
\ appears to be the name field is even or odd, often 0-length "names" won't be
\ considered.  The max limit is there to save the day when the input is not the
\ first address after a name field (as with a literal, or a runtime which is not
\ really a Forth word per se and has no header).


: L>NAME   ( LFA -- NFA )
   22 2
   DO                              \ ^ ADR
      DUP   I - C@ DUP  80 AND     \ ^ ADR  CHR  MSB_SET?
      IF                1F AND     \ ^ ADR  APARENT_LEN
                        I 1- =     \ ^ ADR  LEN_AGREE?
         IF I - LEAVE THEN         \ ^ NFA \ Change LFA to NFA if length agrees.
      ELSE      DROP               \ ^ ADR
      THEN
   LOOP         ;

(Five lines up from the bottom it looks like "IF AND" but it's "one-eff", not "IF". The 22 and 80 are also hex, as I normally leave it in hex, not decimal.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Fri Apr 25, 2014 9:36 pm 
Offline

Joined: Sat Aug 21, 2010 7:52 am
Posts: 231
Location: Arlington VA
Garth,

You might have me confused with scotws who is doing Tali Forth on a 65R2D216 or some such. I was just riffing on his dictionary diagram because the idea of headerless code has some merit, to my Forth, which is PETTIL on a stock 6502 for the Commodore Pet.


Top
 Profile  
Reply with quote  
PostPosted: Sat Apr 26, 2014 7:18 am 
Offline

Joined: Sat Aug 21, 2010 7:52 am
Posts: 231
Location: Arlington VA
If the symbol table consisted only of two fields,
    2-byte CFA of the compiled code
    Name field (length/flags byte, and the name itself)

Then there would never be any need to relocate anything! The entire symbol table could be CMOVEd elsewhere at will. It could even float above the dictionary (plus a suitable allocation for PAD.) HERE/PAD would start off being 6 pages (1.5K) away from the symbols, but as code gets added, the dictionary creeps closer and closer. When PAD gets within some arbitrary limit, maybe 3 pages away, the symbols would (magically) move back up to be 6 pages away from PAD again. The symbol table would become like a balloon on a string, and when compilation finishes, it could just float away. A few user variables would keep track of how close the block buffers are to the symbols, where the symbols are right now, what are the minimum and maximum lengths of the balloon string, and where is the top of the dictionary (HERE).

One of my favorite things about Forth is its "no bozos" approach to the development environment. Working in Forth can be pretty crashy at times. It's raw, unforgiving, and puts the responsibility for knowing what is going on squarely on the programmer. I am of course defending in advance against attacks on what a house of cards this setup appears to be.

The memory on a 32K PET could look something like this when PETTIL first loads:
$0000 zero page
$0100 stack page
$0200 cassette buffers and other crap
$0400 BASIC ("10 sys 1037" to kick things off)
$0500 Sweet16
$06BE Sweet16 extensions
$06ED user variables
$07E3 target-compiled PETTIL dictionary, just code (no name fields, no link fields)
$1486 HERE
$14D6 PAD
$1AD6 start of symbol table
$1AD6 CFA of EXECUTE
$1AD8 7, "EXECUTE"
$1AE0 CFA of I
$1AE2 1, "I"

... unused memory ...

$7A10 0000 (flag - no more blocks)
$7A12 BLOCK 2 data, also compressed
$7C61 start address of 2 BLOCK (say, $7A12)
$7C63 BLOCK 1 data, RLE-compressed
$7FFE start address of 1 BLOCK (say, $7C63)
-----
$8000 video memory (the screen)
$B000-$E7FF BASIC & Editor ROM
$E810 PIA1
$E820 PIA2
$E840 VIA
$F000 Kernel ROM
$FFFF end of address space


Top
 Profile  
Reply with quote  
PostPosted: Tue Jan 06, 2015 12:46 am 
Offline

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
Just FYI, Tali Forth is now hosted on GitHub: https://github.com/scotws/TaliForth


Top
 Profile  
Reply with quote  
PostPosted: Mon Jan 19, 2015 6:09 am 
Offline

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
For those working on their own STC Forths, present or future, here is a routine I have found myself using over and over again to store JSR/JMP/words in Forth words. Use by calling the routine with the address/word after it:
Code:
       jsr f_cmpljsr
       .word <addr>
This costs an additional 12 clock ticks (JSR/RTS), but the space savings really add up. I'm still on the fence about a "compile JSR and TOS" routine, I don't think I'm using it enough yet to justify breaking it out.

Code:
f_cmplword:     lda #$00        ; just compile word in little-endian
                bra _common

f_cmpljsr:      lda #$20        ; compile "JSR" opcode
                bra _common

f_cmpljmp:      lda #$4C        ; compile "JMP" opcode; falls through to _common

                ; opcode doubles as a non-zero flag
_common:        sta FLAG

                ; pull address/word off of stack
                pla
                sta TMPADR1     ; LSB of address
                pla
                sta TMPADR1+1   ; MSB of address

                ; increase address by one because of the way the 65c02 handles
                ; subroutine jumps: Address is the last byte of the JSR instuction
                ; not the address itself
                inc TMPADR1
                bne +
                inc TMPADR+1

*               ldy #$00

                ; see if we're just adding a word
                lda FLAG
                beq _wordonly

                ; this is either f_cmpljsr or f_cmpljmp, so we use the opcode
                ; that doubled as the flag and save that first
                sta (CP),y
                iny

                ; continue with common part: compile word
_wordonly:      lda (TMPADR1)   ; LSB
                sta (CP),y
                iny

                inc TMPADR1
                bne +
                inc TMPADR+1

*               lda (TMPADR1)   ; MSB
                sta (CP),y
                iny

                ; save new CP
                tya
                clc
                adc CP
                sta CP
                bcc +
                inc CP+1

                ; restore the correct return address. We have already added
                ; two to the return address, so we just need to push it on
                ; the stack
*               lda TMPADR1+1   ; MSB
                pha
                lda TMPADR1     ; LSB
                pha

                rts
Oh, and today is the one-year anniversary of Tali development. Yay!


Top
 Profile  
Reply with quote  
PostPosted: Mon Jan 19, 2015 2:59 pm 
Offline
User avatar

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Nice subroutine, Scot. And happy anniversary, btw! :) Here are a couple of tweaks to consider -- nothing earthshaking but it's always nice to have a choice of options. This:
Code:
                ; pull address/word off of stack
                ; increase address by one because of the way the 65c02 handles
                ; subroutine jumps: Address is the last byte of the JSR instuction
                ; not the address itself

                pla             ; LSB of address
                sta TMPADR1
                pla             ; MSB of address
                sta TMPADR1+1
                inc TMPADR1
                bne +
                inc TMPADR+1
*
... can be smaller and faster, as follows. Notice you don't need an '816 to have the benefit of doing 16-bit operations on-chip!
Code:
                ply             ; LSB of address
                pla             ; MSB of address
                iny
                bne +
                ina
*               sty TMPADR1     ; LSB of address
                sta TMPADR1+1   ; MSB of address


The other tweak is a little sneaky. It saves 4 cycles (and costs 2 bytes). This...
Code:
                lda TMPADR1+1   ; MSB
                pha
                lda TMPADR1     ; LSB
                pha

                rts
... can be replaced by this:
Code:
                inc TMPADR1
                bne +
                inc TMPADR1+1

*               jmp (TMPADR1)
I don't see much of a downside except it requires the programmer to be wide awake! :wink: Making matters worse, I didn't bother to include comments. :roll:

cheers,
Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Top
 Profile  
Reply with quote  
PostPosted: Tue Jan 20, 2015 8:17 pm 
Offline

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
Dr Jefyll wrote:
Here are a couple of tweaks to consider -- nothing earthshaking but it's always nice to have a choice of options


You'd think I'd have gotten used to PLY by now. Thanks, I'll be changing that. The second one -- neat, but at the moment I'm having a phase of being concerned with size (I'm above my original 8K limit).

Do we have a topic somewhere with "cool things you can do with the 65c02 (but not the 6502)"?

Thanks!


Top
 Profile  
Reply with quote  
PostPosted: Wed Jan 21, 2015 2:54 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
scotws wrote:
... The second one -- neat, but at the moment I'm having a phase of being concerned with size (I'm above my original 8K limit) ...

I glanced at your GitHub source. You have a nice straightforward style, with a good balance of code, comments and whitespace, but I noticed several spots where you missed out on opportunities to save code bytes. Are you interested in any hints? If so, how far is too far for you (regarding grokability and structure) when it comes to space-saving tricks?

Mike B.


Top
 Profile  
Reply with quote  
PostPosted: Wed Jan 21, 2015 7:37 am 
Offline

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
barrym95838 wrote:
Are you interested in any hints? If so, how far is too far for you (regarding grokability and structure) when it comes to space-saving tricks?
I'd be most grateful for any suggestions! I'm trying to put the emphasis on speed with Tali Forth (so a lot of the JSRs are "unrolled" into assembler), but I'd also like to stay under 8k because of the ROM chip size, so I'm having to balance stuff. Note I have a few more words to code (see TODO.txt) before I formally start optimization, but we're slowly getting there.

Again, please do comment on anything you see -- even if I decide not to include it, I'm sure to learn something!


Top
 Profile  
Reply with quote  
PostPosted: Wed Jan 21, 2015 4:14 pm 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
scotws wrote:
... I'm trying to put the emphasis on speed with Tali Forth (so a lot of the JSRs are "unrolled" into assembler) ...

That explains a lot of what I noticed, but not all of it. There are a few hard-core golfers around here, and I don't consider myself worthy of that label yet, but I'll give it a try when I get the chance.

Mike B.


Top
 Profile  
Reply with quote  
PostPosted: Thu Jan 22, 2015 6:47 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
scotws wrote:
Again, please do comment on anything you see -- even if I decide not to include it, I'm sure to learn something!

Okay, I went through the first 1000 lines of your Tali-Forth.asm source, and saved you 87 bytes without any apparent side-effects, assuming that register Y is free for use by the primitives and TMPCNT and TMPADR aren't used to transmit data between primitives. Tali-Forth.asm is over 7000 lines long, so I could probably save you about 520 bytes more (by extrapolation) in that file alone. It's getting late here, and I have to get to work tomorrow, but you should get the idea of what I'm trying to do (these suggestions are untested).

In Tali-Forth.asm:

In f_toupper (saves 1 byte)
change:
Code:
                sec
                sbc #$20        ; offset to upper case

to:
Code:
                adc #'A-'a      ; offset to upper case


In f_strtoupper (saves 1 byte)
delete:
Code:
                lda 1,x         ; LSB of u, we ignore MSB
                sta TMPCNT

and change:
Code:
                ldy TMPCNT

to:
Code:
                ldy 1,x         ; LSB of u, we ignore MSB
                dey             ; we don't want to convert the byte
                                ;  following the end of our string!

In _parseword (saves 1 byte)
change:
Code:
                ldy #$00

                lda #$20        ; opcode for JSR
                sta (CP),y
                iny

to:
Code:
                lda #$20        ; opcode for JSR
                sta (CP)
                ldy #1


In _execute (saves 1 byte)
change:
Code:
                ; Forth commands end with a RTS instruction. We fake the
                ; return address by pushing the correct address to the
                ; 65c02's stack and then doing a normal JMP. When we return,
                ; we land on a NOP so we don't have to DEC the return address
                lda #>_doneexec         ; push MSB first
                pha
                lda #<_doneexec

to:
Code:
                ; Forth commands end with a RTS instruction. We fake the
                ; return address by pushing the correct address - 1 to the
                ; 65c02's stack and then doing a normal JMP. When we return,
                ; we land on the correct address due to the nature of RTS.
                lda #>(_doneexec-1)     ; push MSB first
                pha
                lda #<(_doneexec-1)
                pha

and change:
Code:
_doneexec:      ; Keep the NOP here as the landing site for the indirect
                ; subroutine jump (easier and quicker than adjusting the
                ; return address on the 65c02's stack)
                nop             

to:
Code:
_doneexec:      ; Landing site for the indirect jump



In _compile (saves 1 byte)
change:
Code:
                cmp #$01
                beq _execute

to:
Code:
                dec
                beq _execute


In f_nib2asc (saves 14 bytes)
change:
Code:
                phx
                and #$0F       
                tax
                lda hexstr,x
                jsr f_putchr
                plx

to:
Code:
                and #$0F       
                ora #'0
                cmp #'9+1
                bcc +
                adc #6
*               jmp f_putchr

and delete:
Code:
hexstr:         .byte "0123456789ABCDEF"

at the end of the file (it was just a duplicate of the beginning of alphastr anyway).

In f_putchr (saves 7 bytes)
change:
Code:
                ; PORT 0: DEFAULT, Terminal, ASCI
                jsr k_wrtchr
                bra _done

_c1:            cpy #$01
                bne _c2

                ; PORT 1: VIA Port A output
                jsr k_wrtchrVIAa
                bra _done

_c2:            cpy #$02
                bne _err

                ; PORT 2: VIA Port B output
                jsr k_wrtchrVIAb
                bra _done

_err:           lda #$08        ; string code for unknown channel
                jmp error

_done:          ply
                rts

to:
Code:
                ; PORT 0: DEFAULT, Terminal, ASCI
                ply
                jmp k_wrtchr

_c1:            dey
                bne _c2

                ; PORT 1: VIA Port A output
                ply
                jmp k_wrtchrVIAa

_c2:            dey
                bne _err

                ; PORT 2: VIA Port B output
                ply
                jmp k_wrtchrVIAb

_err:           lda #$08        ; string code for unknown channel
                jmp error


In f_prtzerostr (saves 7 bytes)
change:
Code:
f_wrtzerostr:   ; version without a final linefeed
                ldy #$00
                phy
                bra _common

f_prtzerostr:   ; version with a final linefeed
                ldy #$FF
                phy

_common:        phx             ; use X as indext to table

                asl
                tax
                lda strtbl,x
                sta TBLLOC
                inx
                lda strtbl,x
                sta TBLLOC+1

                plx
                       
                ldy #$00
*               lda (TBLLOC),y
                beq _linefeed
                jsr f_putchr
                iny
                bra -
               
_linefeed:      ; get flag to see if we print a final linefeed or not
                ply
                beq _done

                ; print a line feed
                lda #AscLF
                jsr f_putchr

_done:          rts

to:
Code:
f_prtzerostr:   ; version with a final linefeed
                ldy #$FF
                .byte $2C       ; BIT abs opcode (skip next 2-byte instr.)

f_wrtzerostr:   ; version without a final linefeed
                ldy #$00
                phy

_common:        asl
                tay             ; use Y as index to table
                lda strtbl,y
                sta TBLLOC
                iny
                lda strtbl,y
                sta TBLLOC+1

                ldy #$00
*               lda (TBLLOC),y
                beq _linefeed
                jsr f_putchr
                iny
                bra -
               
_linefeed:      ; get flag to see if we print a final linefeed or not
                lda #AscLF
                ply
                bne f_putchr    ; print a line feed

                rts


In f_getchr (saves 7 bytes)
change:
Code:
                ; PORT 0: DEFAULT, Terminal, ASCI
                jsr k_getchr
                bra _done

_c1:            cpy #$01
                bne _c2

                ; PORT 1: VIA Port A input
                jsr k_getchrVIAa
                bra _done

_c2:            cpy #$02
                bne _err

                ; PORT 2: VIA Port B input
                jsr k_getchrVIAb
                bra _done

_err:           lda #$08        ; string code for wrong channel
                jmp error

_done:          ply
                rts

to:
Code:
                ; PORT 0: DEFAULT, Terminal, ASCI
                ply
                jmp k_getchr

_c1:            dey
                bne _c2

                ; PORT 1: VIA Port A input
                ply
                jmp k_getchrVIAa

_c2:            dey
                bne _err

                ; PORT 2: VIA Port B input
                ply
                jmp k_getchrVIAb

_err:           lda #$08        ; string code for wrong channel
                jmp error


In f_cmp16 (saves 2 bytes)
change:
Code:
                ; low bytes are not equal, compare MSB
                lda 2,x         ; MSB of TOS
                sbc 4,x         ; MSB of NOS
                ora #$01        ; Make Zero Flag 0 because we're not equal
                bvs _overflow
                bra _done

_equal:         ; low bytes are equal, so we compare high bytes
                lda 2,x         ; MSB of TOS
                sbc 4,x         ; MSB of NOS
                bvc _done

_overflow:      ; handle overflow because we use signed numbers
                eor #$80        ; complement negative flag
                ora #$01        ; if overflow, we can't be equal

_done:          rts

to:
Code:
                ; low bytes are not equal, compare MSB
                lda 2,x         ; MSB of TOS
                sbc 4,x         ; MSB of NOS
                bvs _overflow
                bra _notequal

_equal:         ; low bytes are equal, so we compare high bytes
                lda 2,x         ; MSB of TOS
                sbc 4,x         ; MSB of NOS
                bvc _done

_overflow:      ; handle overflow for signed numbers
                eor #$80        ; complement negative flag
_notequal:
                ora #$01        ; clear zero flag

_done:          rts


In fc_dovar (saves 10 bytes)
change:
Code:
fc_dovar:       ; pull return address off of the machine's stack
                pla             ; LSB of return address
                sta TMPADR2
                pla             ; MSB of return address
                sta TMPADR2+1

                ; The address is one byte below this
                inc TMPADR2
                bne +
                inc TMPADR2+1

                ; get variable and push it on the stack
*               dex
                dex
                lda TMPADR2     ; LSB
                sta 1,x
                lda TMPADR2+1   ; MSB
                sta 2,x

to:
Code:
fc_dovar:       ; pull return address off of the machine's stack,
                ;  add 1 to make up for JSR idiosyncrasy, and push
                ;  it on the forth stack
                dex
                dex
                pla             ; LSB of return address
                ply             ; MSB of return address
                inc             ; add 1 to make it correct
                sta 1,x
                bne +
                iny
 +              sty 2,x


In a_plit (saves 4 bytes)
change:
Code:
                ; move up one address so we are pointing to the byte after
                ; the JSR command.
                inc TMPADR
                bne +
                inc TMPADR+1

*               ; get bytes after JSR address
                lda (TMPADR)    ; LSB
                sta 1,x
                inc TMPADR
                bne +
                inc TMPADR+1
*               lda (TMPADR)    ; LSB
                sta 2,x

                ; replace the new address on the stack
                lda TMPADR+1
                pha
                lda TMPADR
                pha

to:
Code:
                ldy #1

                ; get bytes after JSR address
                lda (TMPADR),y  ; LSB
                sta 1,x
                iny
                lda (TMPADR),y  ; MSB
                sta 2,x

                ; replace the new address on the stack
                tya
                clc
                adc TMPADR
                tay
                lda TMPADR+1
                adc #0
                pha
                phy


In a_lit (saves 2 bytes)
change:
Code:
a_lit:          ldy #$00

                ; we first compile the call to (LITERAL)
                lda #$20        ; opcode for the JSR instruction
                sta (CP),y
                iny
                lda #<l_plit   
                sta (CP),y
                iny
                lda #>l_plit
                sta (CP),y
                iny

                ; bookkeeping: update CP
                tya
                clc
                adc CP

to:
Code:
a_lit:          ldy #$01

                ; we first compile the call to (LITERAL)
                lda #$20        ; opcode for the JSR instruction
                sta (CP)
                lda #<l_plit   
                sta (CP),y
                iny
                lda #>l_plit
                sta (CP),y

                ; bookkeeping: update CP
                tya
                sec
                adc CP


In a_quit (saves 2 bytes)
change:
Code:
a_quit:         ; Reset the return stack (65c02 stack) pointer
                stx TMPX       
                ldx #RP0
                txs
                ldx TMPX

to:
Code:
a_quit:         ; Reset the return stack (65c02 stack) pointer
                txa       
                ldx #RP0
                txs
                tax


In a_dump (saves 27 bytes)
change:
Code:
a_dump:         ; if we were given zero bytes to display, abort the whole
                ; thing
                lda 1,x
                ora 2,x
                beq _done

                ; start a new line
                jsr l_cr

                ; put stack parameters where we can work with them
                lda 1,x
                sta TMPCNT      ; this is the counter LSB
                lda 2,x
                sta TMPCNT+1    ; MSB

                lda 3,x
                sta TMPADR      ; LSB
                lda 4,x
                sta TMPADR+1    ; MSB

                ; start internal counter so we only display 16 numbers
                ; per row
                ldy #$00

_loop:          ; dump the contents
                lda (TMPADR)

                jsr f_byte2hexasc
                jsr l_space

                iny
                cpy #$10
                bne _nextchar

                ; start next line
                jsr l_cr
                ldy #$00

_nextchar:      ; next char
                inc TMPADR
                bne _counter
                inc TMPADR+1

_counter:       ; loop counter
                lda TMPCNT
                bne +
                dec TMPCNT+1
*               dec TMPCNT

                ; loop control
                lda TMPCNT
                ora TMPCNT+1
                bne _loop

to:
Code:
a_dump:         ; start internal counter for 16 numbers per row
                jsr l_cr
                ldy #16

_loop:          ; if there are zero bytes left to display, we're done
                lda 1,x
                ora 2,x
                beq _done

                ; dump the contents
                lda (3,x)

                jsr f_byte2hexasc
                jsr l_space

_nextchar:      ; next char
                inc 3,x
                bne _counter
                inc 4,x

_counter:       ; loop counter
                lda 1,x
                bne +
                dec 2,x
*               dec 1,x

                dey
                bne _loop
                bra a_dump


HTH,

Mike B.

[Edit: Made a few minor corrections.]
[Edit #2: I totally messed up a_plit ... I'll fix it later ... a_dump sure kicks butt, though, don't it?]
[Edit #3: I fixed a_plit ... I think.]
[Edit #4: ina -> inc , dea -> dec]


Last edited by barrym95838 on Mon Jan 26, 2015 1:40 am, edited 1 time in total.

Top
 Profile  
Reply with quote  
PostPosted: Thu Jan 22, 2015 10:37 pm 
Offline

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
Wow! Thank you for all the work!

It turns out that the _doneexec change doesn't work that way with this assembler; I'll have to go back and figure that out differently, though, that is butt-ugly. Converting
Code:
cmp#$01 --> dec
before a BEQ is a trick I wasn't aware of. The same goes for using
Code:
.byte $2C
to mask the next two bytes (BIT abs) either, but in this case I'm going to use another BRA instead because that's three cycles and BIT is four. Still, cool. Might be worth an assembler macro as SKIP. l_plit and l_lit I had already rewritten (now that I've discovered f_cmpljsr), so that's fine for the moment.

And yes, DUMP is pretty brilliant :D .

So it seems I need to make more use of Y and let the assembler handle more stuff. See, told you I'd learn a lot.

Thanks again -- code modified and pushed to master.

Y, Scot


Top
 Profile  
Reply with quote  
PostPosted: Fri Jan 23, 2015 12:44 am 
Offline
User avatar

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
I ran across the following article explaining the rationale for the $2C trick the Mike B. was recommending. The article is well written and provides a clear explanation why the programming trick of using BIT abs to skip over two bytes.

Since the 65C02 supports an unconditional branch, the benefit of the trick is not as pronounced. However, using BRA rel instead on BIT abs may save a clock cycle but it requires an extra byte.

As I was writing this post, I thought of a similar trick that may be used on a 65C02. There are a number of unimplemented instructions that operate as NOPs in the instruction set of a true 65C02 but that also advance the program counter past 1 or 2 arguments.

hoglet's recent post regarding Alan Daly's R65Cx2 core identifies a number of opcodes that operate as NOPs but also skip two bytes: $5C, $DC, $FC. Will need to test this behavior with the processor that you are using, but Klaus Dormann's 65C02 functional tests expects the behavior. Thus, I expect a 65C02 processor perform a NOP and then skip the next two bytes if any one of these three opcodes is used instead of the $2C opcode Mike B. suggested.

Therefore, one of these three opcodes will save a byte and have the same execution time as BRA rel.

_________________
Michael A.


Top
 Profile  
Reply with quote  
PostPosted: Fri Jan 23, 2015 2:12 am 
Offline
User avatar

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
scotws wrote:
using
Code:
.byte $2C
to mask the next two bytes

This is a cool way to skip two bytes, no question. $2C (or similar) is used as an opcode, and the following two bytes are seen by the CPU as the accompanying operand. (In reality there's an instruction stashed in there, but it doesn't execute. In context of the preceding $2C it will be treated as an operand.) This trick is one cycle slower than a BRA but it's one byte shorter, thus conserving memory. It alters the flags, but that drawback can usually be tolerated.

A more serious drawback is that this trick can cause read-sensitive I/O registers (such as the 6522's Interrupt Flag Register) to suffer an erroneous change of state. For example, the sequence $2C, $12, $34 (BIT absolute $3412) -- where $12, $34 is the two bytes of code you hope to skip -- really does read from $3412 (or whatever). If that read touches a 6522 IFR, or any other read-sensitive I/O address, the result is a very nasty and hard-to-find bug. :!: Because the bug is unlikely, lots of folks just ignore the risk. But it's usually pretty easy to assure yourself on the matter.

  • you'll be OK if the target system does not include IO devices with read-sensitive registers (eg: Interrupt Flag Register)
  • failing that, you'll be OK if the feature associated with the read-sensitive register is unused (eg: not using interrupts)
  • failing that, you'll be OK if the bytes after the $2C, interpreted as an address, do not touch the read-sensitive register(s).

$2C (BIT absolute) is not the only option. Alternatives include $CD (CMP absolute), $EC (CPX absolute) and $CC (CPY absolute), but it all amounts to the same thing. All of these instructions use a two-bye operand, and none of them will modify CPU registers -- flags excepted.

MichaelM wrote:
opcodes that operate as NOPs but also skip two bytes: $5C, $DC, $FC
Yes. On 65c02, $DC and $FC can be used for the skip-two-bytes trick. Unlike $2C (BIT absolute), they have the advantage of leaving the flags unaltered. :) But, unfortunately, the same warning applies. Although the 65c02 bills them as NOPs, opcodes $DC and $FC do produce a memory reference -- a read -- using absolute addressing mode (just as $2C does). They qualify as NOPs only because the result of the read is discarded. I describe the "Load And Discard" operation in more detail on my web site here.

BTW $5C could perhaps be used for the skip-two-bytes trick, but I see no advantage. And $5C is much slower.

cheers,
Jeff

ETA: On the subject of skipping one byte:

On the 'C02, opcodes $02, $22, $42, $62, $82, $C2, and $E2 all equate to "Load And Discard" but using immediate address mode. Unlike the absolute mode examples above, no extra memory reference is generated, so the caveat doesn't apply -- you may use these freely to skip over one byte of code. For 'C02 you can choose any of these opcodes; they all behave identically. Interestingly, on '816 $42 still has the same behavior, although it's been dubbed "WDM." But the skip one byte trick still works on '816 as long as you choose $42. :)

Formal documentation re the 'C02 NOPs is extremely scarce. The WDC datasheet lists the bytes and cycles but nothing more. In the excerpt below I highlighted the NOPs mentioned in this post.


Attachments:
65C02 NOPs.gif
65C02 NOPs.gif [ 9.73 KiB | Viewed 12424 times ]

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Last edited by Dr Jefyll on Thu May 05, 2016 4:53 pm, edited 4 times in total.
Top
 Profile  
Reply with quote  
PostPosted: Sat Jan 24, 2015 2:57 pm 
Offline
User avatar

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
As usual, Dr Jefyll has some excellent observations.

Since I implement all unused 65C02 opcodes for my 65C02-compatible processor cores as single cycle NOPs, it did not occur to me that the implied cycle to read/write a memory operand would actually be performed. This behavior is not likely to be visible to Klaus' 65C02 functional tests since any effects would be discarded. (Note: my processor core can support these behaviors with some minor updates to the microprogram, but I would not have implemented any read/write memory cycles; it just would not have occurred to me to do so because of the side effects that Jeff describes.)

I have not dug into hoglet's R65Cx2 core to see if it would perform the implied memory cycle; it certainly appears to perform no operation on the CPU registers as would be expected.

_________________
Michael A.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 266 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6 ... 18  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 8 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: