
All times are UTC




Posted: Fri Mar 21, 2014 6:45 am

Joined: Sat Aug 21, 2010 7:52 am
Posts: 231
Location: Arlington VA
phooey. Charlie-brain-damage admission.

I would have to explicitly clear carry to pull this off, making it the same 15 clocks. Or have a contract with every primitive to ensure carry flag is clear before jmp next.

Nevermind!


Posted: Fri Mar 21, 2014 7:54 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3327
Location: Ontario, Canada
chitselb wrote:
Or have a contract with every primitive to ensure carry flag is clear before jmp next.
I don't see a problem with that. At worst you break even. What you're doing looks good to me. :)

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted: Fri Mar 21, 2014 12:30 pm

Joined: Tue Jan 07, 2014 8:40 am
Posts: 91
chitselb wrote:
I'm using a design where the compiler
  • realigns at page boundaries to work around the jmp ($xxff) bug
  • inserts a call to the word "page" when it compiles definitions (where they cross page boundaries.)
All page does when it executes is 'inc ip+1', to cross the page. That unburdens NEXT considerably, at the expense of compiler complexity


Very clever. I like it! And it adds relatively little to compiler complexity.

E.g., in CamelForth all threads are compiled through a word ,XT (append eXecution Token), which turns out to be equivalent to the ANS word COMPILE, . I would merely change ,XT to test for HERE=$xxFE, and compile the call to "page" in that case. (I assume that you're keeping the threads 16-bit aligned.)

There will be some added complexity for words that fetch an in-line value, such as literal, ." , and the branch operators. They'll need to be smart about page boundaries (or simply use a 16-bit increment for IP).
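
For readers who haven't run into the jmp ($xxff) bug, here is a rough Python model of how an indirect JMP fetches its target on an NMOS 6502 versus a 65C02. It is only a sketch: `jmp_indirect_target` and `needs_page_call` are made-up names, and the second helper simply restates the HERE=$xxFE test suggested above.

```python
def jmp_indirect_target(mem, ptr, nmos=True):
    """Return the address an indirect JMP through `ptr` would load into PC."""
    lo = mem[ptr]
    if nmos:
        # NMOS 6502: the pointer's high byte is not incremented, so the
        # second fetch wraps around inside the same page.
        hi_addr = (ptr & 0xFF00) | ((ptr + 1) & 0x00FF)
    else:
        # 65C02: the pointer is incremented as a full 16-bit value.
        hi_addr = (ptr + 1) & 0xFFFF
    return lo | (mem[hi_addr] << 8)


def needs_page_call(here):
    """Hypothetical compiler check: is the next cell the last one in its page?"""
    return (here & 0xFF) == 0xFE


mem = bytearray(0x10000)
mem[0x12FF] = 0x34  # low byte of the intended target $5634
mem[0x1300] = 0x56  # intended high byte, in the next page
mem[0x1200] = 0x99  # byte the NMOS part fetches instead
```

With a 16-bit-aligned thread the pointer's low byte is always even, so the wrap case never arises; the compiler only has to step IP's high byte when it runs off the end of a page.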

_________________
Because there are never enough Forth implementations: http://www.camelforth.com


Posted: Fri Mar 21, 2014 3:24 pm

Joined: Tue Jul 24, 2012 2:27 am
Posts: 669
Since I'm looking at my AcheronVM again, I noticed its dispatch was pretty quick, all things considered. Could be useful for a token-threaded Forth, I guess:

Code:
mainLoop1:     ; Main loop entry when an instruction consumed an operand byte, so bump instruction pointer an extra 1
 iny
 bmi _iptrOverflow

mainLoop0:     ; Main loop entry when an instruction had no operand bytes
 lda (iptr),y
 iny
 sta *+4
 jmp (dispatchTable)  ; low byte is self-modded


All of the tokens are an even number, so no ASL is required to index into the 16-bit dispatch table. 128 possible tokens in this encoding.

The current program counter is iptr + Y. Instruction implementations must preserve .Y, and there is another mainLoopY entry point which does "ldy saveY" for convenience, given that the implementation stashed Y there before trampling it. The code regularly but not exhaustively checks to see if Y is greater than 127 and coalesces the value back into iptr if it reaches that range. It's nice that instructions can freely grab operand bytes using iny without having to always perform this check (though obviously wildly degenerate cases will break), and it doesn't break anything when page boundaries are crossed. The only issue is that the lda (iptr),y instructions take another cycle in those cases. Any branch, jump, etc, sets the new iptr and resets Y to zero.
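
To make the even-token trick concrete, here is a small Python sketch of the table lookup. It assumes the dispatch table sits on a page boundary (which is what lets the code patch only the low byte of the JMP operand); `build_dispatch_table` and `dispatch` are illustrative names, not anything from AcheronVM itself.

```python
def build_dispatch_table(handlers):
    """Build a 256-byte table of little-endian addresses.

    handlers maps an even token (0..254) to a 16-bit handler address.
    """
    table = bytearray(256)
    for token, addr in handlers.items():
        if token % 2 or not 0 <= token <= 254:
            raise ValueError("tokens must be even and fit in one byte")
        table[token] = addr & 0xFF
        table[token + 1] = addr >> 8
    return table


def dispatch(table, token):
    """Model of `sta *+4` / `jmp (dispatchTable)`: the token is the byte offset."""
    return table[token] | (table[token + 1] << 8)


table = build_dispatch_table({0: 0x8000, 2: 0x8123, 254: 0xFFEE})
```

Because each token is already a byte offset to a 16-bit entry, no shift is needed, and a 256-byte table holds 128 entries.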

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Posted: Thu Mar 27, 2014 7:10 pm

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1918
Location: Sacramento, CA, USA
I never get tired of seeing all of the cool ways (zp),y gets used ... it's one of the most important keys to the 6502's power and versatility, IMO.

Mike


Posted: Fri May 08, 2015 3:00 am

Joined: Sat Aug 21, 2010 7:52 am
Posts: 231
Location: Arlington VA
This was a huge win today! I replaced 27 bytes of EXECUTE code with just 11, eliminating the need for a "W" register (as is typical of indirect-threaded implementations).
Code:
execute
    lda tos+1
    pha
    lda tos
    pha
    jsr slide      ; identical to DROP but ends in RTS instead of NEXT
    php
    rti

for comparison, here's the old dog chow:
;--------------------------------------------------------------
; W register, used by EXECUTE
;w1
;    .word $dead             ; (for when you just need a W register)
;    .word exit              ; 'fragment secondary' used by EXECUTE

;--------------------------------------------------------------
#if 0
name=EXECUTE
tags=inner
stack=( cfa -- )
Executes the word whose code field address is on the stack.

#endif
;execute
;    lda tos                 ; <-- code field address
;    sta w1                  ; in direct-threaded models, this
;    lda tos+1               ; contains code instead of a pointer
;    sta w1+1                ; [SP] -> [W1]
;    lda ip+1
;    pha
;    lda ip
;    pha
;    lda #<(w1-2)
;    sta ip
;    lda #>(w1-2)
;    sta ip+1
;    jmp pops
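
Why php/rti rather than rts? RTS adds one to the address it pulls, while RTI jumps to exactly the address on the stack. A rough Python model of the hardware stack shows the difference. The class and function names are invented for illustration, and the status byte is passed through untouched (the real PHP/RTI handling of the break flag and bit 5 is ignored here).

```python
class Stack6502:
    """Minimal model of the page-one hardware stack."""

    def __init__(self):
        self.mem = bytearray(0x100)
        self.sp = 0xFF

    def push(self, value):
        self.mem[self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[self.sp]


def execute_via_rti(stack, tos, status):
    """Model of: lda tos+1 / pha / lda tos / pha / php / rti."""
    stack.push(tos >> 8)      # high byte first
    stack.push(tos & 0xFF)    # then low byte
    stack.push(status)        # php
    p = stack.pull()          # rti pulls P first...
    lo = stack.pull()         # ...then PCL...
    hi = stack.pull()         # ...then PCH, with no +1 adjustment
    return lo | (hi << 8), p


def return_via_rts(stack):
    """Model of rts: pull PCL, PCH, then add one."""
    lo = stack.pull()
    hi = stack.pull()
    return ((lo | (hi << 8)) + 1) & 0xFFFF
```

So with RTI the execution token can be pushed as-is, whereas an RTS-based EXECUTE would need the token minus one.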


Posted: Fri May 08, 2015 3:32 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8412
Location: Southern California
What I'm running for 65c02 is:
Code:
        LDA  0,X
        STA  W
        LDA  1,X
        STA  W+1
        INX
        INX
        JMP  W-1

My '816 ITC has basically the same thing:
Code:
         HEADER "EXECUTE", NOT_IMMEDIATE     ; ( addr -- )
EXECUTE: PRIMITIVE
         LDA  0,X
 xeq1:   STA  W
         INX_INX
         JMP  W-1
 ;-------------------

and the label is for PERFORM:
Code:
         HEADER "PERFORM", NOT_IMMEDIATE      ; ( addr -- )
PERFORM: PRIMITIVE                ; same as  @ EXECUTE
         LDA  (0,X)
         BRA  xeq1
 ;-------------------

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Fri May 08, 2015 4:41 am

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1918
Location: Sacramento, CA, USA
65m32:
Code:
                 ;  154 ;--------------------------------------------------------------- EXECUTE
0000029e:00000289;  155         NOT_IMM 'EXECUTE'
0000029f:07455845;  155
000002a0:43555445;  155
                 ;  156  _execute: ; ( xt --  ) \ Execute Forth word 'xt'
000002a1:62020000;  157         pda  ,x+        ; Pop xt from S: and push it to R:
000002a2:5e0c0000;  158         rts             ; Pop xt from R: and jump to it

This is an example of a disadvantage of keeping TOS-in-A. If I kept it at 0,x then EXECUTE would be only a single machine instruction: jmp (,x+).

Mike B.

php rti is a neat hack, Charlie!


Posted: Sat Sep 24, 2016 6:54 am

Joined: Fri Oct 17, 2014 9:27 pm
Posts: 2
So, why not simply use Y to hold the LSB and store it into the zero-page JMP instruction? This gets NEXT down to 12 cycles on a 6502, approaching subroutine-threading performance, without clobbering the C flag (allowing multi-precision arithmetic).

Code:
PAGE    INC IP+2     ; seldom executed
NEXT    INY          ; 2 cycles
        INY          ; 2 cycles
        STY IP+1     ; 3 cycles
IP      JMP ($0000)  ; 5 cycles

It might seem terrible to lose both the X and Y registers, but the Y register can be nuked and then reloaded with LDY IP+1, which only adds three cycles to whichever primitive is being written.

Code:
CLIT    LDY #0       ; 2 cycles
        LDA (IP+1),Y ; 5 cycles
        DEX          ; 2 cycles
        STA (0,X)    ; 6 cycles
        LDY IP+1     ; 3 cycles
        INY          ; 2 cycles
        JMP NEXT     ; 3 cycles

_________________
-- Zomg! Pewpew!!


Posted: Sun Sep 25, 2016 6:43 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3327
Location: Ontario, Canada
nonarkitten wrote:
Code:
PAGE    INC IP+2     ; seldom executed
NEXT    INY          ; 2 cycles
        INY          ; 2 cycles
        STY IP+1     ; 3 cycles
IP      JMP ($0000)  ; 5 cycles
This is pretty cool. Fast -- I like it.

Minor added detail: 5 is correct clock count for the JMP (ind) "featured" in the NMOS 6502, but the 65C02's bug-fixed JMP (ind) increases that count to 6. (Until now I didn't realize the '816 reduces it back to 5.)
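
Adding up the per-instruction counts makes the effect on NEXT easy to see. This is just the arithmetic from the listing in Python; the '816 row assumes 8-bit index registers and a page-aligned direct page, so that STY still takes 3 cycles.

```python
# Cycles for JMP (abs) indirect on each CPU, as discussed above.
JMP_INDIRECT_CYCLES = {"NMOS 6502": 5, "65C02": 6, "65816": 5}

INY_CYCLES = 2
STY_ZP_CYCLES = 3  # assumption for the '816: 8-bit index, direct page low byte = 0


def next_cycles(cpu):
    """Total cycles for INY / INY / STY IP+1 / JMP ($xxxx)."""
    return 2 * INY_CYCLES + STY_ZP_CYCLES + JMP_INDIRECT_CYCLES[cpu]


for cpu in JMP_INDIRECT_CYCLES:
    print(f"{cpu}: {next_cycles(cpu)} cycles")
```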

Quote:
the Y register can be nuked and then reloaded with LDY IP+1 which only adds three cycles
Good point. In fact, maybe the Y reload is a better candidate for occupying the fallthrough position preceding NEXT.
IOW, replace this
Code:
PAGE    INC IP+2     ; seldom executed  <-------
NEXT    INY          ; 2 cycles
        INY          ; 2 cycles
        STY IP+1     ; 3 cycles
IP      JMP ($0000)  ; 5 cycles
with this
Code:
FIX_Y   LDY IP+1     ;reload bombed Y <-------
NEXT    INY          ; 2 cycles
        INY          ; 2 cycles
        STY IP+1     ; 3 cycles
IP      JMP ($0000)  ; 5 cycles
(There could still be a PAGE label. But after the INC IP+2 it would JMP or BRA to NEXT rather than falling through.)

-- Jeff

PS- Shouldn't this be STA 0,X? I'd also expect CLIT to write zero to the high byte of TOS at 1,X. :)
Code:
CLIT    LDY #0       ; 2 cycles
        LDA (IP+1),Y ; 5 cycles
        DEX          ; 2 cycles
        STA (0,X)    ; 6 cycles  <-------
        LDY IP+1     ; 3 cycles
        INY          ; 2 cycles
        JMP NEXT     ; 3 cycles

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted: Sun Sep 25, 2016 7:59 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8114
Location: Midwestern USA
Dr Jefyll wrote:
Minor added detail: 5 is correct clock count for the JMP (ind) "featured" in the NMOS 6502, but the 65C02's bug-fixed JMP (ind) increases that count to 6. (Until now I didn't realize the '816 reduces it back to 5.)

You can thank the 16 bit ALU in the '816 for that. :D

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Wed Oct 12, 2016 3:43 pm

Joined: Sat Aug 21, 2010 7:52 am
Posts: 231
Location: Arlington VA
nonarkitten wrote:
So, why not simply use Y to hold the LSB and store it into the zero-page JMP instruction? This gets NEXT down to 12 cycles on a 6502, approaching subroutine-threading performance, without clobbering the C flag (allowing multi-precision arithmetic).

It creates a contract that each primitive must return Y = IP[low], and register contracts are scary. Primitives that modify the Y register have to give the 3 cycles back with the LDY IP+1, which also adds two bytes to all such primitives. But if I were to go this route, I'd probably do it like this (it increases the size of the zero-page code from 7 to 9 bytes):
Code:
;this section lives in zero page
NEXTY   LDY IP+1     ;[3] restore instruction pointer
                     ;    (unnecessary if primitive didn't alter Y)
NEXT    INY          ;[2]
        INY          ;[2]
NEXTB   STY IP+1     ;[3]
IP      JMP ($CAFE)  ;[5]

;this part is in high memory
PAGE    INC IP+2     ;[5]
        LDY #0       ;[2]
        JMP NEXTB    ;[3]
 
CLIT    DEX          ;[2]
        LDY #0       ;[2]
        LDA (IP+1),Y ;[5]
        STY STACKH,X ;[4]
        STA STACKL,X ;[4] (I'm using a split stack)
        INC IP+1     ;[5] get past the literal byte
        JMP NEXTY    ;[3]
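
Here is a rough Python model of how this NEXT/PAGE pair walks a thread across a page boundary. It is a sketch under assumptions: the thread is 16-bit aligned, the compiler has placed PAGE's address in the last cell of the page, and the addresses used for PAGE and the primitives are made up.

```python
PAGE_XT = 0xC000  # hypothetical address of the PAGE primitive


def fetch_word(mem, addr):
    return mem[addr] | (mem[addr + 1] << 8)


def store_word(mem, addr, value):
    mem[addr] = value & 0xFF
    mem[addr + 1] = value >> 8


def walk(mem, start, count):
    """Return the first `count` primitive addresses dispatched from `start`."""
    hi, y = start >> 8, start & 0xFF     # IP+2 and Y (= IP+1)
    executed = []
    while len(executed) < count:
        target = fetch_word(mem, (hi << 8) | y)   # IP: JMP ($hhyy)
        if target == PAGE_XT:
            hi = (hi + 1) & 0xFF                  # PAGE: INC IP+2
            y = 0                                 #       LDY #0 / JMP NEXTB
            continue
        executed.append(target)
        y = (y + 2) & 0xFF                        # NEXT: INY / INY / STY IP+1
    return executed


mem = bytearray(0x10000)
A, B, C = 0x8000, 0x8010, 0x8020   # made-up primitive addresses
store_word(mem, 0x20FA, A)
store_word(mem, 0x20FC, B)
store_word(mem, 0x20FE, PAGE_XT)   # last cell of the page
store_word(mem, 0x2100, C)
```

Walking three primitives from $20FA visits A and B, hits PAGE at $20FE, and resumes with C at $2100, without NEXT itself ever testing for a page crossing.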


Posted: Thu Oct 20, 2016 8:54 pm

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
FIG-Forth already has such a contract with X, as it's the stack pointer.


Posted: Sat Nov 13, 2021 3:12 am

Joined: Fri May 05, 2017 9:27 pm
Posts: 841
GARTHWILSON wrote:
FIG-Forth's NEXT leaves Y=0 and then some of the words take advantage of that so they don't have to zero it first. I had considered making NEXT leave C unaffected to make it easier to do multi-precision arithmetic in secondaries.


If I were interested in multi-precision math in secondaries, that is not how I would do it.
Making NEXT preserve the state of the carry flag would burden all Forth words with the extra overhead. This is a quick rough draft of what I might try.
Code:
VARIABLE CARRY
CODE (MP+)  ( N1 N2 -- N3 )  \ multi-precision plus
   CARRY ROR
   0 ,X LDA  2 ,X ADC  2 ,X STA
   1 ,X LDA  3 ,X ADC  3 ,X STA
   CARRY ROL
   POP JMP  END-CODE
CODE +  ( N1 N2 -- N3 )  \ traditional plus
   CLC
   ' (MP+) @ 3 + JMP
   END-CODE

Or, if there are two consecutive zero-page locations available:
Code:
<ZP-LOCATION> CONSTANT CARRY
CODE (MP+)  ( N1 N2 -- N3 )  \ multi-precision plus
   CARRY ROR
   0 ,X LDA  2 ,X ADC  2 ,X STA
   1 ,X LDA  3 ,X ADC  3 ,X STA
   CARRY ROL
   POP JMP  END-CODE
CODE +
   CLC
   ' (MP+) @ 2+ JMP
   END-CODE

The code uses only one byte of the variable CARRY. These two phrases clear and set the carry for multi-precision math in high-level Forth:
Code:
CARRY OFF  \ clear carry
CARRY ON   \ set carry

A hypothetical example of multi-precision addition in high level Forth using (MP+).
Code:
CREATE VALUE1 20 ALLOT
CREATE VALUE2 20 ALLOT
\ add 20 byte (10 cell) numbers in VALUE1 and VALUE2
\ return result in VALUE1
: MP+  ( ADR1 ADR2 #CELLS -- )
   CARRY OFF 2* 0
   ?DO
      OVER I + @  OVER I + @  (MP+)
      OVER I + !
      2
   +LOOP
   2DROP ;
VALUE2 VALUE1 10 MP+
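
As a cross-check on what the loop computes, here is a small Python model of the same idea: each step adds two 16-bit cells plus the saved carry, and the carry out is kept for the next step. It assumes the cells are stored least significant first, as the ascending loop index implies; `mp_plus_cell` and `mp_plus` are illustrative names only.

```python
def mp_plus_cell(n1, n2, carry):
    """Model of (MP+): add two 16-bit cells plus carry in, return (sum, carry out)."""
    total = n1 + n2 + carry
    return total & 0xFFFF, total >> 16


def mp_plus(value1, value2):
    """Add two equal-length lists of 16-bit cells, least significant cell first."""
    carry = 0                      # CARRY OFF
    result = []
    for a, b in zip(value1, value2):
        cell, carry = mp_plus_cell(a, b, carry)
        result.append(cell)
    return result, carry


# $0000FFFF + $00000001 = $00010000
total, carry_out = mp_plus([0xFFFF, 0x0000], [0x0001, 0x0000])
```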



Posted: Fri Jan 13, 2023 6:40 pm

Joined: Mon Jan 09, 2023 9:33 pm
Posts: 23
;
; using Minimal Indirect Thread Code
; https://github.com/agsb/immu/blob/main/ ... h%20en.pdf
;
; In classic ITC, the inner interpreter always jumps:
; DOCOL pushes onto the return stack and SEMIS pulls from it.
;
; In minimal ITC, the code jumps only when the first reference is 0x0000,
; which marks all primitives. Otherwise it does not jump; it nests directly,
; pushing the reference onto the return stack, and all words end with unnest,
; which pulls the reference from the return stack.
;
; Because there are fewer primitives than compound words,
; this gives the inner interpreter some options:
; all compound words are one cell shorter
; (they need not begin with DOCOL),
; no dependence on IP or W to hold the next reference
; (it is kept on the return stack),
; it is easy for MCUs or CPUs with separate code and data memory
; (compound words can stay in a non-executable 'data' segment,
; with only the inner address interpreter and primitives in the executable 'code' segment),
; and on the RISC-V ISA it makes for easy, fast inner code.
;
;
; First version for 6502, no optimizations
;
; r0, top of return stack, y indexed, any memory page
; p0, top of parameter stack, x indexed, any memory page
; tos, nos, pseudo registers at page zero
; wrk, ptr, pseudo registers at page zero
; a_save, x_save, y_save, keep values
; no use of A, Y as TOS, just as accumulator
;
;

unnest: ; aka semis, pull from return stack
lda r0 + 0, y
sta ptr + 0
lda r0 + 1, y
sta ptr + 1
iny
iny
; jmp next

next:
; as is, classic ITC from fig-forth 6502
sty y_save
ldy #0
lda (ptr), y
sta wrk + 0
iny
lda (ptr), y
sta wrk + 1
ldy y_save
; jmp refer

refer:
; as is, classic ITC from fig-forth 6502
; pointer to next reference
clc
lda ptr + 0
adc #CELL_SIZE
sta ptr + 0
bcc @end ; carry is set only when the low byte wrapped
inc ptr + 1
@end:
; jmp leaf

leaf:
; in MITC, all leaves start with NULL
; on the 6502, no code lives in page zero,
; so just compare the high byte
lda wrk + 1
bne nest
; no Forth word lives in page zero
jmp (ptr)

nest: ; aka docol, push onto return stack
dey
dey
lda ptr + 1
sta r0 + 1, y
lda ptr + 0
sta r0 + 0, y
; jmp link

link:
; next reference
lda wrk + 0
sta ptr + 0
lda wrk + 1
sta ptr + 1
jmp next

Sorry for the long post.
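
The control flow of that inner interpreter can be hard to see in assembly, so here is a rough Python model of it. Several things are assumptions made purely for illustration: memory is cell-addressed (so the pointer steps by 1 rather than CELL_SIZE), a primitive's machine code is represented by a Python function keyed by the address just after its NULL cell, and compound words end with a hypothetical EXIT primitive that discards the return address into the finished word before the usual unnest.

```python
def run(mem, prims, entry):
    """Run the compound word whose first cell is `entry`; return the data stack."""
    data, rstack = [], []
    ptr = entry
    while True:
        wrk = mem[ptr]            # next: fetch the reference at ptr
        ptr += 1                  # refer: step ptr to the following cell
        if wrk != 0:              # nest: not NULL, so save ptr and descend
            rstack.append(ptr)
            ptr = wrk
            continue
        prims[ptr](data, rstack)  # leaf: NULL marker, the code starts at ptr
        if not rstack:            # nothing left to return to: done
            return data
        ptr = rstack.pop()        # unnest: resume the caller


def p_two(data, rstack):
    data.append(2)


def p_three(data, rstack):
    data.append(3)


def p_add(data, rstack):
    data.append(data.pop() + data.pop())


def p_exit(data, rstack):
    rstack.pop()  # assumption: drop the return into the word that just ended


# Cell addresses; each primitive is a NULL cell followed by its "code" cell.
TWO, THREE, ADD, EXIT, FIVE, MAIN = 1, 3, 5, 7, 9, 13
mem = [0] * 17
mem[FIVE:FIVE + 4] = [TWO, THREE, ADD, EXIT]   # : FIVE  2 3 + ;
mem[MAIN:MAIN + 4] = [FIVE, FIVE, ADD, EXIT]   # : MAIN  FIVE FIVE + ;
prims = {TWO + 1: p_two, THREE + 1: p_three, ADD + 1: p_add, EXIT + 1: p_exit}
```

Note that neither FIVE nor MAIN starts with a DOCOL cell: the interpreter nests on any non-NULL reference, which is the one-cell saving described above.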

