various PETTIL design considerations

Topics relating to various Forth models on the 6502, 65816, and related microprocessors and microcontrollers.
chitselb
Posts: 232
Joined: 21 Aug 2010
Location: Ontonagon MI

Re: various PETTIL design considerations

Post by chitselb »

I'm building PETTIL with xa65 on Ubuntu, as I don't have metacompilation yet. There's this `?:` operator that usually squeezes out a few bytes, and I'm fond of using it. Usage:

Code:

: foo   ( -- )
   blah blah ( flag ) ?: whentrue whenfalse  blah blah ;
The run-time end of this, `(?:)`, occasionally winds up very near the top of a page. If it adds 6 to IP (for NEXT to get past it and the two choices) while the word that called `(?:)` is also very near the top of a page, the page pointer in NEXT gets incremented twice! That's not very good, because now IP is 256 bytes away from where it should point.

When that happens, I relocate a few things to get away from the page boundary, build PETTIL, and things work again. I'm considering two solutions:

1. `?:` compiles a NOP word in front of `(?:)` when it's at that magic address

2. have ENTER increment IP+2 instead of having EXIT do it.

Is #2 kosher, even in a DTC Forth? I've never seen it done, it would involve some other code changes as well, and I'm not too sure whether I would bag myself in some unforeseen way.
BigEd
Posts: 11463
Joined: 11 Dec 2008
Location: England

Re: various PETTIL design considerations

Post by BigEd »

As a general comment, I'd say add the NOP. It's a local and simple solution with very little cost. (That said, I haven't tried to understand where the double-increment comes from.)
whartung
Posts: 1004
Joined: 13 Dec 2003

Re: various PETTIL design considerations

Post by whartung »

Does DTC use indirect jumps? Or is this a different problem?
barrym95838
Posts: 2056
Joined: 30 Jun 2013
Location: Sacramento, CA, USA

Re: various PETTIL design considerations

Post by barrym95838 »

AFAIK, it's a different problem. I believe PETTIL's super-tight super-fast NEXT is the culprit, in that it depends on external mechanisms to increment the high byte of IP, and these external mechanisms can occasionally misfire.

P.S. I have been following Charlie's progress for the last few years, and his dedication and skill in squeezing every last drop of performance out of the NMOS instruction set is quite remarkable. Sometimes squeezing too hard has consequences, though ...
Last edited by barrym95838 on Tue Dec 04, 2018 4:20 pm, edited 1 time in total.
Mike B.
chitselb

Re: various PETTIL design considerations

Post by chitselb »

BigEd wrote:
As a general comment, I'd say add the NOP. It's a local and simple solution with very little cost. (That said, I haven't tried to understand where the double-increment comes from.)

Code:

*=$0086
         next:
E6 8B       inc nexto+1
E6 8B       inc nexto+1
         nexto:
6C A8 66    jmp (ip)
   ^^ ^^ <-- self modifying code with indirect jmp, the special sauce
Only 15 clocks, but the 'ugly has to go somewhere' rule must be obeyed. NEXT only increments the low byte of IP and never carries into the page byte, so the responsibility for crossing page boundaries (at compile time) lies with the compiler:
  • For straight line code, the compiler inserts a `page` word into the code stream when DP reaches the end of the page. This bumps the high byte of IP and jumps to `next`
  • For skipping past inline arguments in primitives, a `jmp pad` exit adds some constant number to IP, adjusting IP page when needed, before proceeding to `next`.
  • There's a subroutine `padjust` that does the same thing, but returns to the caller.
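A rough Python model of that division of labour may help. It assumes IP is a 16-bit value that, after NEXT, points at the cell being dispatched. `next_ip`, `page_word` and `pad_exit` are hypothetical names for what `next`, `page` and a `jmp pad` exit do; this is not PETTIL source.

```python
# Sketch only: these functions model the behaviour described above; the
# names and the 16-bit integer IP are assumptions, not PETTIL's source.

def next_ip(ip: int) -> int:
    """NEXT: two INCs on the low byte of IP; nothing carries into the page byte."""
    return (ip & 0xFF00) | ((ip + 2) & 0x00FF)


def page_word(ip: int) -> int:
    """The compiled `page` word: bump the high byte of IP, then jump to NEXT."""
    return next_ip((ip + 0x0100) & 0xFFFF)


def pad_exit(ip: int, skip: int) -> int:
    """A `jmp pad` exit: step over `skip` inline bytes with a full 16-bit add,
    pre-compensating the low byte so that NEXT's +2 lands on the next cell."""
    target = (ip + 2 + skip) & 0xFFFF            # where the next cell really is
    adjusted = (target & 0xFF00) | ((target - 2) & 0x00FF)
    return next_ip(adjusted)
```

For example, `next_ip(0x12FE)` gives $1200, not $1300. Without a `page` word at $12FE, the thread would silently wrap back to the start of the same page, while `page_word(0x12FE)` gives $1300.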
As I recall from the bug hunt for this, it was midnight, the bug showed up unexpectedly, and then it was tricky to reproduce and even trickier to figure out. First, `(?:)` would call `padjust`, which would notice the page crossing and increment the IP page, leaving IP at $xxFE. Then `next` adds two more to that, and we are in the right place.

When the next word following this activity is `exit`, it will also notice that we're at $xxFE and bump the page number. Often that was exactly what I wanted, but not twice. That was when I had the eureka moment about inverting the roles of `enter` and `exit`:

Code:

      old exit:
A9 02       lda #$02
18          clc
85 8B       sta ip
68          pla
65 8B       adc ip
A8          tay
68          pla
69 00       adc #$00
85 8C       sta ip+1
84 8B       sty ip
4C 8A 00    jmp nexto

      new exit:
68          pla
85 8B       sta ip
68          pla
85 8C       sta ip+1
4C 8A 00    jmp nexto
This also necessitates fixing `enter` to position IP before pushing it to the return stack, and reviewing any code that reads or writes IP.
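To see why the old EXIT double-counts, here is a small Python model built on these assumptions: IP points at the cell currently executing (as it does after `next`), `padjust` leaves IP with the page already bumped and the low byte at $FE, and a colon word chosen by `(?:)` saves IP on the return stack in its ENTER. The function names are illustrative, not PETTIL source.

```python
# Sketch only: names and the 16-bit IP model are assumptions, not PETTIL source.
# Each *_exit returns the address of the cell dispatched after the return.

def next_ip(ip: int) -> int:
    """NEXT: add 2 to the low byte of IP only (no carry into the page byte)."""
    return (ip & 0xFF00) | ((ip + 2) & 0x00FF)


def padjust(ip: int, skip: int) -> int:
    """Step over `skip` inline bytes: bump the page now and leave the low
    byte 2 short, so the following NEXT lands on the real next cell."""
    target = (ip + 2 + skip) & 0xFFFF
    return (target & 0xFF00) | ((target - 2) & 0x00FF)


# Old scheme: ENTER saves IP as-is; EXIT pops it and adds 2 with a full
# 16-bit carry before dispatching through `nexto`.
def old_enter(ip: int) -> int:
    return ip


def old_exit(saved: int) -> int:
    return (saved + 2) & 0xFFFF


# New scheme: ENTER saves the cell NEXT would have reached; EXIT pops it
# and dispatches through `nexto` unchanged.
def new_enter(ip: int) -> int:
    return next_ip(ip)


def new_exit(saved: int) -> int:
    return saved
```

With `(?:)` at $12FA and two inline XTs, `padjust(0x12FA, 4)` returns $13FE. Under the old scheme, a colon word called from there returns to $1400. Under the new one, it returns to $1300. For an ordinary call site such as $12F0, both schemes return to $12F2.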
chitselb

Re: various PETTIL design considerations

Post by chitselb »

I'm trying to automate a build process that will generate several target versions of PETTIL (e.g. PET Upgrade ROM, PET 4.0 ROM, 80-column PET, VIC-20 +24K, VIC-20 16K cart, C=64, C=64 cart... ) with different upper dictionary addresses so it can run standalone or coexist peacefully with either Micromon or Supermon. I've figured out how to get things to autostart and stuff enough into the keyboard buffer that I can get a sort of autoexec.bat capability, enough to load and run some screens of code. This will lead to automated regression tests. Homebrew package management is fun. not.

I should add that I haven't needed it yet, so I haven't tested `?:` yet. This will hopefully do the job.

Code:

: ?:   ( "name1" "name2" == ; flag -- )
    ?comp  $F8 ?page [compile] (?:) ' , ' , ; immediate

20 04 11          jsr enter
xx xx             ' ?comp
xx xx F8          ' (?page) $F8   <-- inline argument follows the CFA
xx xx xx xx xx xx ' (?:) ' name1 ' name2
xx xx [here]      ' exit
where `(?page)` is a word that checks DP.low against $F8 here and bumps things a bit when they match, to negotiate the page boundary.
chitselb

Re: various PETTIL design considerations

Post by chitselb »

As I start work on the metacompiler, I'm thinking of adding three more punctuation prefixes to `number`: `"`, `&` and `'`.

Code:

" quoted_character"                e.g. "P" puts $0050 on the stack
# decimal_value                 \
$ hex_value                      \ these are working
% binary_value                   / already in `number`
& resolve_reference                resolves reference
' labeled_forward_reference        forward reference
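A minimal Python sketch of that prefix dispatch, covering the `#`, `$` and `%` forms plus the proposed `"` form. The assumptions: the quoted-character form is written `"P"` as in the example, values are truncated to 16-bit cells, an unprefixed token is treated as decimal (a real Forth would use BASE), and `&`/`'` are left as stubs because their metacompiler semantics aren't spelled out here.

```python
# Sketch only: `parse_number` is a hypothetical helper, not PETTIL's `number`.
# & and ' are stubs; unprefixed tokens are assumed to be decimal.

def parse_number(token: str) -> int:
    """Convert one token to a 16-bit cell value based on its first character."""
    bases = {"#": 10, "$": 16, "%": 2}
    prefix, body = token[0], token[1:]
    if prefix == '"':
        # "P" -> the character code of P
        if len(body) != 2 or body[1] != '"':
            raise ValueError(f"bad character literal: {token!r}")
        return ord(body[0]) & 0xFFFF
    if prefix in bases:
        return int(body, bases[prefix]) & 0xFFFF
    if prefix in "&'":
        raise NotImplementedError(f"{prefix} references need the metacompiler")
    return int(token, 10) & 0xFFFF
```

`parse_number('"P"')` returns 80 ($50), matching the example in the table.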
chitselb

Re: various PETTIL design considerations

Post by chitselb »

The original way (also the FIG and Blazin' Forth way; it seems like how everybody does it):

Code:

ENTER pushes the current IP to the return stack
EXIT adds 2
I'm changing it to this:

Code:

ENTER pushes the +2 address where we will wind up, e.g. IP := IP+2 -(IP==$FF), with page crossing considered
EXIT simply pops the return stack to IP, without modification
I am NOT doing this because I felt like redesigning the inner interpreter for fun. It's because I added the word `?:` to PETTIL. `?:` compiles 6 bytes, and if DP is at $xxFF, `?:` will also ALLOT a byte, so that the following CFA will not straddle a page boundary. It lays down:

Code:

the (headerless) CFA of `(?:)` 
the CFA to EXECUTE when TRUE
the CFA to EXECUTE when FALSE
maybe a junk byte if DP is $xxFF
This compilation strategy remains unchanged. What's different is the pre-increment vs. post-increment of IP in the inner interpreter, which must behave uniformly throughout.

I ran into situations where `(?:)` near the top of a page would increment the page (IP high byte) and then EXIT would do it again. Ouch, crash! This is what I came up with. As always, when I do a thing that seems nonstandard, I am plagued with self-doubt, and I will remain confused until I get the MSW (my stuff works) award. I am about halfway through rewriting everything that touches IP, about a dozen words. Any thoughts on this would be most welcome.

In other news, having test automation and judicious use of AutoKey has made the VICE debugger very useful.
BruceRMcF
Posts: 388
Joined: 21 Aug 2019

Re: various PETTIL design considerations

Post by BruceRMcF »

It seems simpler if `?:` checks whether it's GOING to end up at $[xx][FF], and if so, compiles PAGE and then ALLOTs to place (?:) at $[xx+1][00]. That wastes 7 bytes, with roughly a 1-in-256 chance.

But it is not standard whether to increment the IP before or after stacking. When NEXT is a short macro, it's often after, just because that makes EXIT into "POP RS; NEXT". But in a processor where the two are symmetric, doing the one that is more convenient for words manipulating the inner interpreter seems to make sense.
chitselb

Re: various PETTIL design considerations

Post by chitselb »

BruceRMcF wrote:
But it is not standard whether to increment the IP before or after stacking. When NEXT is a short macro, it's often after, just because that makes EXIT into "POP RS; NEXT". But in a processor where the two are symmetric, doing the one that is more convenient for words manipulating the inner interpreter seems to make sense.
my exit was "POP RS->IP; JMP NEXT"
now it is "POP RS->IP; JMP NEXTO"

Code:

next    inc ip
        inc ip
nexto   jmp (ip)
Several other words also have inline parameters, and I'd like to resolve any problems without just tossing `PAGE` in there, optimally wasting only 1 byte to realign things.

In one situation (JSR ENTER, JSR DODOES), the CFA immediately follows the JSR call. This calls for the "insert junk byte before" cure. In the other situation, there's an inline parameter with its last byte at $xxFE, where "insert junk byte after" fixes things up. In the latter case, the junk byte is only required when the very next thing to follow at $xxFF is a (2-byte) execution token (aka "XT"). If the thing following the inline parameter ending at $xxFE is a primitive, or data, or the JSR at the CFA of another word, or anything else, then dropping a junk byte is unnecessary.

I use the words `CFA,` (which compiles a JSR opcode with an address from the stack, three bytes) and `XT,` (which compiles the code field address from the stack, two bytes). `CFA,` will prepend a junk byte at $xxFC, and `XT,` will put a junk byte at $xxFF, which it does by laying it down before it encloses the execution token itself. That should do it from the compiler side.
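Those two compiler rules can be sketched in Python, assuming the dictionary is a plain 64K byte array with a 16-bit DP and that the junk byte is $00 (the actual filler value isn't stated). `Dictionary`, `cfa_comma` and `xt_comma` are hypothetical stand-ins for PETTIL's `CFA,` and `XT,`.

```python
# Sketch only: a hypothetical model of PETTIL's `CFA,` and `XT,` rules.
# The junk byte value ($00) and the 64K byte-array dictionary are assumptions.
JSR = 0x20
JUNK = 0x00


class Dictionary:
    def __init__(self, dp: int) -> None:
        self.mem = bytearray(0x10000)
        self.dp = dp

    def c_comma(self, byte: int) -> None:
        """C, : store one byte at DP and advance DP."""
        self.mem[self.dp] = byte & 0xFF
        self.dp = (self.dp + 1) & 0xFFFF

    def comma(self, word: int) -> None:
        """, : store a 16-bit cell, low byte first (6502 order)."""
        self.c_comma(word & 0xFF)
        self.c_comma((word >> 8) & 0xFF)

    def cfa_comma(self, addr: int) -> None:
        """CFA, : JSR addr. At $xxFC the thread after the JSR would start at
        $xxFF, so a junk byte first pushes it onto the next page."""
        if self.dp & 0xFF == 0xFC:
            self.c_comma(JUNK)
        self.c_comma(JSR)
        self.comma(addr)

    def xt_comma(self, xt: int) -> None:
        """XT, : enclose an execution token, never across a page boundary."""
        if self.dp & 0xFF == 0xFF:
            self.c_comma(JUNK)
        self.comma(xt)
```

For instance, starting with DP at $12FC, `cfa_comma(0x1104)` lays $00 at $12FC, the JSR at $12FD and the operand at $12FE-$12FF, so DP ends at $1300.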
GARTHWILSON
Forum Moderator
Posts: 8773
Joined: 30 Aug 2002
Location: Southern California

Re: various PETTIL design considerations

Post by GARTHWILSON »

My unnest for the '816 (actually 65802) ITC Forth assembly-language source code is just:

Code:

        HEADER "unnest", NOT_IMMEDIATE  ; ( -- )
unnest: PRIMITIVE                       ; This does the opposite of
        PLA                             ; nest, and the same as EXIT.
        STA     IP                      ; It is often called SEMIS
        GO_NEXT                         ; because it's compiled by ;
 ;-------------------
(and EXIT's CFA just points to unnest's code. The reason to have both is that SEE, the de-compiling word, stops when it finds unnest but not EXITs that might come before the end of the word.) PRIMITIVE is a macro that just puts the parameter field's address in the CFA. GO_NEXT is a macro which in most people's applications would just assemble JMP NEXT.

In my assembly source code for the kernel, the HEADER macro includes the lines:

Code:

        IF    $ & 1        ; If next addr is odd,
              DFB   0      ; add a 0 byte before you
        ENDI               ; lay down the link field.
("DFB" in the C32 assembler is "DeFine Byte"; so the above just lays down a zero byte.) This way, the LFA, CFA, and PFA are even-aligned, regardless of the name's length and whether or not it started out aligned. The '816 of course doesn't have the JMP (xxFF) bug, but I did it this way because there was a benefit for de-compiling with SEE. (It has been many years since I worked on it, and I can't remember off the top of my head why it helped.) CREATE for the target does the same kind of thing, using ALIGN.
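For comparison with the PETTIL approach, here are the HEADER macro's even-alignment rule and the NMOS `JMP (xxFF)` behaviour, modelled in Python. This is an illustrative sketch, not code from that kernel. One side effect worth noting is that on an NMOS 6502 the same alignment would keep two-byte cells clear of the bug, because an even address never has $FF as its low byte.

```python
# Sketch only: illustrative models, not code from the '816 kernel above.

def align(dp: int) -> int:
    """Like `IF $ & 1 / DFB 0 / ENDI`: pad with one byte when DP is odd."""
    return dp + (dp & 1)


def jmp_indirect_nmos(mem: bytes, ptr: int) -> int:
    """NMOS 6502 JMP (ptr): the high byte comes from the same page, so
    JMP ($12FF) reads $12FF and $1200 instead of $12FF and $1300."""
    lo = mem[ptr]
    hi = mem[(ptr & 0xFF00) | ((ptr + 1) & 0x00FF)]
    return (hi << 8) | lo


def jmp_indirect_fixed(mem: bytes, ptr: int) -> int:
    """65C02 and 65816 behaviour: the pointer carries into the next page."""
    return (mem[(ptr + 1) & 0xFFFF] << 8) | mem[ptr]
```

With $34 at $12FF, $12 at $1300 and $99 at $1200, the NMOS version of `JMP ($12FF)` goes to $9934, while the fixed version goes to $1234.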

nest is:

Code:

nest:   PEI     IP       ; PEI IP replaces LDA IP , PHA here.  nest is the
        LDA     W        ; runtime code of : (often called DOCOL ). It is not
        INA2             ; really a Forth word itself per se; but it is pointed
        STA     IP       ; to by the CFA of secondaries.
        GO_NEXT
 ;-------------------
(INA2 is just a macro that lays down INA, INA.)
BruceRMcF

Re: various PETTIL design considerations

Post by BruceRMcF »

chitselb wrote:
my exit was "POP RS->IP; JMP NEXT"
now it is "POP RS->IP; JMP NEXTO"

Code:

next    inc ip
        inc ip
nexto   jmp (ip)
Either are perfectly standard EXITs. There are processors that do one faster and processors that do the other faster, and on those processors, the model will follow suit. IIRC, the 68K has a very fast four byte NEXT based on an available JMP ++(...) operation.

Many 6502 models are symmetric, and the approach is often based on parts of a model being ported from another processor's implementation.
Quote:
Several other words also have inline parameters, and I'd like to resolve any problems without just tossing `PAGE` in there, optimally wasting only 1 byte to realign things.
If it only trips up when bumped up to $xx00, and not when bumped to $xx01, I'd revise what I said and say that I'd probably toss in a NOP. But that's not advice, it's just the way I'd probably tackle it.

If shifting to "PUSH ++IP->RS; ..." / "POP RS->IP" cleans it up, that has the appeal that in case there is some other application-specific word that may run into the same or similar issue, the issue has already been cleaned up.
chitselb

Re: various PETTIL design considerations

Post by chitselb »

BruceRMcF wrote:
chitselb wrote:
my exit was "POP RS->IP; JMP NEXT"
now it is "POP RS->IP; JMP NEXTO"

Code:

next    inc ip
        inc ip
nexto   jmp (ip)
Either are perfectly standard EXITs. There are processors that do one faster and processors that do the other faster, and on those processors, the model will follow suit. IIRC, the 68K has a very fast four byte NEXT based on an available JMP ++(...) operation.
Thank you for the validation. I had only dissected a few other Forths and they used a post-increment EXIT. My inner interpreter (ENTER, EXIT, (+LOOP), (LOOP), etc...) is rewritten now and works well with the pre-incrementing!

For my next trick, I'm revisiting SAVE-BUFFERS and LOAD-BUFFERS (also VERIFY-BUFFERS, which is a minor variant). PETTIL stashes blocks of virtual memory in what I am calling "packets", in a structure that builds downward in upper RAM. A packet is either "data" or "screen" and is either "compressed" or "uncompressed". Those two bit flags and a "size" form the 16-bit packet header. For compression, I use run-length encoding, which works well for typical Forth source. A screen packet will have 24 bits of "linewrap" stored adjacent to and below the packet header. Each bit indicates whether that physical line is the start of a logical line or not. Only 24 bits are required for a 25-line screen, because physical line 0 is always the start of a logical line.
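A Python sketch of that packet bookkeeping follows. The paragraph gives the ingredients but not the layout, so everything here is an assumption: bit 15 as the screen flag, bit 14 as the compressed flag, the low 14 bits as the size, bit n-1 of the linewrap word standing for physical line n, and a simple escape-byte RLE standing in for whatever run-length scheme PETTIL actually uses.

```python
# Sketch only: every bit position, the escape byte and the RLE format
# below are assumptions chosen for illustration, not PETTIL's layout.
SCREEN = 0x8000      # assumed: bit 15 = screen packet (else data)
COMPRESSED = 0x4000  # assumed: bit 14 = RLE-compressed
SIZE_MASK = 0x3FFF   # assumed: remaining 14 bits = size in bytes
ESC = 0xFF           # assumed RLE escape byte


def make_header(size: int, screen: bool, compressed: bool) -> int:
    if not 0 <= size <= SIZE_MASK:
        raise ValueError("packet too large for a 14-bit size")
    return size | (SCREEN if screen else 0) | (COMPRESSED if compressed else 0)


def pack_linewrap(starts: list[bool]) -> int:
    """25 flags, one per physical line; line 0 always starts a logical
    line, so only lines 1..24 are stored (24 bits)."""
    if len(starts) != 25 or not starts[0]:
        raise ValueError("expected 25 flags with line 0 set")
    return sum(1 << (n - 1) for n in range(1, 25) if starts[n])


def rle_encode(data: bytes) -> bytes:
    out, i = bytearray(), 0
    while i < len(data):
        b, run = data[i], 1
        while i + run < len(data) and data[i + run] == b and run < 255:
            run += 1
        if run >= 3 or b == ESC:
            out += bytes((ESC, run, b))   # ESC, count, byte
        else:
            out += bytes([b]) * run       # short runs stay literal
        i += run
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    out, i = bytearray(), 0
    while i < len(data):
        if data[i] == ESC:
            out += bytes([data[i + 2]]) * data[i + 1]
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)
```

A 40-byte line such as `( hello )` followed by 31 spaces encodes to 12 bytes with this scheme. Escaping the escape byte itself, even in a run of one, keeps decoding unambiguous.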

When I started writing this, the scope was intended to be a nifty Forth for my hardware PET. Project scope has expanded to the rest of the Commodore 8-bit line, which means adding disk support. The virtual memory buffer (all those packets) will load to a different place every time. Since all navigation is relative (only sizes, no absolute addresses) this doesn't require any changes to the data after loading it in. However, loading a file to a different load address than the one stored in the first two bytes is problematic. Here's the layout.

Code:

immediately after LOAD:

0000-03FF |zeropage|stack|systemstuff| 
0400-46B7 |core|startup|studio|symtab| all mooshed together

Code:

after running PETTIL initialization:

0000-008C |PETTIL zeropage|
008D-03FF |zeropage|stack|system|
0400-040C |10 sys1039
040D-040F |JMP(startup)|
0410-049C |the bottom half of zeropage that BASIC was using|
049F-066B |Sweet16|
066C-1A55 |PETTIL `core` dictionary, (just code)|
1A56-52FD |free, unused memory, filled with DE AD BE EF pattern|
52FE-52FF |an empty virtual memory buffer (VMBUF)| grows downward, ends at BLKBUF
5300-56FF |1K block buffer (BLKBUF)| 1K block, ends at SYMTAB
5700-66FF |symbol table (SYMTAB)| this grows upward.  FORGET sorts and moves it
6700-7FFF |PETTIL `studio` dictionary (just code)|

so, here's how we get to VMBUF

: blkbuf    symtab @ b/buf - ;
: vmbuf    blkbuf 2- ;
PETTIL studio is the programmer environment, with every definition that touches the symbol table. When the compiling is done, all memory above 1A55 may be reclaimed and used for other purposes! Of course, you lose the editor, assembler, interpreter, compiler, find, the symbols, etc... but that's okay because your application code is built now and no longer needs those things. Storing the top word (e.g. STARTREK) into STARTUP will bring it up at launch, and SAVE-FORTH can store such an image on tape or disk like a turnkey application. BLKBUF will now be 7C00-7FFF and VMBUF still goes adjacent and just below that.
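The two definitions are plain address arithmetic. Assuming `b/buf` is 1024 and `symtab` holds $5700 at startup, as in the layout above, a quick Python check reproduces the addresses in the map. The turnkey case is included on the assumption that `symtab` would hold $8000 once the studio is discarded.

```python
# Sketch only: mirrors the two colon definitions; the symtab values used
# below are assumptions taken from the memory map, not read from PETTIL.
B_PER_BUF = 1024  # b/buf: one 1K block buffer


def blkbuf(symtab: int) -> int:
    """: blkbuf  symtab @ b/buf - ;   the block buffer sits just below SYMTAB."""
    return symtab - B_PER_BUF


def vmbuf(symtab: int) -> int:
    """: vmbuf  blkbuf 2- ;   two bytes below BLKBUF, the empty VMBUF."""
    return blkbuf(symtab) - 2
```

`blkbuf(0x5700)` gives $5300 and `vmbuf(0x5700)` gives $52FE, matching the map, and `blkbuf(0x8000)` gives $7C00.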

In no particular order, these are the design questions that I ponder today.

A VIC-20 is more useful for graphics when more of the $1000-1FFF region is available to the VIC chip. It'd be nice to move things around so `core` does not live there. What memory layouts might work better?

On the 80-column PET and the 128, a screen has 2000 characters (and no linewrap). It's workable if I make BLKBUF a 2K region and have screen packets be this size. The ewww factor is high.

Moving VMBUF at load is a pain. PETTIL has to look at the load address, subtract it from the end address, and subtract that number (packet file size) from BLKBUF. Then get the data. To make things more annoying, whenever the machine does tape or disk I/O, and the user hits the STOP key, or an I/O error occurs, the ROM drops to the BASIC `READY.` prompt. That code is chiseled into concrete, so all disk I/O for load, save and verify needs to be sandwiched between a pair of `aloha` calls to ensure that zero page is okay for running BASIC. Typing RUN resumes PETTIL with a warm start if this happens.
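The relocation step described in that paragraph comes down to two subtractions. Here is a Python sketch, assuming the recorded end address is exclusive (one past the last byte saved) and that the image must end exactly at BLKBUF on the running machine. `effective_load_address` is a hypothetical name.

```python
# Sketch only: `effective_load_address` is a hypothetical helper; the end
# address is assumed to be exclusive (one past the last byte saved).

def effective_load_address(load: int, end: int, blkbuf: int) -> int:
    """Relocate a saved VMBUF image so that it ends exactly at BLKBUF."""
    size = (end - load) & 0xFFFF    # packet file size
    start = blkbuf - size           # new home, just below BLKBUF
    if start < 0:                   # a real loader would also check the dictionary top
        raise ValueError("image does not fit below BLKBUF")
    return start
```

An image saved from $52F0-$52FF (end $5300) would load at $7BF0 on the turnkey layout, where BLKBUF is $7C00.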

This (relocating loader that looks at the file load address before calculating the effective load address) seems like it could be the sort of problem that someone else has already solved. All ideas are welcome.
chitselb

Re: various PETTIL design considerations

Post by chitselb »

chitselb wrote:
This (relocating loader that looks at the file load address before calculating the effective load address) seems like it could be the sort of problem that someone else has already solved. All ideas are welcome.
Mia Magnusson came up with this in Facebook group "FORTH PROGRAMMING / RETRO COMPUTING" (https://www.facebook.com/groups/273924826349346/)

"p.s. do you really need to use standard format instead of inserting the length at the start of the file?"

INSERTING THE LENGTH AT THE START OF THE FILE! Of course! And after loading into memory, replacing the value with 00 00 to mark the tail of VMBUF again.

SAVE-BUFFERS replaces the double null with the (negated) buffer size, which is also the PRG file size. At LOAD-BUFFERS or VERIFY-BUFFERS, the first two bytes of the file are subtracted from `blkbuf` on the running system. So that takes care of the disk load.

For tape, the code is a little different. It calls the ROM routine READHEAD, which populates the 192-byte cassette buffer at 027A with filetype, startaddr, endaddr, "filename string" (and blanks to the end of the buffer). After calculating the effective load address, a call to READDATA brings it in.

VERIFY-BUFFERS has to first store the filesize over the '00 00' at VMBUF, then call READDATA, otherwise those two bytes will not match, guaranteeing 100% "verify error". VERIFY-BUFFERS should also fail early after READHEAD, if there was not already '00 00' at this address.

Following READDATA,
( loadstartaddr ) off \ mark the tail -- both VERIFY-BUFFERS and LOAD-BUFFERS
( loadstartaddr ) vmbuf ! \ uservar, the place where new packets are added -- LOAD-BUFFERS only
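The length-prefix approach can be modelled end to end in Python. Assumptions: memory is a 64K byte array, the saved image runs from `vmbuf` up to (not including) `blkbuf`, and its first word holds the 16-bit negation of the image size, so adding that word to BLKBUF (mod 65536) is the same as subtracting the size. `save_buffers`, `load_buffers` and `verify_buffers` are hypothetical stand-ins for the PETTIL words.

```python
# Sketch only: hypothetical stand-ins for SAVE-BUFFERS, LOAD-BUFFERS and
# VERIFY-BUFFERS. Assumes the image spans [vmbuf, blkbuf) and that its first
# word holds the 16-bit negation of its size.

def get16(buf: bytes, addr: int) -> int:
    return buf[addr] | (buf[addr + 1] << 8)


def put16(mem: bytearray, addr: int, value: int) -> None:
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF


def save_buffers(mem: bytearray, vmbuf: int, blkbuf: int) -> bytes:
    """Swap the 00 00 tail for -size, snapshot the image, restore the tail."""
    put16(mem, vmbuf, -(blkbuf - vmbuf))
    image = bytes(mem[vmbuf:blkbuf])
    put16(mem, vmbuf, 0)
    return image


def load_buffers(mem: bytearray, image: bytes, blkbuf: int) -> int:
    """Place the image so it ends at BLKBUF and return the new VMBUF."""
    start = (blkbuf + get16(image, 0)) & 0xFFFF   # blkbuf + (-size)
    mem[start:blkbuf] = image
    put16(mem, start, 0)       # ( loadstartaddr ) off : mark the tail
    return start               # ( loadstartaddr ) vmbuf !


def verify_buffers(mem: bytearray, image: bytes, blkbuf: int) -> bool:
    """Compare the image with memory without disturbing the tail marker."""
    start = (blkbuf + get16(image, 0)) & 0xFFFF
    if get16(mem, start) != 0:
        return False                     # no 00 00 tail here: fail early
    put16(mem, start, get16(image, 0))   # else the first word can never match
    ok = mem[start:blkbuf] == image
    put16(mem, start, 0)                 # ( loadstartaddr ) off
    return ok
```

Saving a 16-byte image from $52F0 writes $FFF0 as its first word. Loading it on a machine whose BLKBUF is $7C00 puts it at $7BF0 and re-marks the tail there.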
chitselb

Re: various PETTIL design considerations

Post by chitselb »

I finally got the outer interpreter behaving correctly and could not be more thrilled! It does nested loading of screens and has so far withstood all the torture tests I've thrown at it. Here's the source; feedback appreciated.

Code:

lazy-loading outer interpreter

code skip   ( -- offset )
    $f0 #       lda,
    $2c c,  \ bit abs opcode
    \ fall through
end-code

code scan   ( -- offset )
    $d0 #           lda,
    \ entry from skip
    'modifyskipscan sta,
    in              ldy,
                    dey,
    begin,
                    iny,
    span            cpy,
    cc while,
    n6 )y           lda,
    n7              eor,
here &modifyskipscan
    vc until,    \ changed to BEQ (skip) or BNE (scan)
    in              sty,
                    tya,
    push0a          jmp,
end-code

: parse   ( -- length )
     skip dup 1- <n6 +
     scan rot - in 1+! >n8 tuck
     ?: tuck under c! ;

create firmwarecursor
    $AF c,   $C6 c,  $D5 c,    $C4 ,  $D8 c,
\ C4 PNT   current (logical) screen line address
\ C6 PNTR  current cursor column on current line
\ D8 LNMX  width of the screen (39 or 79)
\   DFLTN    PNTR    TBLX      PNT    LNMX
\    3        in      lin    lin*40   span
\                            +blkbuf

code =cursor   ( addr|0 -- )
    z stx,    -6  # ldx, 
    0 # ldy,
    begin,     tos 1+ lda,
    if,  firmwarecursor 250 - ,x ldy,
    then,
    tos )y lda,     pha,
    cursor 250 - ,x lda,
    tos )y sta,     iny,    pla,
    cursor 250 - ,x sta,   inx, 
    0>= until,
    z ldx,  drop jmp,

: query
    tib 80 expect ;

: refill   ( -- )
    sib blk@ block cursor 3 c!+ 0 !+ 
    lin 40* rot + !+ >r 
    lin buf.wrap >bit cbit@
    ?: forty eighty  dup r> c! 
    0 =cursor  expect  0 =cursor ;

code lin+   ( -- flag )
    lin lda,  
    0>= if,
        cursor 5 + bit, 
        vc if,
            iny,
        then,
    then,
    iny,  lin sty,  next jmp,
end-code

code eoi?   ( -- flag )         \ end of input
    blk lda,
    ' eol? beq,
    lin lda,  l/scr # cmp,   xpushc jmp,
end-code

code eol?   ( -- flag )
    in lda,   span cmp,  xpushc jmp,
end-code

code empty?   ( -- flag )
    n ldy,   iny,    xpushz jmp,
end-code

: nomloading? ( -- flag )
    lin+ eoi? ;

: nomsession? ( -- flag )
    span c@ empty? or ;

: hungry?   ( -- flag )
    blk@ ?: nomloading? nomsession? ;

: ?nomnom   ( flag -- )
    ?exit blk@ ?: refill query ;

: name   ( char -- nfa|false )
    blk@ ?: sib tib  n6 2!
    begin   eol?
    while   hungry?   ?nomnom  
            eoi?
    until
    eoi? ?: false parse  n0 coff ;

: grok   ( cfa flag -- )
    compiling? <>
    ?: execute xt, ;

code \  ( -- )
    0 # lda,  span sta,   next jmp,

: interpret   ( -- )
    begin   ?stack  bl name ?dup
    while   found? ?dup
            ?: grok number
    repeat ;

: .ok
    ." ok" cr ;

: quit   ( -- )
    rp! blk 2off
    begin          
        [compile] \
        interpret
        compiling? ?: cr .ok
    again  ; -2 allot

: abort   ( -- )
    sp! forth definitions quit  ; -2 allot

code chkblk   ( u -- u flag )
    tos lda,  sec,   0<> if, #blk cmp, then,  xpushc jmp,

: load   ( u -- )
    chkblk  8 ?error  \ "BLOCK OUT OF RANGE"
    6 blk m>r
        blk ! in on  prev on  [compile] \
        interpret
    6 blk r>m 
    in c@ refill in c! ;

: -->   ( -- )
    scr @ blk@ ?dup
    if nip 1+ then
    blk ! in off [compile] \
    interpret ;
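Two parts of that listing are easy to model in Python. `skip` and `scan` share one loop whose exit branch is patched to BEQ ($F0) or BNE ($D0), and `grok` picks EXECUTE or `XT,` with a single `<>`. The sketch assumes `found?` follows the ANS FIND convention (1 for an immediate word, -1 otherwise) and that `compiling?` returns -1 or 0. The post shows neither, so treat the truth table as a plausible reading rather than documented behaviour. `parse` here returns offsets instead of building the counted string the real word leaves behind.

```python
# Sketch only: `skip_scan`, `parse` and `grok_executes` are illustrative names.
# Assumes found? returns 1 (immediate) or -1 (normal), and compiling?
# returns -1 or 0; the real `parse` also builds a counted string, omitted here.
BEQ, BNE = 0xF0, 0xD0   # the opcodes patched in at &modifyskipscan


def skip_scan(opcode: int, buf: bytes, start: int, span: int, delim: int) -> int:
    """One loop, two behaviours: BEQ keeps going over delimiters (skip),
    BNE keeps going over non-delimiters (scan). Returns the stop offset."""
    y = start
    while y < span:                     # cpy span / bcs exits the loop
        same = buf[y] == delim          # lda (n6),y / eor n7 sets Z
        if same != (opcode == BEQ):     # patched branch not taken: stop
            break
        y += 1
    return y


def parse(buf: bytes, start: int, span: int, delim: int = 0x20):
    """Return (offset, length, new_in) for the next delimited word."""
    first = skip_scan(BEQ, buf, start, span, delim)
    last = skip_scan(BNE, buf, first, span, delim)
    return first, last - first, last + 1   # `in 1+!` steps past the delimiter


def grok_executes(found_flag: int, compiling: int) -> bool:
    """`compiling? <> ?: execute xt,`: True means EXECUTE, False means XT,."""
    return found_flag != compiling
```

`parse(b"  DUP SWAP", 0, 10)` returns `(2, 3, 6)`: the word starts at offset 2, is 3 characters long, and `in` moves past the following blank. Under the assumed flag convention, only a non-immediate word found while compiling gets compiled; everything else executes.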