Kowalski Simulator Updates

BigDumbDinosaur · Post by **BigDumbDinosaur** » Thu Mar 10, 2022 5:34 am

BigDumbDinosaur wrote:

Something worth noting is the Kowalski assembler has separate dictionaries for macro names and labels/symbols...

Further to the above, macro names are always case-sensitive, even though the caseinsensitive assembly option has been declared; caseinsensitive is only applicable to labels, mnemonics and symbols. For example, break and BREAK are two different macros, as far as the assembler is concerned.
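The behavior described above can be modeled with two separate lookup tables, only one of which folds case. This is a minimal sketch in Python, not the assembler's actual code; the class and method names are hypothetical and exist only for illustration.

```python
class Dictionaries:
    """Toy model of separate macro and label dictionaries (hypothetical names)."""

    def __init__(self, case_insensitive=False):
        self.case_insensitive = case_insensitive
        self.macros = {}   # macro names: always matched case-sensitively
        self.labels = {}   # labels/symbols: folded when case_insensitive is set

    def _label_key(self, name):
        # Only label lookups honor the case-insensitive option.
        return name.lower() if self.case_insensitive else name

    def define_macro(self, name, body):
        self.macros[name] = body

    def define_label(self, name, value):
        self.labels[self._label_key(name)] = value

    def lookup_macro(self, name):
        return self.macros.get(name)

    def lookup_label(self, name):
        return self.labels.get(self._label_key(name))


d = Dictionaries(case_insensitive=True)
d.define_macro("break", "macro body A")
d.define_macro("BREAK", "macro body B")   # a second, distinct macro
d.define_label("Start", 0x0400)           # reachable as START, start, ...
```

In this model a macro and a label may share a name without colliding, because they live in different tables.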

BigDumbDinosaur · Post by **BigDumbDinosaur** » Tue Apr 05, 2022 5:07 am

Okay, this is a bit weird.

We’ve all seen assembler diagnostics rebuking us for defining a global label or symbol more than once. Here is a case where derived labels have been redefined, yet the Kowalski assembler didn’t complain about it.

The attached file is a listing from a program on which I’m working. First look at lines 2151 to 2179 inclusive. All looks normal. Now, look at lines 20190 to 20220 inclusive. Those lines are exact repeats of lines 2151-2179. The assembler should have complained about the redefined labels, yet it did not. As can be seen, the definitions at 20190-20220 have the same numeric values as those at 2151-2179. The program assembles without incident, including any instructions that are referencing the subject labels.

But wait! There’s more! If I comment out lines 2151-2179, things break. Below is some code that refers to labels in that block of equates:

Code:

;	prepare superblock...
;
(1)	blkcpy ifsnam,sb_fsnam,s_fsnam     ;set filesystem name
(2)	blkcpy fsmagic,sb_magic,s_fsmag    ;set magic number
(3)	kstmget sb_ktime,'f'               ;set creation time
(4)	kstmget sb_mtime,'f'               ;set last modification time

In the above, the numbers in parentheses are statement numbers for discussion purposes.

The referenced field sb_fsnam in statement (1) is in the block of definitions that were commented out at 2151-2179. Statement (1) will assemble without error. However, the assembler will complain at statement (2) that sb_magic isn’t defined, even though it is indeed defined in the block of definitions at lines 20190-20220 in the attached listing file.

If I then comment out statement (2), the assembler will complain at statement (3) that sb_ktime isn’t defined, despite being in the block of definitions at 20190-20220. Similarly, if I comment out statement (3), the assembler will complain at statement (4) that sb_mtime isn’t defined, even though it too is defined at 20190-20220.

Making this even more puzzling, those labels are visible in the symbol table when viewed using [Alt-6]. It’s as though the assembler is failing to find them in the symbol table when they are referenced in program statements.

I don’t think this is a forward-reference sort of error because the label sup_targ, as well as the labels o_... that are used to define sb_magic, sb_ktime, etc., are defined earlier in the source code. My only guess is there is some sort of problem with the part of the assembler that scans the symbol table to determine if a label/symbol has been defined and if so, its value.

mkfs_list.txt (903.94 KiB)

barrym95838 · Post by **barrym95838** » Tue Apr 05, 2022 2:18 pm

20K+ lines of source could very likely be the largest file that simulator has had to digest in its lifetime. You may be exploring limitations on internal data structures that simply weren't anticipated. A good way to test that theory would be to trim a few hundred lines of comments, followed by a few dozen seemingly unrelated labels and then reassembling to see if the erroneous behavior is identical.

If Daryl figures this one out, he clearly deserves "Employee of the Month".

BigDumbDinosaur · Post by **BigDumbDinosaur** » Tue Apr 05, 2022 9:22 pm

barrym95838 wrote:

20K+ lines of source could very likely be the largest file that simulator has had to digest in its lifetime. You may be exploring limitations on internal data structures that simply weren't anticipated.

I briefly flirted with that thought but then considered that the assembler likely reads the source a line at a time and like most assemblers, immediately skips to the next line when a comment is encountered. Multiple lines are usually not buffered by most assemblers. The data structures that do grow are the macro dictionary and the symbol table.

Quote:

A good way to test that theory would be to trim a few hundred lines of comments, followed by a few dozen seemingly unrelated labels and then reassembling to see if the erroneous behavior is identical.

I don't think it’s the line count that is causing the trouble. I had, in the past, written a huge program with 128K+ lines to see how the assembler would behave. The program consists of a commented NOP statement followed by a comment line, i.e., two lines per instruction. The pair was repeated 65,535 times (see attached source file). The assembler processed it without complaint (see attached listing).
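A stress-test file of that shape is easy to regenerate. The following is a rough Python sketch; the exact column layout and comment text of the attached source are not reproduced here, so the line contents are assumptions made for illustration.

```python
def make_big_source(pairs=65535):
    """Build a stress-test source: one commented NOP plus one comment line per pair."""
    lines = []
    for _ in range(pairs):
        lines.append("         nop                   ;do nothing")  # assumed layout
        lines.append(";")                                           # bare comment line
    return "\n".join(lines) + "\n"


source = make_big_source()
line_count = source.count("\n")   # two lines per pair
```

Writing `source` to a file and assembling it gives a quick check on whether line count alone triggers a problem.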

Quote:

If Daryl figures this one out, he clearly deserves "Employee of the Month".

I once promoted him to “God” status by accident.

big_source.asm: Huge Program Source Code (4.44 MiB)

big_source.txt: Huge Program Listing (6.72 MiB)

8BIT · Post by **8BIT** » Wed Apr 06, 2022 2:00 am

Just to let you know I have seen this. In the last 8 months, I've bought a 20 acre home in the mountains, moved, and sold our old home. Also retired from one job and started another - just finished up my probation there. Needless to say, my free time does not exist for the next few months.

When I do get time, I'll take a look at this behavior. Feel free to ping me in June if you have not heard back from me.

thanks!
Daryl

BigDumbDinosaur · Post by **BigDumbDinosaur** » Wed Apr 06, 2022 2:13 am

8BIT wrote:

Just to let you know I have seen this. In the last 8 months, I've bought a 20 acre home in the mountains, moved, and sold our old home...When I do get time, I'll take a look at this behavior. Feel free to ping me in June if you have not heard back from me.

Well, you’ve been busy! How high up in the mountains are you?

When you are able to get to this, I can send you the actual code on which to test.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Fri Apr 08, 2022 12:27 am

Another thing that is wonky is the .PARAMTYPE (parameter type) macro operator.

.PARAMTYPE is supposed to differentiate between a macro argument passed as numeric and an argument passed as a character string. .PARAMTYPE is documented, but causes an error if used. The short program below illustrates this.

Code:

         .opt caseinsensitive,swapbin

string   .macro ...
         .if @0
.p           .=1
             .rept @0
                 .if .paramtype(@.p) == 2  ;if parameter is a string
                     .byte @.p$
                 .endif
.p               .=.p+1
             .endr
             .byte 0
         .endif
         .endm

         *=$400

         string "this is",$12," a test"

If the above were to assemble correctly, the $12 argument would be skipped during macro expansion due to it not being a string. Instead, assembly will stop with the text pointer at the STRING macro invocation, with the actual error occurring on the line with the .PARAMTYPE test.
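To make the intended result concrete, here is a small Python model of what the STRING macro is meant to emit. It is only a sketch of the expected output, not of the assembler: the function name is hypothetical, and Python's `isinstance` check stands in for the `.paramtype(...) == 2` test shown in the macro.

```python
def expand_string_macro(*args):
    """Return the bytes the STRING macro is meant to emit for the given arguments."""
    if not args:                        # mirrors the outer ".if @0" guard
        return b""
    out = bytearray()
    for arg in args:
        if isinstance(arg, str):        # stands in for .paramtype(@.p) == 2
            out += arg.encode("ascii")
        # numeric arguments fall through and are skipped
    out.append(0)                       # the trailing ".byte 0"
    return bytes(out)


result = expand_string_macro("this is", 0x12, " a test")
```

Under this model the invocation above would produce the text "this is a test" followed by a single zero byte, with the $12 argument dropped.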

8BIT · Post by **8BIT** » Sat Jun 04, 2022 8:09 pm

BigDumbDinosaur wrote:

Okay, this is a bit weird.

We’ve all seen assembler diagnostics rebuking us for defining a global label or symbol more than once. Here is a case where derived labels have been redefined, yet the Kowalski assembler didn’t complain about it...

I found some time today to start looking into this. My first hunch concerns data being across 2 banks. Can you send me a zip file of the source to this code? I want to see what's going on under the hood during assembly.

thanks!
Daryl

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sat Jun 04, 2022 8:26 pm

8BIT wrote:

I found some time today to start looking into this. My first hunch concerns data being across 2 banks.

Your hunch is correct. The program is being assembled in bank $00, with some dynamic data in the same bank, but other dynamic data in bank $01. The program counter is being used to define the address space, which, of course, won’t fly if the space crosses a bank boundary. If I declare the bank $01 base address early in the source code and redefine the program counter with a *=<address> declaration, things will work.
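The boundary condition can be checked with simple arithmetic. The Python sketch below is illustrative only; the function name and the example addresses are made up and are not taken from the program in question.

```python
BANK_SIZE = 0x10000   # one 65C816 bank is 64 KiB


def crosses_bank(start, length):
    """True if the region [start, start + length) spans more than one bank."""
    if length <= 0:
        return False
    first_bank = start // BANK_SIZE
    last_bank = (start + length - 1) // BANK_SIZE
    return first_bank != last_bank


# Hypothetical example: a $200-byte area placed near the top of bank $00.
spills = crosses_bank(0x00FF00, 0x200)
# The same area placed at an explicit bank $01 base address stays in one bank.
fits = crosses_bank(0x010000, 0x200)
```

A region that reports True here is the kind of definition that needs an explicit base address rather than one derived from the running program counter.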

Quote:

Can you send me a zip file of the source to this code? I want to see what's going on under the hood during assembly.

Unfortunately, not easily. There are a bunch of INCLUDEs involved in multiple subdirectories. Plus extensive use is made of macros, which are cataloged in various files according to their general function. I’d have to include all that in the ZIP.

8BIT · Post by **8BIT** » Sat Jun 04, 2022 10:34 pm

OK, I'll try to duplicate the issue using a shorter, more direct example. May take longer to resolve, but I'm glad to hear you have a way to work around it.

thanks!
Daryl

BigDumbDinosaur · Post by **BigDumbDinosaur** » Wed Sep 14, 2022 7:49 am

Ran into a little snag with the assembler in writing 65C816 code.

It doesn’t accept absolute indirect long addressing, e.g., JMP [SOMEWHERE], as valid syntax and instead reports “addressing mode not allowed.”

8BIT · Post by **8BIT** » Fri Sep 16, 2022 3:36 pm

I have tried to make the Simulator behave according to the WDC datasheets.

Here is what I found in the 65816 datasheet, dated Nov 9, 2018:

Pg 29 Table 5-1

Code:

31. JML    Jump Long 
32. JMP    Jump to New Location

Pg 31, Table 5-4

Code:

opcode   mnemonic   address mode   cycles   bytes
$4C      JMP        abs            3        3
$5C      JMP        abs long       4        4
$6C      JMP        (abs)          5        3
$7C      JMP        (abs,X)        6        3
$DC      JML        (abs)          6        3

Pg 32, Table 5-4

Code:

JML   (a)    = $DC
JMP   a      = $4C
JMP   al     = $5C
JMP   (a)    = $6C
JMP   (a,x)  = $7C

Pg 35, Table 5-6

Code:

The only addressing mode that uses brackets "[]" is [d] Direct Indirect Long.  There is no Direct Indirect address mode for JMP.

The assembler will assemble all of the above identified combinations, including JML ($3456) as $DC $56 $34, which will jump to the 24-bit address stored at address $3456 in bank $00.
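As a cross-check of the byte sequence above, here is a short Python sketch that encodes the jump forms using the opcodes from the tables quoted earlier in this post. The function and table names are hypothetical; operands are stored low byte first, and the long form carries a 24-bit operand.

```python
# (mnemonic, addressing mode) -> (opcode, operand width in bytes)
JUMP_FORMS = {
    ("JMP", "a"): (0x4C, 2),
    ("JMP", "al"): (0x5C, 3),      # 24-bit operand
    ("JMP", "(a)"): (0x6C, 2),
    ("JMP", "(a,x)"): (0x7C, 2),
    ("JML", "(a)"): (0xDC, 2),
}


def encode_jump(mnemonic, mode, operand):
    """Return the machine-code bytes for one jump instruction."""
    opcode, width = JUMP_FORMS[(mnemonic, mode)]
    # Operand bytes follow the opcode in little-endian order.
    return bytes([opcode]) + operand.to_bytes(width, "little")


jml_bytes = encode_jump("JML", "(a)", 0x3456)
jmp_long_bytes = encode_jump("JMP", "al", 0x123456)
```

A bracketed form such as JMP [a] is not in the table, so a lookup for it raises KeyError in this model, much as the assembler rejects it today.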

Hope this helps clear up the syntax choices.

thanks!
Daryl

Proxy · Post by **Proxy** » Fri Sep 16, 2022 4:03 pm

JML [$1234] or JMP [$1234] is more consistent with the other long indirect addressing modes (like LDA [$00]), so I think it makes sense to allow it as an alternative syntax.
There are other instructions that also have alternatives that make sense (at least IMO), like XOR instead of EOR, or PHW (1 mnemonic, 3 addressing modes) instead of PEA/PER/PEI (3 mnemonics, 1 addressing mode each), or BRA instead of BRL (where the assembler just selects the correct branch depending on the distance to the destination).

BigDumbDinosaur · Post by **BigDumbDinosaur** » Fri Sep 16, 2022 8:21 pm

8BIT wrote:

I have tried to make the Simulator behave according to the WDC datasheets.

Here is what I found in the 65816 datasheet, dated Nov 9, 2018...

The data sheet is inconsistent with Lichty & Eyes (see pages 61, 143, 384 and 459), which indicates JMP [<addr>] is the syntax for an indirect long jump, with JML (<addr>) parenthetically mentioned on page 459 as an alternative. I have never seen JML (<addr>) in any 816 code. In fact, the data sheet seems to imply JML is an alias for JMP.

Given WDC's history of publishing data sheets with errors and inconsistencies, I’d be inclined to defer to Lichty & Eyes as the primary reference (although that, too, has some inconsistencies; vide PEA and PEI).

Incidentally, a careful (re)read of the 816 data sheet failed to turn up any specific mention of the indirect long addressing mode when used with the jump instruction in any of the narratives. The operations tables are vague in that regard.

Proxy wrote:

JML [$1234] or JMP [$1234] is more consistent with the other long indirect addressing modes (like LDA [$00]), so I think it makes sense to allow it as an alternative syntax.

That's my opinion as well, which is also based upon Lichty & Eyes.

Quote:

There are other instructions that also have alternatives that make sense (at least IMO), like XOR instead of EOR, or PHW (1 mnemonic, 3 addressing modes) instead of PEA/PER/PEI (3 mnemonics, 1 addressing mode each), or BRA instead of BRL (where the assembler just selects the correct branch depending on the distance to the destination).

EOR was inherited from the MC6800 assembly language, as was just about all of the other 6502 mnemonics. I doubt many 65xx programmers would think of XOR as a synonym. That said, XOR could be implemented in a macro, although in a somewhat cumbersome manner.

I could see where use of BRA would intelligently process the target address and figure out if a short or long branch is appropriate—if the target MPU is the 816. However, that behavior would be inconsistent with BRA as implemented on the 65C02.
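For illustration, the selection logic might look like the Python sketch below. It assumes the target address is already known and lies in the same bank as the branch; a real assembler would also have to settle the instruction size before forward references are resolved. The function name is hypothetical. The opcodes are the 65C816's BRA ($80, 8-bit offset) and BRL ($82, 16-bit offset).

```python
def encode_branch(pc, target):
    """Pick BRA or BRL for a branch located at address pc (same bank as target)."""
    # A short branch's offset is measured from the byte after the 2-byte BRA.
    short_offset = target - (pc + 2)
    if -128 <= short_offset <= 127:
        return bytes([0x80, short_offset & 0xFF])
    # Otherwise use the 3-byte BRL; its 16-bit offset wraps within the bank.
    long_offset = (target - (pc + 3)) & 0xFFFF
    return bytes([0x82]) + long_offset.to_bytes(2, "little")


near_forward = encode_branch(0x2000, 0x2010)
near_backward = encode_branch(0x2000, 0x1FFE)
far = encode_branch(0x2000, 0x3000)
```

On a 65C02 target only the first case exists, so the same mnemonic would have to report an out-of-range error instead of falling through to the long form.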

PHW seems like a worthwhile synonym for PEA and PEI, although its usage in place of PEA creates a problem if the operand is in the range $00-$FF. A logical syntax to avoid misinterpretation would be PHW #<operand> to do what PEA does, or PHW <operand> to do what PEI does—<operand> would have to resolve to eight bits. PHW as a synonym for PER isn’t logical, as PER’s behavior is very different than that of PEA and PEI. Whereas the latter two blindly push a word that is stored in memory, PER pushes a word that is the result of a computation from a relative offset that itself is computed during assembly. How would you make that distinction using a single mnemonic?

All that said, the core goal here is to get the Kowalski assembler to be a top-notch 65C816 programming tool. Adding synonyms for “official” assembly language mnemonics, while laudable, may not be practical, due to the way in which the assembler translates source lines into machine instructions. Also, adding gingerbread and fluff increases Daryl’s workload, which is unnecessary programmer abuse.

8BIT · Post by **8BIT** » Fri Sep 16, 2022 11:45 pm

I'm open to adding more alternatives to the assembler, I just don't have much free time these days to do it. I'll keep the JMP [a] in mind for that rainy day.

thanks!
Daryl
