
All times are UTC




PostPosted: Thu Dec 01, 2016 10:05 pm 

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
It has come to my attention that though I introduced Typist's Assembler Notation (TAN, https://docs.google.com/document/d/16Sv ... SojNTm4SQ/) months ago, there is still code present in the traditional format. This, to paraphrase Darth Vader, is disturbing, and will be followed up on once the Revolution comes. 8)

Until then, I have thrown together a small tool to help convert TAN to traditional notation - typ65conv (https://github.com/scotws/type65conv). Currently, in its early BETA state, it only converts instructions. Given, say
Code:
start
                  ldx.# 00
                  ldy.# %00001111
                  txa
                  adc.l 10:0000
loop              sta.x 1000
                  dex
                  bne loop
it can turn this into
Code:
start
                  LDX #$00
                  LDY #%00001111
                  TXA
                  ADC $100000
loop              STA $1000,X
                  DEX
                  BNE loop
Upper/lower case is optional, and labels can be given colons. I started to figure out a way to convert directives, but given the number of variants out there, I now think that would be a project best left to an AI.
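The conversion shown above can be sketched in a few lines of Python. This is not the tool's Go code, only an illustration of the idea; the names `TEMPLATES`, `convert_number` and `convert_line` are made up for the sketch, and only the suffixes that appear in the example (`.#`, `.l`, `.x`) are covered. Comments are not handled, and a symbol made up solely of hex digits (say, `face`) would be misread as a number.

```python
import re

# Operand templates for the TAN suffixes that appear in the example above.
# Other suffixes are deliberately left out of this sketch.
TEMPLATES = {
    "#": "#{op}",   # immediate
    "l": "{op}",    # long (24-bit) address
    "x": "{op},X",  # absolute, indexed by X
}


def convert_number(text):
    """Prefix bare hex with '$'; keep binary literals and symbols as they are."""
    text = text.replace(":", "")  # drop the bank separator in 10:0000
    if text.startswith("%"):
        return text
    if re.fullmatch(r"[0-9A-Fa-f]+", text):
        return "$" + text.upper()
    return text


def convert_line(line, label_width=18):
    """Convert one TAN source line to traditional notation."""
    tokens = line.split()
    if not tokens:
        return ""
    # A label is any token that starts in the first column.
    label = tokens.pop(0) if not line[0].isspace() else ""
    if not tokens:
        return label
    mnemonic, _, suffix = tokens[0].partition(".")
    operand = tokens[1] if len(tokens) > 1 else ""
    if suffix:
        if suffix not in TEMPLATES:
            raise ValueError("suffix not covered by this sketch: " + suffix)
        operand = TEMPLATES[suffix].format(op=convert_number(operand))
    instruction = (mnemonic.upper() + " " + operand).strip()
    return label.ljust(label_width) + instruction
```

Instructions without a suffix (`txa`, `bne loop`) pass their operand through unchanged, which is why the branch target stays a plain label.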

There is some rather obvious functionality that could be added, like making sure the operand has the correct number of hex digits (so "lda.z 0000" - which TAN is clever enough to process as "lda.z 00" - is converted to Zero Page "LDA $00", not Absolute "LDA $0000"), and some directives can at least have their parameter numbers converted. I'll get to that at some point, but I need to do something else for a while now. Once Liara Forth has reached a certain size, I'll start including traditional format variants, and that should provide the motivation to add more stuff.
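The digit-count fix described above might look roughly like this in Python. The table `DIGITS` and the function `fit_operand` are hypothetical names for this sketch, and the three widths are the usual 65xx ones (two hex digits for Zero Page, four for Absolute, six for Long); nothing here is taken from the tool itself.

```python
# Hex digits expected for each address size. The keys are labels for this
# sketch only, not TAN suffixes.
DIGITS = {"zero page": 2, "absolute": 4, "long": 6}


def fit_operand(hex_text, size):
    """Return the operand as '$' plus exactly the number of digits for size."""
    value = int(hex_text.replace(":", ""), 16)  # TAN operands are hex by default
    digits = DIGITS[size]
    if value >= 16 ** digits:
        raise ValueError("operand does not fit into " + size)
    return "$" + format(value, "0{}X".format(digits))
```

With this, `fit_operand("0000", "zero page")` yields `$00`, so "lda.z 0000" would come out as "LDA $00" instead of "LDA $0000".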

This is my first larger project with Go (https://golang.org/doc/), and I'm afraid it shows - the main routine is a mess, and needs to be rewritten from scratch. However, I did manage to include concurrency to the point where each line gets its own goroutine (a form of a thread) up to the number of virtual cores. Because of the overhead, this probably makes the program slower for small files, but I did learn how easy it is to get something like that working in Go.
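For readers who do not know Go, a rough Python analogue of "one worker per line, capped at the number of cores" is a thread pool. This is only a sketch of the pattern, not a translation of the Go code; `convert_all` is a made-up name, and Python threads behave differently from goroutines.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def convert_all(lines, convert):
    """Run convert() on every line, using at most one worker per core."""
    workers = os.cpu_count() or 1  # cpu_count() may return None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands lines to the workers but returns results in input order
        return list(pool.map(convert, lines))
```

As in the Go version, the bookkeeping probably costs more than it saves for small files.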

(Also, the language has very nice built-in testing features, and the formatter is so brilliant - just type the code and let the machine figure out the best indentation - that I have written an equivalent tool for TAN itself (https://github.com/scotws/tinkasm/tree/master/tinkfmt). The amount of time this saves in Go, especially compared to the nit-picking required for Python indentation, is ridiculous.)

Feedback, corrections, and comments are more than welcome. I know you were just dying for a tool like this ...


PostPosted: Fri Dec 02, 2016 9:28 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10940
Location: England
Nice to see Go in the wild! I imagine its use is still slowly building up. I see there are several 6502 emulators written in Go.


PostPosted: Wed Nov 14, 2018 11:12 pm 

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
TAN has been replaced by Simpler Assembler Notation (SAN) -- yes, even simpler, and maybe with added chocolate! Actually, because the disassembler for Tali Forth 2 uses this, I needed to write up the specs. They are now at https://github.com/scotws/SAN and little changed from TAN, actually; mostly, it's far less radical (no more hex by default, for instance).


PostPosted: Fri Nov 16, 2018 5:54 pm 

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
If I may offer a critique:

Most modern languages involve sufficient use of symbols to make programming with a non-English keyboard frustrating. In that respect, 6502 and 65816 assembly are far from unique. Many Finnish programmers obtain an English keyboard for coding with, even though that makes typing text in their native language more difficult. (As an Anglophone, you can have my UK-spec mechanical keyboards over my cold dead body.)

TAN and SAN do one thing right, which is to unambiguously distinguish Direct (or Zero Page), Absolute and Long addresses from each other. That is an acknowledged limitation of the standard assembly, and various assemblers have syntax extensions to address it. However, that singular goal could be achieved much less invasively.

The 6502 has far from the only assembly to indicate addressing modes by decorating the address operand. It is also a feature of x86, ARM and 68K assembly, and to a lesser extent PowerPC. From the programmer's perspective, this is the logical place to put such decorations, even if the effect after assembly is only on the opcode, because the addressing mode is effectively part of the desired address. So in practice, TAN's change of syntax simplifies implementing the assembler (because the operand is easier to parse), but makes the code less readable.

In the context of the 65816, TAN and SAN do not unambiguously distinguish 8-bit and 16-bit immediate operands from each other. This difference does not change the opcode, but it does change the length of the assembled instruction. Since this is by far the most confusing and error-prone aspect of programming the '816, I would have hoped for a better solution here.


PostPosted: Fri Nov 16, 2018 6:57 pm 

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
Chromatix wrote:
In the context of the 65816, TAN and SAN do not unambiguously distinguish 8-bit and 16-bit immediate operands from each other. This difference does not change the opcode, but it does change the length of the assembled instruction. Since this is by far the most confusing and error-prone aspect of programming the '816, I would have hoped for a better solution here.

As I understand it, the size of the immediate operands is an assembler setting; it has nothing to do with the instruction. In fact, the assembler should NOT intuit what the value should be.

The assembler setting would be referred to in order to determine if LDA #0 is an 8- or 16-bit load.
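To make the point concrete, here is a small Python sketch of how the same LDA #0 assembles under each setting. The opcode $A9 for LDA immediate is the standard one; the function name `lda_immediate` and the way the setting is passed in are assumptions for the illustration.

```python
LDA_IMMEDIATE = 0xA9  # same opcode for both operand sizes


def lda_immediate(value, accumulator_bits):
    """Assemble LDA #value for the accumulator size the assembler was told."""
    if accumulator_bits not in (8, 16):
        raise ValueError("accumulator is either 8 or 16 bits wide")
    size = accumulator_bits // 8
    # The operand follows the opcode in little-endian order
    return bytes([LDA_IMMEDIATE]) + value.to_bytes(size, "little")
```

`lda_immediate(0, 8)` gives the two bytes A9 00, while `lda_immediate(0, 16)` gives the three bytes A9 00 00: same opcode, different instruction length.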


PostPosted: Fri Nov 16, 2018 7:14 pm 

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
That's how standard assemblers do it, yes. I think it's the wrong solution, because you have to keep telling the assembler when the state of those flags changes, otherwise it will produce an incorrect binary. (Yes, you can use macros to help with this. But it's still a source of potentially very weird-looking bugs.)

Actually, the statefulness of the CPU itself in this respect is unhelpful, just as it was from the beginning with Decimal mode. It means you can't reliably obtain a disassembly without knowing what mode the CPU enters the code in, unless the code happens to set that mode itself. It increases interrupt handling overhead, because the first thing the ISR has to do is make sure the CPU is in a predictable mode - even if 99% of your application code already uses that same mode, you have to guard against the 1% that briefly switches to another one for some specialised, performance-critical routine.
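The disassembly problem can be shown with a toy disassembler in Python that knows only three opcodes (LDA immediate $A9, INX $E8, NOP $EA). It is a sketch to illustrate the ambiguity, not a usable tool, and `disassemble` is a made-up name.

```python
def disassemble(code, accumulator_bits):
    """Disassemble a few opcodes, assuming a given accumulator width."""
    listing = []
    i = 0
    while i < len(code):
        opcode = code[i]
        if opcode == 0xA9:  # LDA immediate: operand size depends on the mode
            size = accumulator_bits // 8
            operand = int.from_bytes(code[i + 1:i + 1 + size], "little")
            listing.append("LDA #${:0{}X}".format(operand, size * 2))
            i += 1 + size
        elif opcode == 0xE8:
            listing.append("INX")
            i += 1
        elif opcode == 0xEA:
            listing.append("NOP")
            i += 1
        else:
            listing.append(".byte ${:02X}".format(opcode))
            i += 1
    return listing
```

Fed the bytes A9 00 E8 EA, this produces LDA #$00 / INX / NOP for an 8-bit accumulator, but LDA #$E800 / NOP for a 16-bit one: the INX has been swallowed by the operand.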

Maybe there is no really good solution to this problem. But the mnemonic suffix would have been a logical place for TAN and SAN to try.


PostPosted: Fri Nov 16, 2018 8:14 pm 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8521
Location: Southern California
Chromatix wrote:
Actually, the statefulness of the CPU itself in this respect is unhelpful, just as it was from the beginning with Decimal mode. It means you can't reliably obtain a disassembly without knowing what mode the CPU enters the code in, unless the code happens to set that mode itself. It increases interrupt handling overhead, because the first thing the ISR has to do is make sure the CPU is in a predictable mode - even if 99% of your application code already uses that same mode, you have to guard against the 1% that briefly switches to another one for some specialised, performance-critical routine.

Fortunately, the 65c02 and '816 automatically clear the decimal flag as part of the interrupt sequence. No CLD needed.

As for 8- versus 16-bit accumulator or index registers, the programmer had better always know what size he has them set to. I like that the C32 assembler doesn't try to keep track from the last time you set them, but requires, if you want 8-bit, that you write something like AND #<$1F, the "<" meaning only the low 8 bits go into the operand. This way, I have never had an error in operand size.

In my '816 Forth assembler, I merged the addressing mode with the mnemonic, to keep it very simple and compact, avoid parsing, and keep it in mnemonic-operand order unlike many Forth assemblers. Then you have for example ADC# FOOBAR , (and the comma is Forth's word that takes the top data-stack cell and compiles it where the dictionary pointer points, and increments the pointer by one cell's size, which in the case of the 65xx is two bytes; if you want to compile only one byte, you use C, the "C" standing for "character" or 1 byte).
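For readers without a Forth background, the effect of ADC# FOOBAR , can be modelled in Python as appending bytes to a growing dictionary. The class and method names are invented for this sketch; $69 is the standard opcode for ADC immediate, and the assumption that ADC# does nothing more than lay down that byte is mine, since the definition is not shown in the post.

```python
class Dictionary:
    """A toy model of a Forth dictionary that only grows at its end."""

    def __init__(self):
        self.memory = bytearray()

    def c_comma(self, value):
        """Forth's C, : compile one byte."""
        self.memory.append(value & 0xFF)

    def comma(self, value):
        """Forth's , : compile one two-byte cell, low byte first."""
        self.memory += (value & 0xFFFF).to_bytes(2, "little")

    def adc_immediate(self):
        """Stand-in for ADC# : lay down the opcode only (an assumption)."""
        self.c_comma(0x69)
```

Calling `adc_immediate()` followed by `comma(0x1234)` leaves the bytes 69 34 12 in the dictionary, a 16-bit immediate ADC; using `c_comma` instead of `comma` would compile the 8-bit form.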

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Sat Nov 17, 2018 7:40 am 

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
Thank you for the feedback - I would be very grateful for suggestions on how to deal with the 8/16 Immediate problem in mnemonics. I think your idea is very good, because the assembler could compare what you think you are doing (8 or 16 bit) and what it thinks you should be doing.

So far, the best I have come up with is lda.# (8 bit) vs lda.## (16 bit) which I assume I'm not the first person to think of. The brute-force variant would be lda8 and lda16, which would mean giving up the "#". I wouldn't mind, but "#" is pretty ingrained by now. Given the problem, the switch might be worth it.
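The "compare what you think with what the assembler thinks" idea could be sketched as follows in Python. The program is given as already-parsed (mnemonic, operand) pairs, the start state of an 8-bit accumulator is an assumption, and the tracking is purely linear: it does not follow branches or PLP. $20 is the accumulator-width bit that REP and SEP operate on.

```python
M_FLAG = 0x20  # accumulator width bit in the status register


def check_accumulator_widths(program, start_bits=8):
    """Return (line, declared, tracked) for every immediate load that disagrees."""
    bits = start_bits
    warnings = []
    for number, (mnemonic, operand) in enumerate(program, start=1):
        if mnemonic == "rep" and operand & M_FLAG:
            bits = 16  # REP clears the bit: 16-bit accumulator
        elif mnemonic == "sep" and operand & M_FLAG:
            bits = 8   # SEP sets the bit: 8-bit accumulator
        elif mnemonic in ("lda.#", "lda.##"):
            declared = 8 if mnemonic == "lda.#" else 16
            if declared != bits:
                warnings.append((number, declared, bits))
    return warnings
```

A program that does `rep` $20 and then uses `lda.#` would be flagged, because the source claims 8 bits while the tracked state says 16.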

Another approach to the whole problem would be to say, screw the simple part, we're going to change a lot more. Then you could consider changing the "a" in 8-bit lda to "c" as ldc with 16-bit instructions. Alas, that doesn't help us with X or Y. You could also change trb to tcb - it's "clear" for pretty much everything else, why "reset" here?

Now, if you want to go whole hog, we could try to standardize a bunch of pseudo-instructions that people are currently handling as assembler macros, like switching modes. I use a8, a16, xy16, xy8 which (I just realized) would fit well with lda8. Hmm.

Again, I'm open to a more radical departure from the traditional syntax if we can say it makes things easier and less error-prone. Suggestions most welcome!


PostPosted: Sat Nov 17, 2018 8:40 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8521
Location: Southern California
So you don't like C32's way of using the < to specify 8-bit operands, otherwise it does 16-bit? After using it a lot, I think it works very well. I have, for example, these common Forth primitives:
Code:
        HEADER "ON", NOT_IMMEDIATE      ; ( addr -- )   Store FFFF at addr.
ON:     PRIMITIVE
        LDA     #$FFFF                  ; Note that without the "<", you get a 16-bit operand.
        BRA     off1                    ; Finish up in OFF, below.
 ;-------------------
        HEADER "OFF", NOT_IMMEDIATE     ; ( addr -- )   Store 0 at addr.
OFF:    PRIMITIVE
        LDA     #0                      ; Note that without the "<", you get a 16-bit operand.   
 off1:  STA     (0,X)                   ; There's no STZ (dp,X).
        POP1
 ;-------------------
        HEADER "C_ON", NOT_IMMEDIATE    ; ( addr -- )   Store $FF at char addr.
C_ON:   PRIMITIVE
        ACCUM_8
        LDA     #<$FF                   ; The "<" after the "#" gives you an 8-bit operand.
        BRA     cof1                    ; Finish up in C_OFF below.
 ;-------------------
        HEADER "C_OFF", NOT_IMMEDIATE   ; ( addr -- )   Store 0 at char addr.
C_OFF:  PRIMITIVE
        ACCUM_8
        LDA     #<0                     ; The "<" after the "#" gives you an 8-bit operand. 
 cof1:  STA     (0,X)                   ; There's no STZ (dp,X) or STY (dp,X).
        ACCUM_16
        POP1
 ;-------------------

ACCUM_8, ACCUM_16, INDEX_8, and INDEX_16 just lay down the appropriate REP or SEP instruction. They don't tell the assembler how big to make immediate operands. There are very few places in my '816 Forth kernel where the register size is changed though. I mostly leave it at 16-bit A and 8-bit X & Y.
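The rule used in the listing above (a "<" after the "#" gives an 8-bit operand, otherwise 16 bits) is simple enough to sketch in Python. This follows only the description in this thread, not C32's actual parser; `parse_number` and `immediate_operand` are made-up names, and only "$" hex and plain decimal numbers are understood.

```python
def parse_number(text):
    """Parse '$' hex or plain decimal."""
    if text.startswith("$"):
        return int(text[1:], 16)
    return int(text)


def immediate_operand(text):
    """Return the operand bytes for an immediate such as '#$FFFF' or '#<$FF'."""
    if not text.startswith("#"):
        raise ValueError("not an immediate operand: " + text)
    body = text[1:]
    if body.startswith("<"):
        # "<" keeps only the low 8 bits: a one-byte operand
        return bytes([parse_number(body[1:]) & 0xFF])
    # Without "<" the operand is always two bytes, low byte first
    return (parse_number(body) & 0xFFFF).to_bytes(2, "little")
```

The operand size is decided entirely by what is written on the line, so no assembler state is involved.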

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Sat Nov 17, 2018 9:06 am 

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
One option might be to divorce the destination operand from the mnemonic, essentially moving in the opposite direction. This is already partially accepted by many assemblers in the form of "ROR A" and friends, and could result in shorter, more RISC-like mnemonics.

The most radical idea I have is to make brackets unambiguously refer to operands in memory, and to name registers according to their length as well as identity. All Txx, LDx and STx mnemonics would then be replaced by a simple MV. Assemblers would then not be required to maintain mode state, but could optionally warn if they observe inconsistent modes being used without an intervening instruction that's capable of changing the mode. These instructions would include SEP/REP, any unconditional jump or branch including BRK and COP, and PLP.

The mandatory use of brackets gives an opportunity for unambiguously distinguishing zero/direct page, absolute 16-bit and long 24-bit addressing. Standard 65816 syntax uses square brackets [] for indirect 24-bit addressing and round ones () for indirect 16-bit. We can extend this convention with curly brackets {} for 8-bit addressing.
Code:
MV A, X          ; was TXA - destination operand now comes first
MV A, 0          ; was LDA #0 with 8-bit accumulator mode
MV C, 0          ; was LDA #0 with 16-bit accumulator mode
MV A, {$00}      ; was LDA $00 with 8-bit accumulator mode, zero/direct page
MV {$00}, A      ; was STA $00 with 8-bit accumulator mode, zero/direct page
MV A, ($0000)    ; was LDA $0000 with 8-bit accumulator mode, absolute
MV A, ($0000+X)  ; was LDA $0000,X with 8-bit accumulator and index modes, indexed
MV C, [{$00}+YY] ; was LDA [$00],Y with 16-bit accumulator and index modes, long post-indexed indirect

This syntax doesn't explicitly distinguish the lengths of the RMW modes of the shift, rotate and increment instructions. Technically the assembler doesn't need to know them, since the opcode doesn't change and neither does the length of the instruction.
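One reading of the proposal is that an assembler can classify every operand from its spelling alone. The Python sketch below follows that reading: the outermost bracket gives the address size, and the register name gives the register width. The function name is invented, and the XX entry is an assumption by analogy with the YY used in the example above.

```python
ADDRESS_BITS = {"{": 8, "(": 16, "[": 24}
CLOSERS = {"{": "}", "(": ")", "[": "]"}
# Register names and their widths; XX is assumed by analogy with YY.
REGISTER_BITS = {"A": 8, "C": 16, "X": 8, "Y": 8, "XX": 16, "YY": 16}


def classify(operand):
    """Return ('register', bits), ('memory', address_bits) or ('immediate', None)."""
    text = operand.strip()
    if text.upper() in REGISTER_BITS:
        return ("register", REGISTER_BITS[text.upper()])
    if text and text[0] in ADDRESS_BITS:
        if not text.endswith(CLOSERS[text[0]]):
            raise ValueError("unbalanced bracket in " + text)
        return ("memory", ADDRESS_BITS[text[0]])
    # Anything else is taken to be an immediate value
    return ("immediate", None)
```

With the widths known per operand, the length of an immediate follows from the destination register (A versus C), which is what removes the need for assembler mode state.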


PostPosted: Sat Nov 17, 2018 2:45 pm 
Offline

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
I'm trying to get away from special characters, which really slow you down typing -- which is why I'm warming to the idea of lda.16 and lda.8 etc for the immediate instructions. We'd get rid of "#". Should have thought of that before.

To revisit the question of where to put the "decorations" (as Chromatix calls them) - one other variant would be to add the indices (and only them) to the operand. Say
Code:
lda.d 10,x ; was lda 10,x
lda.di 10,x ; was lda (10,x)
lda.di 10,y ; was lda (10),y
That would preserve the relationship between operand and index. The last two examples show why I decided against that - the current SAN version with lda.dxi and lda.diy shows you where the brackets are in the conventional system. I'll be the first to admit, though, that lda.dily 10 (for lda [10],y) is downright silly. That's a dilly of a suffix ...

As for the mv system (I'd use "mov" to keep the stems at three chars) - how do we deal with 8-bit A to 16-bit X? The "C" is obvious for 16-bit A, but there is nothing of the sort for X and Y. This is the problem with tax for instance: It can be one of four different combinations of 8/16-bit A and 8/16-bit X. What you could do is (always source -> destination, because that switching gives me a headache, and dropping the commas):
Code:
mov a8 x16 ; was tax
mov a8 (00) ; was sta (00)
mov x16 a8 ; was txa
and let the assembler figure out when to insert a mode switch for the registers. That's really far away from the original syntax, though.
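Here is a rough Python sketch of what "let the assembler figure out the mode switch" could mean for register-to-register moves. It starts with both widths unknown, so the first use of each register always emits a switch. The function name and the output format are invented; SEP/REP #$20 (accumulator) and #$10 (index registers) are the standard 65816 way to set the widths. Memory operands such as (00) are not handled.

```python
def assemble_moves(lines):
    """Turn 'mov a8 x16' style lines into transfers plus needed mode switches."""
    state = {"a": None, "xy": None}  # widths are unknown at the start
    output = []
    for line in lines:
        _, source, destination = line.split()
        for operand in (source, destination):
            register, width = operand[0], int(operand[1:])
            key = "a" if register == "a" else "xy"
            if state[key] != width:
                flag = "$20" if key == "a" else "$10"
                # SEP sets the flag (8 bit), REP clears it (16 bit)
                switch = "sep" if width == 8 else "rep"
                output.append("{} #{}".format(switch, flag))
                state[key] = width
        output.append("t" + source[0] + destination[0])
    return output
```

For the first and third lines of the example, `assemble_moves(["mov a8 x16", "mov x16 a8"])` emits sep #$20, rep #$10, tax and then just txa, since the widths have not changed in between.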

I think Chromatix is right that in the end, the statefulness of the 65816 is the problem. We have emulation and native modes (and remember, most of the new stuff works in emulation mode), decimal mode in both, and then the state of the registers. With all the combinations, that gives us:
Code:
emu A8 XY8 bin
emu A8 XY16 bin
emu A16 XY8 bin
emu A16 XY16 bin

nat A8 XY8 bin
nat A8 XY16 bin
nat A16 XY8 bin
nat A16 XY16 bin

emu A8 XY8 dec
emu A8 XY16 dec
emu A16 XY8 dec
emu A16 XY16 dec

nat A8 XY8 dec
nat A8 XY16 dec
nat A16 XY8 dec
nat A16 XY16 dec
16 different combinations. Even if we say, bah, nobody uses decimal anyway, we've got eight. Add a humbug, you just switch to native and stay there, you still have four. Which is still a lot, but only has to deal with the registers.
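The counting can be reproduced with a few lines of Python. The sketch simply enumerates the list as posted, in the same order; `cpu_states` and its two switches are made-up names.

```python
from itertools import product


def cpu_states(decimal=True, emulation=True):
    """List the state combinations, optionally ignoring decimal or emulation."""
    arithmetic = ("bin", "dec") if decimal else ("bin",)
    cpu = ("emu", "nat") if emulation else ("nat",)
    return [
        "{} {} {} {}".format(mode, a, xy, arith)
        for arith, mode, a, xy in product(
            arithmetic, cpu, ("A8", "A16"), ("XY8", "XY16")
        )
    ]
```

`len(cpu_states())` is 16, dropping decimal leaves 8, and dropping emulation as well leaves the 4 register combinations.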

(Having written that, I wonder if it would make sense to retool my back-burner 65816 emulator to just work with binary native modes. It would simplify things dramatically and deal with 90 percent of the cases ...)

One other way to deal with the whole problem is to abstract things away with a bytecode interpreter/compiler like PLASMA does. You could jump to a higher-level virtual machine, either stack- or register based, and deal with byte (8 bit), int (16 bit), long (32 bit), strings, etc, with a constant pool and all the other things that go into a bytecode system. You could hide the memory segmentation completely.

If we could agree on a model and syntax, that might be the long-term way forward for the MPU. However, it does take us far away from programming assembler as assembler.


PostPosted: Sat Nov 17, 2018 3:16 pm 

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Quote:
As for the mv system (I'd use "mov" to keep the stems at three chars)- how do we deal with 8-bit A to 16-bit X? The "C" is obvious for 16-bit A, but there is nothing of the sort for X and Y.

I'm proposing "XX" and "YY" for the extended index registers, because they're easy to remember and type. So you would say "MV XX, A" for your example, and it would assemble as TAX just as "MV X, A" or "MV XX, C" or "MV X, C" would. However, the CPU would actually perform a 16-bit transfer, according to WDC's datasheet, with the top half of the write being masked off if the destination register is in 8-bit mode.
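How several spellings collapse onto one opcode can be sketched in Python. The sketch covers only the six transfers between A, X and Y (TAX, TAY, TXA, TYA, TXY, TYX); the other 65816 transfer instructions are left out, and `mv_to_transfer` is a made-up name.

```python
# Every spelling of a register maps to the letter used in the Txx mnemonic.
REGISTERS = {"A": "A", "C": "A", "X": "X", "XX": "X", "Y": "Y", "YY": "Y"}
TRANSFERS = {"TAX", "TAY", "TXA", "TYA", "TXY", "TYX"}


def mv_to_transfer(line):
    """Translate 'MV dest, source' between registers into a Txx mnemonic."""
    mnemonic, _, operands = line.strip().partition(" ")
    if mnemonic.upper() != "MV":
        raise ValueError("not an MV instruction: " + line)
    destination, source = [part.strip().upper() for part in operands.split(",")]
    # Traditional mnemonics name the source first, then the destination
    opcode = "T" + REGISTERS[source] + REGISTERS[destination]
    if opcode not in TRANSFERS:
        raise ValueError("no such transfer: " + opcode)
    return opcode
```

"MV XX, A", "MV X, A", "MV XX, C" and "MV X, C" all come out as TAX; the register widths would only be used for the optional consistency warnings.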

Incidentally, although I gave specific examples only for MV, it may be feasible to make two-letter mnemonics for the other instructions as well, or at least the ones that take operands.

The problem with bytecode interpreters is that they're inherently slower than native assembly - so for performance-critical applications, you still need to write assembly (and with these small chips, many more applications are sufficiently performance-critical for this than would be on a faster, more capable CPU). They have their place, of course. I'd say that writing one, and the toolchain to go with it, is a separate subject entirely.


PostPosted: Mon Nov 19, 2018 8:45 am 

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
The XX and YY variants make good sense. In the end, I think yes, that would be a viable way to do the mnemonics, though it's not what I'm looking for because of how radical the change is and the special characters. By all means, how about a separate thread with a complete specification?

