6502.org Forum  Projects  Code  Documents  Tools  Forum
PostPosted: Tue Nov 20, 2018 7:00 pm 

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Following on from the other alternative assembly-syntax topic, whose design has completely different goals…

In my view, the single biggest handicap of the 65816 is its proliferation of mode flags, some of which actually change the length of instructions (using the same mnemonic and opcode) and are therefore relevant to the assembler. In the standard syntax promoted by WDC, the same syntax is used for both "long" 16-bit and "short" 8-bit immediate operands, and so the only way the assembler knows which is which is through internal state managed by assembly directives. Keeping track of this state can be very error-prone for the programmer, and the symptoms of the resulting bugs can be very odd indeed.

Another artefact of the traditional syntax is the proliferation of individual mnemonics for simply moving data around - an astonishing number given the small number of registers present. More modern assembly syntaxes use few mnemonics (sometimes just one) and specify the source and destination explicitly. Most of the non-transfer operations are more economical in their use of mnemonics.

A third fault of the traditional syntax is that the programmer cannot always explicitly specify which size of address to use, without relying on non-standard assembler extensions. With the older 6502 family, it was rare for the programmer to need an absolute 16-bit address when referring to zero-page, but the 65816 now has Direct Page which can refer to any 256-byte area in the first 64K, and the Data Bank Register can shift absolute-mode addresses by multiples of 64K; hence $0000 may dynamically refer to a different address than $00 in two different ways, either or both of which may differ from $000000. The standard 65816 extensions to the traditional 6502 syntax do not adequately address this.

The new syntax proposed below departs significantly from tradition to correct the above, aiming to support programmers of the 65816 instead of confusing them and getting in the way. A subset may of course be used on the older members of the 6502 family.

The first distinctive feature is that all directly-accessible registers have names which, if they have mode flags influencing size, differ to distinguish those sizes. This makes the assembler stateless, though disassemblers still inherently need to track (or guess) the CPU's mode to correctly determine instruction sizes. Registers which have only one size, or whose size doesn't affect instruction or transfer size, have only one name:
  • Accumulator: A (8-bit), AA (16-bit).
  • Indexes: X,Y (8-bit), XX,YY (16-bit).
  • Stack Ptr: S (8-bit in emulation mode, 16-bit in native mode).
  • Status Register: P (8-bit).
  • Direct Page Ptr: D (16-bit).
  • Data Bank Register: B (8-bit high).
  • Program Bank Register: K (8-bit high).
  • Program Counter: * or PC (16-bit)
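To make the "stateless" point concrete, here is a rough Python sketch (an illustration only, not part of the proposal) of how an assembler could size an immediate operand from the register name alone. The function name `assemble_move_immediate` is made up for this example; the opcodes are the standard immediate-load opcodes for LDA, LDX and LDY.

```python
# Hypothetical sketch: the register name alone fixes the immediate operand width,
# so no .longa/.longi-style assembler state is needed.
LOAD_IMMEDIATE_OPCODE = {"A": 0xA9, "X": 0xA2, "Y": 0xA0}  # LDA #, LDX #, LDY #

def assemble_move_immediate(register: str, value: int) -> bytes:
    """Encode 'M <register>, <value>' where the source is an immediate."""
    base = register[:1]                       # 'AA' -> 'A', 'XX' -> 'X', ...
    if base not in LOAD_IMMEDIATE_OPCODE or register not in (base, base * 2):
        raise ValueError(f"unknown register {register!r}")
    width = len(register)                     # 1 byte for A/X/Y, 2 for AA/XX/YY
    if not 0 <= value < (1 << (8 * width)):
        raise ValueError(f"{value:#x} does not fit in {width} byte(s)")
    opcode = bytes([LOAD_IMMEDIATE_OPCODE[base]])
    return opcode + value.to_bytes(width, "little")

print(assemble_move_immediate("A", 0x12).hex())     # 8-bit immediate
print(assemble_move_immediate("AA", 0x1234).hex())  # 16-bit immediate, low byte first
```

The same opcode byte is emitted in both cases; only the operand length differs, which is exactly the information the doubled register name carries.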

The other distinctive feature is that immediate and memory operands have a completely new syntax. The # prefix formerly identifying immediate operands is dropped; instead addressed memory accesses are explicitly called out by three types of bracket, corresponding to direct-page {}, absolute (), or long [] addresses respectively. Indirect addressing modes contain two nested sets of brackets, indexed modes use + instead of a comma to link the appropriate register in, and stack pushes and pops involve the C post-decrement and pre-increment operators on the S register. All of the 65816's addressing modes can thus be expressed compactly, readably and unambiguously, whether the explicit operand is a literal, a label or a constant expression:
  • Immediate: expr
  • Direct Page: {expr}
  • Direct Page Index: {expr+X} or {expr+XX} or…
  • Absolute: (expr)
  • Index: (expr+X)
  • Absolute Long: [expr]
  • Index Long: [expr+X]
  • Direct Page Indirect: ({expr})
  • Direct Page Index Indirect: ({expr+X})
  • Direct Page Indirect Index: ({expr}+Y)
  • Direct Page Indirect Long: [{expr}]
  • Direct Page Indirect Index Long: [{expr}+Y]
  • Stack Pull: {++S}
  • Stack Push: {S--}
  • Stack Relative: {S+expr} -- the braces here indicate that the bank address is not taken from B, and the explicit offset is 8-bit.
  • Stack Indirect Post-Index: ({S+expr}+Y)
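As a sanity check that the operand forms listed above can be told apart mechanically, here is a small Python sketch that classifies an operand string by its bracket shape. It is only an illustration: the mode names are informal labels, `classify` is a made-up helper, expressions are treated as opaque bracket-free text, and allowing Y as well as X in the plain indexed forms is my reading of the "or…" in the list.

```python
import re

# (mode name, template) pairs, most specific first, so that {S+expr} is not
# mistaken for a plain direct-page operand.
TEMPLATES = [
    ("stack pull", "{++S}"),
    ("stack push", "{S--}"),
    ("stack indirect post-index", "({S+expr}+Y)"),
    ("stack relative", "{S+expr}"),
    ("direct page indirect index long", "[{expr}+Y]"),
    ("direct page indirect long", "[{expr}]"),
    ("direct page index indirect", "({expr+X})"),
    ("direct page indirect index", "({expr}+Y)"),
    ("direct page indirect", "({expr})"),
    ("direct page index", "{expr+X}"),
    ("direct page index", "{expr+Y}"),
    ("direct page", "{expr}"),
    ("absolute long index", "[expr+X]"),
    ("absolute long", "[expr]"),
    ("absolute index", "(expr+X)"),
    ("absolute index", "(expr+Y)"),
    ("absolute", "(expr)"),
    ("immediate", "expr"),
]

EXPR = r"[^(){}\[\]]+?"  # any bracket-free text; not evaluated in this sketch

def _compile(template):
    pattern = re.escape(template)
    # Accept either the 8-bit or the 16-bit index register name.
    pattern = pattern.replace("X", "XX?").replace("Y", "YY?")
    return re.compile(pattern.replace("expr", EXPR))

MODES = [(name, _compile(template)) for name, template in TEMPLATES]

def classify(operand):
    """Return the informal addressing-mode name for one operand string."""
    text = operand.replace(" ", "")
    for name, regex in MODES:
        if regex.fullmatch(text):
            return name
    raise ValueError(f"unrecognised operand: {operand!r}")

for sample in ["$12", "{$10}", "($1234+X)", "({$10}+Y)", "[{$10}]", "{S+3}"]:
    print(sample, "->", classify(sample))
```

A real assembler would of course parse the expression properly rather than pattern-match the whole operand, but the bracket shapes alone are enough to pick the addressing mode.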

Jump and branch instructions do not directly follow the above pattern. If they did, then an absolute jump target would require "(label)", and an indirect jump "((label))". But these instructions don't involve any explicit access of the data at the target address, only loading or adding that address to the PC, which is more similar to using an immediate operand. For that reason, the traditional syntax is retained for jump and branch operands:
  • Branch Relative: *+expr
  • Branch to Label: label -- assembler will compute correct relative offset, and promote to long branch if required
  • Jump Absolute or Long: expr or label
  • Jump Indirect: (expr)
  • Jump Index Indirect: (expr+X)
  • Jump Indirect Long: [expr]
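The "promote to long branch if required" behaviour could look roughly like the Python sketch below. This is an illustration under simplifying assumptions: `assemble_branch` is a hypothetical helper, both addresses are already known, and it ignores the fact that promoting a branch moves everything after it (a real assembler would need more than one pass). The opcodes are the standard BRA and BRL encodings.

```python
BRA, BRL = 0x80, 0x82  # standard 65816 opcodes for short and long branch

def assemble_branch(pc: int, target: int) -> bytes:
    """Encode 'B target' located at address pc, promoting BRA to BRL if needed."""
    short = target - (pc + 2)             # offset counts from the next instruction
    if -128 <= short <= 127:
        return bytes([BRA, short & 0xFF])
    long = (target - (pc + 3)) & 0xFFFF   # BRL is 3 bytes and wraps within the bank
    return bytes([BRL]) + long.to_bytes(2, "little")

print(assemble_branch(0x1000, 0x1010).hex())  # near target: 2-byte BRA
print(assemble_branch(0x1000, 0x2000).hex())  # far target: 3-byte BRL
```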

Differences of opinion exist as to whether uniform three-letter mnemonics are superior to variable-length ones. The latter may require less typing on average, while the former help to align the operands for readability. If the programmer sets up tab stops appropriately (eg. every 4 columns) and uses a tab as the whitespace immediately following the mnemonic, the operands are just as easy to align as with uniform-length mnemonics, and the programmer receives an extra visual cue to help follow the structure of his code. A preliminary list of suggested mnemonics, based on this principle:
Code:
M    move, replaces LDA, LDX, LDY, STA, STX, STY, TAX, TAY, TYA, TXA, TXY, TYX, PHA, PHX, PHY, PHP, PHB, PHD, PHK, PLA, PLX, PLY, PLP, PLB, PLD, TXS, TSX, TSC, TCS, TDC, TCD, PEA, PER, PEI
A    add, replaces ADC
S    subtract, replaces SBC
I    increment, replaces INC, INX, INY
D    decrement, replaces DEC, DEX, DEY
C    compare, replaces CMP, CPX, CPY
T    test bits, replaces BIT
TS   test and set, replaces TSB
TC   test and clear, replaces TRB
LO   logical OR, replaces ORA
LA   logical AND, replaces AND
LX   logical XOR, replaces EOR
RL   rotate left, replaces ROL
RR   rotate right, replaces ROR
SL   shift left, replaces ASL
SR   shift right, replaces LSR
SH   swap halves, replaces XBA
CL   clear, replaces STZ, CLC, CLD, CLV, CLI, REP
ST   set, replaces SEC, SED, SEI, SEP
BPL, BMI, BVC, BVS, BNE, BEQ, BCC, BCS, RTS, RTL, RTI, BRK, COP, NOP, XCE, all perform their traditional duties.
B    branch, replaces BRA and BRL
BL   branch long, forces BRL
J    jump, replaces JMP and JML
JL   jump long, forces JML
JS   jump subroutine, replaces JSR (does *not* promote to JSL, because it's not transparent to the callee)
JSL  jump subroutine long, replaces JSL
MBD  move block decrementing, replaces MVP
MBI  move block incrementing, replaces MVN
WAIT, STOP replace WAI, STP
WDM  William D. Mensch, emitted as a 1-byte instruction to permit skipping a following 1-byte instruction.
NOPL long NOP, is actually WDM as an explicit 2-byte, 2-cycle NOP; the second byte is emitted as a standard 1-byte NOP.
For CL and ST, the flags of the P register are addressed by their familiar initials NVMXDIZC, and may be concatenated like that to set or reset multiple flags at once (emitting a SEP or REP). The assembler will emit one-byte instructions if a single C, D, V or I is given, but can be forced to emit SEP or REP if the letter is doubled. STZ, being a store instruction, is signalled by the operand of CL beginning with a bracket - so there is no namespace conflict.
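A possible encoding of that rule, as a hedged Python sketch: `assemble_flag_op` is a name invented for this example, and the only facts assumed are the standard opcodes and the bit order NVMXDIZC from bit 7 down to bit 0. Note that there is no one-byte "set V" instruction, so "ST V" falls through to SEP.

```python
# Bit masks for the P register flags, N = bit 7 ... C = bit 0.
FLAG_BITS = {name: 1 << (7 - i) for i, name in enumerate("NVMXDIZC")}

ONE_BYTE = {
    ("CL", "C"): 0x18, ("ST", "C"): 0x38,  # CLC, SEC
    ("CL", "I"): 0x58, ("ST", "I"): 0x78,  # CLI, SEI
    ("CL", "D"): 0xD8, ("ST", "D"): 0xF8,  # CLD, SED
    ("CL", "V"): 0xB8,                     # CLV (no one-byte "set V" exists)
}
REP, SEP = 0xC2, 0xE2

def assemble_flag_op(mnemonic: str, flags: str) -> bytes:
    """Encode 'CL <flags>' or 'ST <flags>' for P-register flag operands."""
    if mnemonic not in ("CL", "ST") or not flags:
        raise ValueError("expected CL or ST with at least one flag letter")
    if (mnemonic, flags) in ONE_BYTE:
        return bytes([ONE_BYTE[(mnemonic, flags)]])
    mask = 0
    for letter in flags:                   # a doubled letter just sets the same bit
        mask |= FLAG_BITS[letter]
    return bytes([REP if mnemonic == "CL" else SEP, mask])

print(assemble_flag_op("CL", "C").hex())   # one-byte CLC
print(assemble_flag_op("CL", "MX").hex())  # REP #$30
print(assemble_flag_op("ST", "CC").hex())  # doubled letter forces SEP #$01
```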

Many of these new mnemonics (especially M!) require one or two registers to be specified explicitly for disambiguation, even though they replace formerly "implicit mode" mnemonics. Where two operands are needed, the destination register is listed first, as is common practice for RISC assembly. For example, "LDA $00" becomes "M A, {$00}" assuming an 8-bit accumulator, or "M AA, {$00}" if 16-bit. Instructions which operate only on the accumulator, however, do not need to declare this fact (eg. "A {$00}" is a sufficient replacement for "ADC $00").

While not mandatory, the assembler may aid programmers by watching for inconsistent use of register names of different sizes without changing mode meanwhile, and emitting warning messages if such mistakes are found. Technically, the register name is only significant to the assembler when immediate operands are used, but the programmer effectively asserts and reminds himself that a particular register mode is assumed through this mechanism. Sophisticated assemblers may trace through direct branches, jumps and subroutine calls to enhance the accuracy of this check. Indirect jumps, including BRK and COP, should be assumed to terminate the consistency check, since they can be re-pointed at any code whatsoever.
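For what it's worth, the simplest form of such a check is a straight-line scan, sketched below in Python. Everything here is an assumption for illustration: `check_widths` is a made-up name, the input is a plain list of source lines in the proposed syntax, the first use of a register is taken as asserting its width, and no attempt is made to follow branches or subroutine calls.

```python
import re

ACC = {"A": 8, "AA": 16}
IDX = {"X": 8, "XX": 16, "Y": 8, "YY": 16}
# Register names as whole words, but not hex digits such as the A in $A or $1A.
REGISTER = re.compile(r"(?<![\w$])(AA|A|XX|X|YY|Y)(?!\w)")

def check_widths(lines):
    """Return warnings for register names that contradict the tracked mode."""
    width = {"M": None, "X": None}            # None means "not known yet"
    warnings = []
    for number, line in enumerate(lines, start=1):
        parts = line.split(None, 1)
        if not parts:
            continue
        mnemonic = parts[0]
        operands = parts[1].strip() if len(parts) > 1 else ""
        is_memory = operands.startswith(("{", "(", "["))
        if mnemonic in ("CL", "ST") and not is_memory:
            for flag in "MX":                 # CL M/X -> 16-bit, ST M/X -> 8-bit
                if flag in operands:
                    width[flag] = 16 if mnemonic == "CL" else 8
            continue
        if mnemonic in ("BRK", "COP") or (mnemonic in ("J", "JL") and is_memory):
            width = {"M": None, "X": None}    # indirect transfer: stop assuming
            continue
        for name in REGISTER.findall(operands):
            flag, size = ("M", ACC[name]) if name in ACC else ("X", IDX[name])
            if width[flag] is None:
                width[flag] = size            # first use asserts the mode
            elif width[flag] != size:
                warnings.append(
                    f"line {number}: {name} used while {flag} flag implies "
                    f"{width[flag]}-bit"
                )
    return warnings

source = ["CL MX", "M AA, $1234", "M XX, {$10}", "ST M", "M AA, $0000", "M A, $12"]
for warning in check_widths(source):
    print(warning)
```

In the sample, the fifth line uses AA after "ST M" has made the accumulator 8-bit, so that is the line that gets flagged.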


PostPosted: Wed Nov 21, 2018 1:39 am 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Some interesting thoughts here, Chromatix. Several of the ideas I quite like; for example treating all loads, stores & transfers as MOV's. Also I agree it's not necessary that mnemonics always be three letters long as is traditionally the case for 65xx. Four or even five letters may be appropriate. OTOH single-letter mnemonics seem too short; that's just my personal preference. And I expect others will post regarding their preferences. It may be difficult to build consensus and recruit support -- that's just a fact of life, under the circumstances. :|

Two (hopefully) constructive questions / suggestions:

For 8-bit immediate you could use #expr, and for 16-bit immediate you could use ##expr. This matches your convention for A X Y vs AA XX YY, and probably eliminates the need for curly brackets around Direct-Page operands. (To me, having three types of brackets seems inadvisable, especially when square and round brackets denote indirection but curly brackets do not.)

Quote:
The first distinctive feature is that all directly-accessible registers have names which, if they have mode flags influencing size, differ to distinguish those sizes. [...]

  • Accumulator: A (8-bit), AA (16-bit).
  • Indexes: X,Y (8-bit), XX,YY (16-bit).
Is there a way this can be extended to include values in memory? For example, if we increment the accumulator there's no problem because we're obliged to specify A or AA. But I didn't notice any way to make the size apparent when we increment a value in memory. Worth thinking about?

cheers
Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Wed Nov 21, 2018 4:57 am 

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
I'm not a fan of the single letter mnemonics. We're not writing TECO here, and having more letters makes the language much more accessible, particularly to folks that don't know it.

My concern with the 8 vs 16 bit instructions is simply that (not having written much '816) I think it's very rare that you want to separate the mode that the assembler is in from the mode that the computer is in, and so, especially for a novice, simply doing "M AA, 0", or however you plan on doing it, doesn't instantly shift the computer into 16 bit operation mode.

In the end, you'll find code setting the computer into 16 bit mode, followed by a bunch of AA, XX, and YY mnemonics, then back to byte mode, with a bunch of A, X, and Y mnemonics. Why am I jumping through all those hoops when the assembler already knows what to do?

But there's nothing forcing the modes. You can be staring at "M AA, 0" all day long and wonder why it's not working properly without realizing that the CPU is in the wrong mode.

There's a reason folks combine the proper SEP/REP instructions with the appropriate assembler pseudo-op to set the modes properly.

A better case, I think, is to not let the assembler do things silently. For example, there was discussion of being forced to do "LDA #<0" to use 8 bits vs "LDA #0" which is 16 bits. When the assembler sees you doing "LDA #<0", it can check that it is, indeed, in 8 bit mode, and flag an error or warning about a mismatch. This will at least help ensure that the programmer and assembler are both on the same page as to what the code should be.

Also, for those comfortable with the cast, there can be an assembly directive to remove the type check. Especially for legacy code, so it will "just work".

Finally, the new mnemonics eliminate any real chance of using legacy code. It all has to be rewritten. You can have a dual mode assembler, you can have folks link legacy code instead of using the same assembler, but it certainly removes any chance of cut and paste reuse.


PostPosted: Wed Nov 21, 2018 9:10 am 

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
Personally I can't see the point of having ##, AA, XX or YY if you can't guarantee that the processor will be operating in the correct mode when the code is executed.

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


PostPosted: Wed Nov 21, 2018 10:05 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10977
Location: England
I think the idea of the assembler tracking the mode and issuing a warning or an error might be a helpful one. There's the possibility that the mode isn't known, I suppose, since knowing the mode kind of means tracing the flow graph.

Setting the mode for specific operations and then putting it back is quite a drag: unless for some tight loop like a string operation, I would tend not to do it.


PostPosted: Wed Nov 21, 2018 10:16 am 

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
I've found that I assert the expected configuration at the start of every subroutine or entry point and then use macros to change the state within the code, for example
Code:
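      .longa   on      ; Assembler assumes 16-bit A on entry
      .longi   on      ; Assembler assumes 16-bit X/Y on entry
SpiInit:
      php              ; Save caller's flags, including the M/X width bits
      short_a          ; Macro (not shown here): presumably makes A 8-bit
      lda   #SPI_SCK   ; Set SCK as lo output
      tsb   PDD4
      trb   PD4
      lda   #SPI_MOSI   ; Set MOSI as an output
      tsb   PDD4
      lda   #SPI_MISO   ; Set MISO as an input
      trb   PDD4
      plp              ; Restore caller's flags and register widths
      rts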

A few years back I ported BDD's monitor code from his macro enhanced Kowalski source format to real WDC style code. I had to manually add the .longa/i directives to get the generated code to match.

Once you go native on the 65C816 it really is a very different beast.

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


PostPosted: Wed Nov 21, 2018 10:19 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10977
Location: England
Oh, I see, with PHP and PLP you always leave the machine how you found it.


PostPosted: Wed Nov 21, 2018 10:32 am 

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
BigEd wrote:
Oh, I see, with PHP and PLP you always leave the machine how you found it.

Yes, although that's possibly not the best example because it doesn't use X/Y and the size of A doesn't matter on entry.

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


PostPosted: Wed Nov 21, 2018 10:52 am 

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Quote:
In the end, you'll find code setting the computer into 16 bit mode, followed by a bunch of AA, XX, and YY mnemonics, then back to byte mode, with a bunch of A, X, and Y mnemonics. Why am I jumping through all those hoops when the assembler already knows what to do?

Because you're writing in assembly, not a high-level language. Assemblers produce exactly the instructions you tell them to, and are not expected to insert extras just because they think they should.

Quote:
Finally, the new mnemonics eliminate any real chance of using legacy code. It all has to be rewritten. You can have a dual mode assembler, you can have folks link legacy code instead of using the same assembler, but it certainly removes any chance of cut and paste reuse.

Actually, because nearly all the mnemonics are both new and different from the standard ones, an assembler could be written to transparently accept the old syntax using the old mnemonics. However, it would then be more difficult to perform the intended mode tracking and checking which is half the point here.


Quote:
…and probably eliminates the need for curly brackets around Direct-Page operands. (To me, having three types of brackets seems inadvisable, especially when square and round brackets denote indirection but curly brackets do not.)

I fear you've misread how the new memory-operand syntax works. There is deliberately no inconsistency between the way the three types of brackets are used: every pair of brackets means taking an address and accessing the operand stored there. Indirect addressing modes get two sets of brackets. The type of brackets merely indicates the type of addressing used for that access:
Code:
Direct Page {} - explicit operand is 8 bits, modified by D (or S) register, bank index always zero.
Absolute () - explicit operand is 16 bits, modified by prepending Data Bank Register.
Long [] - explicit operand is 24 bits, not modified by any offset registers.
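To spell out how the same numeric operand can land in three different places, here is a short Python sketch of the address arithmetic (illustration only; `effective_address` is a made-up helper, native mode is assumed, and indexing and the S-relative form are left out).

```python
def effective_address(bracket: str, operand: int, d: int = 0, b: int = 0) -> int:
    """Return the 24-bit address accessed by {operand}, (operand) or [operand].

    d is the Direct Page register (16-bit), b the Data Bank Register (8-bit).
    """
    if bracket == "{}":                        # direct page: 8-bit operand + D, bank 0
        return (d + (operand & 0xFF)) & 0xFFFF
    if bracket == "()":                        # absolute: 16-bit operand, bank from B
        return (b << 16) | (operand & 0xFFFF)
    if bracket == "[]":                        # long: full 24-bit operand
        return operand & 0xFFFFFF
    raise ValueError(f"unknown bracket type {bracket!r}")

# With D = $2000 and B = $7E, "zero" means three different things:
for kind in ("{}", "()", "[]"):
    print(kind, f"{effective_address(kind, 0, d=0x2000, b=0x7E):06X}")
```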

Quote:
Is there a way this can be extended to include values in memory? For example, if we increment the accumulator there's no problem because we're obliged to specify A or AA. But I didn't notice any way to make the size apparent when we increment a value in memory. Worth thinking about?

At present I can't think of a way that doesn't complicate the memory-operand syntax, or require duplicating mnemonics. I think the sort of bugs that result from using the incorrect mode here are easier to diagnose than those involving misalignment of the instruction stream. An assembler directive, asserting the mode expected at some point, could be used as a backstop.


Last edited by Chromatix on Wed Nov 21, 2018 11:02 am, edited 1 time in total.

PostPosted: Wed Nov 21, 2018 11:01 am 

Joined: Sun Apr 10, 2011 8:29 am
Posts: 597
Location: Norway/Japan
I would make the single-letter mnemonics into two-letter mnemonics.
I like the way addressing modes are handled, although {} and [] are a pain to enter on a non-US keyboard. That's not so different from C or Perl already of course.


PostPosted: Wed Nov 21, 2018 11:45 am 

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Well, Perl is already acknowledged by many to be so syntactically rich as to potentially accept line noise as valid - I'm sure you've heard the joke about accidentally feeding a megabyte of line noise into a Perl script and discovering that it represents a bug-fixed version of Microsoft Office…

On a Nordic keyboard, the three types of brackets are all on the same keys (8 & 9), and accessed by shift, option, and shift-option (on a Mac - I think it's Alt Gr on a PC instead of Option). This is not much different from the effort required to access # on an American English PC keyboard or any English Mac keyboard. On English keyboards, the curly braces are on the same keys as the square ones and accessed via the shift key, which is no more awkward than the round brackets. Unless you're a strict home-row touch-typist, that is; I submit that coding requires a more flexible keyboard technique in any case.


PostPosted: Wed Nov 21, 2018 2:18 pm 

Joined: Sun Apr 10, 2011 8:29 am
Posts: 597
Location: Norway/Japan
{} are on 7 and 0 respectively, on a Nordic keyboard. I enter them by holding down AltGr with the right thumb and reaching over with the left hand to enter the characters.
Yes I'm a touch typist, so the above isn't particularly efficient. In that sense I really liked Scott's 'typist assembler', but I prefer your way of handling addressing modes. It's just so awkward with those braces, but that's a universal problem. Standard Pascal introduced (* and *) as substitutes for {}. Doesn't look very good, but it's faster to type. But it won't work very well in this case because of ({expr+X}) (hm, that would be ((*expr+X*)) - maybe not horrible, and I noticed how much easier it was to type. But combining '*' with expressions could create confusion. [] was '(.' and '.)' in Std Pascal btw)


PostPosted: Wed Nov 21, 2018 3:33 pm 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8482
Location: Midwestern USA
If it ain't broke, don't fix it! The official 65C816 assembly language is fine as is.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Wed Nov 21, 2018 4:07 pm 

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Hey, if that's your opinion, you can carry right on using the standard syntax. No skin off my nose. And as far as I'm concerned, the standard syntax still works well for the 65C02, because it doesn't have the complication of Direct Page addressing.

Problem is, *I* think the standard syntax is broken enough to put me off using the '816 entirely. With this redesign, I feel as though I can build a mental model of the CPU that actually works. It still has some pretty ugly warts, but at least I can be more sure of what I'm asking the chip to do - and that combinatorial explosion of transfer mnemonics just vanishes.

With respect to digraphs and trigraphs, I understand those were introduced as a compatibility aid, for computers on which certain symbols couldn't be typed at all. They were never intended to merely make typing easier. Most languages now assume the host computer supports ASCII and has a full English keyboard, which is normally sufficiently close to the truth these days - so these compatibility aids are actively being dropped from newer versions of language standards.

An unfortunate aspect of PC and Mac hardware design is that the keyboard layout is not actually local to the keyboard, so you can't seamlessly have two keyboards, one dedicated to your native language and one for coding or gaming on. That would render many of these layout-specific problems moot.

Controversy over which mnemonics to use is a relatively minor point, I think. People didn't like the two-letter MV I originally suggested, so I minimised it further to M - now, suddenly, two-letter mnemonics are good! But this is probably the easiest part of the assembler to change, or to work around with macros.


PostPosted: Wed Nov 21, 2018 4:30 pm 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8482
Location: Midwestern USA
Chromatix wrote:
And as far as I'm concerned, the standard syntax still works well for the 65C02, because it doesn't have the complication of Direct Page addressing.

What complication?

Quote:
It still has some pretty ugly warts, but at least I can be more sure of what I'm asking the chip to do - and that combinatorial explosion of transfer mnemonics just vanishes.

You should know that the 65xx family's assembly language is one of the few that conform to an IEEE draft standard for assembly language mnemonics. Having written 6502 assembly language programs for some 40 years, I find nothing confusing about the choices of mnemonics—other than the PEA, PEI and PER operations on the '816. I understand how they work but feel the mnemonics were poorly chosen (PEA, for example, was "borrowed" from MC68000 assembly language—on the '816, PEA doesn't "push an address").

Quote:
Controversy over which mnemonics to use is a relatively minor point, I think. People didn't like the two-letter MV I originally suggested, so I minimised it further to M - now, suddenly, two-letter mnemonics are good! But this is probably the easiest part of the assembler to change, or to work around with macros.

Three-character mnemonics have the distinct advantage of uniformity, as well as the ability to be reduced to a word-sized value for processing purposes. You have to keep in mind that all of this was carefully worked out long, long ago for some very good reasons.
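As one illustration of the "word-sized value" point, three letters at 5 bits each fit in 15 bits, so a mnemonic can be compared or looked up as a single 16-bit integer. The Python sketch below shows one possible packing; the A=1…Z=26 encoding and the function names are assumptions for this example, not a claim about how any particular assembler does it.

```python
def pack_mnemonic(mnemonic: str) -> int:
    """Pack a three-letter mnemonic into a 15-bit integer, 5 bits per letter."""
    if len(mnemonic) != 3 or not (mnemonic.isascii() and mnemonic.isalpha()):
        raise ValueError("expected exactly three ASCII letters")
    value = 0
    for letter in mnemonic.upper():
        value = (value << 5) | (ord(letter) - ord("A") + 1)  # A=1 ... Z=26
    return value

def unpack_mnemonic(value: int) -> str:
    """Recover the three letters from a value produced by pack_mnemonic."""
    return "".join(
        chr(((value >> shift) & 0x1F) + ord("A") - 1) for shift in (10, 5, 0)
    )

packed = pack_mnemonic("LDA")
print(f"{packed:#06x}", unpack_mnemonic(packed))
```

Variable-length mnemonics can be packed too, but they need either padding or a length field, which is the uniformity argument in a nutshell.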

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!

