6502.org Forum

All times are UTC

 Post subject: 65C816 word alignment
PostPosted: Thu Mar 23, 2017 10:48 pm 

Joined: Thu Mar 10, 2016 4:33 am
Posts: 176
I've been looking at the Apple IIgs toolbox. The datatypes and calling conventions require word (2 byte) alignment, which I found strange. Since the '816 has an 8-bit data bus I thought there was no speed advantage in word aligning the data.

In the book I'm looking at the data types are like this:

boolean - 2 bytes
int - 2 bytes (as expected)
pointer - 4 bytes: 1 byte padding, 1 byte bank address, 2 bytes address

The only reason I can think of for this is future compatibility: did Apple think that some future IIgs would have a 16-bit bus?

Or is there some reason I am not aware of for word aligning data on the 816?


PostPosted: Thu Mar 23, 2017 11:12 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8389
Location: Midwestern USA
jds wrote:
I've been looking at the Apple IIgs toolbox. The datatypes and calling conventions require word (2 byte) alignment, which I found strange. Since the '816 has an 8-bit data bus I thought there was no speed advantage in word aligning the data.

In the book I'm looking at the data types are like this:

boolean - 2 bytes
int - 2 bytes (as expected)
pointer - 4 bytes: 1 byte padding, 1 byte bank address, 2 bytes address

It sounds as though you are describing use of a C compiler. In pure 65C816 assembly language there is no word alignment requirement. However, you do have to be aware that the PEA, PEI and PER instructions always push 16-bit values onto the stack. As those instructions are often used to build stack frames for calling subroutines (aka functions), the word alignment requirement may be the result of some peculiarity of the compiler.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Fri Mar 24, 2017 1:45 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
jds wrote:
Since the '816 has an 8-bit data bus I thought there was no speed advantage in word aligning the data.
You're right -- with the '816, alignment yields no speed advantage.

Quote:
boolean - 2 bytes
int - 2 bytes (as expected)
pointer - 4 bytes: 1 byte padding, 1 byte bank address, 2 bytes address

Hmmm, yes... but aren't you reading too much into this? These comments actually don't mention alignment! For example it doesn't say that, in memory, a 2-byte value must begin at an even address (one whose least-significant bit is zero) and end at an odd address (one whose least-significant bit is one).

It does mention padding, and maybe that's where the confusion arises. In other circumstances (such as when a 16-bit data bus is present), padding is used as a means to preserve alignment -- and the result is better performance. So, in that context, you'd be right on the money guessing that perhaps a future IIgs would have a 16-bit bus. And a happy thought it is! :)

But all I get from these comments is they want 24-bit address pointers to be extended to 32 bits (by the use of a pad byte). One possible reason is that it makes certain common operations substantially quicker.

  • If the pointer has 32 bits then it can be updated with two 16-bit write operations. (Memory/Accumulator would be 16-bit -- IOW the m bit in the 816's status register would be clear.)

  • If the pointer has 24 bits then an update would require both an 8-bit write and a 16-bit write. IOW every pointer update would require the m bit to be set then cleared again.

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Last edited by Dr Jefyll on Fri Mar 24, 2017 1:48 pm, edited 1 time in total.

PostPosted: Fri Mar 24, 2017 1:47 pm

Joined: Tue Jul 24, 2012 2:27 am
Posts: 674
Everything is made to be an even multiple of the 16-bit integer. It can treat a pointer as 2 integers without any weird overlap masking for the 3rd byte.

Plus, if you have an array of any of these objects, the offset can be calculated by shifting, instead of multiplying by 3 for long pointers for instance.
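The indexing point can be illustrated with a small Python model. This is only a sketch: the function names are made up for illustration, and the shifts stand in for what an ASL sequence would do on the '816.

```python
def offset_padded(index):
    """Byte offset of element `index` in a table of 4-byte (padded) pointers."""
    # 4 is a power of two, so scaling the index is just two left shifts.
    return index << 2


def offset_packed(index):
    """Byte offset of element `index` in a table of 3-byte (unpadded) pointers."""
    # 3 is not a power of two: shift once, then add the original index back in.
    return (index << 1) + index


for i in (0, 1, 5, 100):
    print(i, offset_padded(i), offset_packed(i))
```

The packed case needs the index kept somewhere (a register or a temporary) so it can be added back after the shift, which is the extra work the 4-byte layout avoids.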

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


PostPosted: Mon Mar 27, 2017 12:04 am

Joined: Thu Mar 10, 2016 4:33 am
Posts: 176
I guess it makes sense to keep the processor in 16-bit mode all the time: it's less work to push 2 bytes on the stack when you only need one (for a boolean) than to use REP and SEP instructions to switch back and forth. So that makes sense. The one that is a bit harder to understand is making pointers 4 bytes. Pushing the DB to the stack will always push one byte, so to expand the pointer you'd need to first push a 1-byte $00 (or possibly any convenient byte value, as it's ignored). I haven't found a way to push a single $00 byte to the stack when in 16-bit mode, so I guess that the easiest thing to do would be to push the DB twice. This probably somewhat removes the future compatibility advantage of the 32-bit pointers, as you are storing invalid data in the first byte.

So to push a pointer for a system call you'd need something like this:

Code:
  PHB             ; Push a meaningless byte
  PHB             ; Push the data bank for real
  PEA Buffer   ; Buffer is a data label
  JSL SysCall   ;


I've found some other IIgs references and they often make use of macros for pushing various values. I didn't find a push of a pointer, which is what I was interested in, but there are macros for pushing 1-, 2-, 3-, and 4-byte values. PUSH3 is quite interesting; it expands like this:

Code:
PUSH3 $123456

LDA #$1234    ; Push high 2 bytes
PHA                 ;   onto stack
PHB                 ;  Push an extra byte on the stack
LDA #$3456    ; Store lower two bytes
STA 1,S            ;   directly onto the stack


So the middle byte is stored twice, but it's still the easiest way. This really shows how the stack can be much more useful than on a 65C02 with the new '816 instructions. The macro also uses the PHB trick to add a single byte to the stack, so that confirms that this is the best way to do this if you don't care what the value is.
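To check what the PUSH3 expansion leaves on the stack, here is a rough Python trace. It is a sketch under assumptions: the `Stack816` class, the starting stack pointer and the `dbr` value are all invented for the model; only the instruction sequence comes from the macro.

```python
class Stack816:
    """Tiny model of the 65C816 stack: S points at the next free byte."""

    def __init__(self, s=0x01FF):
        self.mem = {}
        self.s = s

    def push8(self, value):
        self.mem[self.s] = value & 0xFF
        self.s -= 1

    def push16(self, value):
        # A 16-bit push stores the high byte first, so the low byte
        # ends up at the lower address (little-endian in memory).
        self.push8(value >> 8)
        self.push8(value)

    def store16_sr(self, offset, value):
        # Model of STA offset,S with a 16-bit accumulator.
        self.mem[self.s + offset] = value & 0xFF
        self.mem[self.s + offset + 1] = (value >> 8) & 0xFF


def push3(stack, value24, dbr=0xEE):
    stack.push16(value24 >> 8)              # LDA #$1234 / PHA
    stack.push8(dbr)                        # PHB: reserves a byte, value irrelevant
    stack.store16_sr(1, value24 & 0xFFFF)   # LDA #$3456 / STA 1,S


st = Stack816()
push3(st, 0x123456)
pushed = [st.mem[st.s + i] for i in (1, 2, 3)]
print([f"${b:02X}" for b in pushed])
```

In this model the three bytes at S+1..S+3 come out as $56, $34, $12, i.e. the 24-bit value in little-endian order, and the stack pointer has moved by exactly three bytes.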

The macro to push a byte uses SEP and REP, as that is the only way to push a single byte onto the stack when you care what the value is.

So I think that on the IIgs there is no attempt at word alignment or word padding in general, but that data pointers are stored as 4 bytes for some unknown reason but probably so that high level language support is easier.

I've found this interesting because the IIgs operating system is probably the largest body of 65C816 code that was written, so there are probably many things that could be learned from it. A few years ago the source code for GS/OS 6.0.1 (and I think the ROM as well) was leaked, but alas I've had no luck finding a copy on the net now. It would be very interesting to read.


PostPosted: Mon Mar 27, 2017 3:00 am

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
I wonder if it would make sense to write a nice letter to Apple and say, hey guys, we don't think you're going to use the 65C816 anymore in future projects, might we trouble you to release the source code for us curious hobbyists?


PostPosted: Mon Mar 27, 2017 4:54 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8389
Location: Midwestern USA
jds wrote:
The one that is a bit harder to understand is making pointers 4 bytes. Pushing the DB to the stack will always push one byte...

Pushing DB is unnecessary for stack relative operations, if that is what you are doing. It's also unnecessary if an arbitrary application is calling a function that needs to know in which bank the application is reading and writing data. I'll let you think about why this would be true.

Quote:
...so to expand the pointer you'd need to first push a 1 byte $00 (or possibly any convenient byte value as it's ignored), I haven't found a way to push a $00 to the stack when in 16-bit mode, so I guess that the easiest thing to do would be to push the DB twice.

You could try something like the following:

Code:
;   write DB on stack as a word...
;
         pea #0                ;write $0000 to SP & SP-1
         tsc                   ;get SP
         inc a                 ;SP = SP + 1
         sei                   ;ignore IRQs
         tcs                   ;set new stack pointer
         phb                   ;push DB
         cli                   ;enable IRQs

The above code assumes the accumulator is 16 bits wide. When executed, whatever is in DB will be at SP+1 and $00 will be at SP+2. If DB were $21 when the above was executed and you were to later execute LDA $1,S (assuming the accumulator remains in 16 bit mode) the accumulator would contain $0021.

Another method would be as follows:

Code:
;   push DB as a word to stack...
;
         ldx #0                ;assumes .X is 8 bits
         phx                   ;push it
         phb                   ;push DB

The above accomplishes the same thing as the previous code fragment, but is predicated on the index registers being 8 bits wide and you being willing to clobber .X (or .Y).
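As a sanity check, both fragments can be traced with a few lines of Python. This is a sketch only: the helper names, the starting stack pointer and the dictionary standing in for memory are assumptions made for the model, and interrupts are ignored.

```python
def push8(mem, s, value):
    """Store one byte at S and return the decremented stack pointer."""
    mem[s] = value & 0xFF
    return s - 1


def method_pea(dbr, s):
    mem = {}
    s = push8(mem, s, 0x00)   # PEA #0 pushes the high byte...
    s = push8(mem, s, 0x00)   # ...then the low byte
    s += 1                    # TSC / INC A / TCS: give one byte back
    s = push8(mem, s, dbr)    # PHB overwrites the low $00
    return mem, s


def method_phx(dbr, s):
    mem = {}
    s = push8(mem, s, 0x00)   # LDX #0 / PHX with 8-bit index registers
    s = push8(mem, s, dbr)    # PHB
    return mem, s


def read16_sr(mem, s, offset):
    """Model of LDA offset,S with a 16-bit accumulator."""
    return mem[s + offset] | (mem[s + offset + 1] << 8)


for method in (method_pea, method_phx):
    mem, s = method(0x21, 0x01FF)
    print(method.__name__, hex(read16_sr(mem, s, 1)))
```

Both paths end with the same stack pointer and the same zero-extended word at 1,S; the difference is only in which registers they need and clobber.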

Quote:
So to push a pointer for a system call you'd need something like this:

Code:
  PHB             ; Push a meaningless byte
  PHB             ; Push the data bank for real
  PEA Buffer      ; Buffer is a data label
  JSL SysCall
...I've found this interesting because the IIgs operating system is probably the largest body of 65C816 code that was written, so there are probably many things that could be learned from it.

Something to ponder is when the IIgs was in production the 65C816 was very new and thanks(?) to the poor quality of documentation that came from WDC in those days, was not well understood, even by Apple's software engineers. It wasn't until the Eyes & Lichty programming manual was released in 1986 that significant information became available. I would not consider the IIgs operating system to be a best-case example of a source of information on the '816. Things have changed a bit in the last 30 years, y'know. :wink:

Incidentally, a better (in my opinion) way of implementing system calls (aka kernel calls) with the '816 is to use the COP software interrupt. This method, as opposed to using JSR or JSL via a jump table, has the distinct advantage of not exposing kernel memory directly to user applications. Also, user applications only have to know API index numbers to make the call, not actual addresses. See this web page for some information on how to go about it.

PostPosted: Mon Mar 27, 2017 4:54 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8389
Location: Midwestern USA
scotws wrote:
I wonder if it would make sense to write a nice letter to Apple and say, hey guys, we don't think you're going to use the 65C816 anymore in future projects, might we trouble you to release the source code for us curious hobbyists?

:lol: :lol: :lol: :lol: :lol:

PostPosted: Mon Mar 27, 2017 5:50 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
jds wrote:
I guess it makes sense to keep the processor in 16-bit mode all the time, it's less work to push 2 bytes on the stack when you only need one (for a boolean) than using REP and SEP instructions to go back and forward. So that makes sense.
Yup -- and not just with a Boolean and not just on stack. Likewise in general memory it's easier to load or store a 24-bit pointer as two 16-bit words (including one unneeded byte) than it is to load or store a word and a byte.

The same applies with arithmetic. It's reasonable to suppose you'll commonly want to adjust a 24-bit pointer by adding to it the size of a field or object. And it's easier to do two 16-bit additions (including one unneeded byte) than it is to do a 16-bit addition and an 8-bit addition. (This assumes the size is 24-bit. But smaller sizes can reasonably be done with two 16-bit additions, too.)
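The two-word addition can be sketched in Python as follows. The function and parameter names are illustrative only; the comments indicate the kind of '816 instructions each step would correspond to with a 16-bit accumulator.

```python
def add_long(ptr_lo, ptr_hi, size_lo, size_hi=0):
    """Add a size to a pointer stored as two 16-bit words (low word, high word)."""
    total = ptr_lo + size_lo                      # CLC / LDA ptr / ADC size
    new_lo = total & 0xFFFF
    carry = total >> 16                           # carry out of the low word
    new_hi = (ptr_hi + size_hi + carry) & 0xFFFF  # LDA ptr+2 / ADC size+2
    return new_lo, new_hi


lo, hi = add_long(0xFFF0, 0x0012, 0x0020)
print(f"${hi:04X}{lo:04X}")
```

Here the pointer $12FFF0 plus $20 crosses a bank boundary, and the carry from the low word bumps the bank byte to $13 without any change of register width. The pad byte simply rides along in the top half of the high word.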

jds wrote:
The one that is a bit harder to understand is making pointers 4 bytes. Pushing the DB to the stack will always push one byte, so to expand the pointer you'd need to [...]
To me it seems 4 bytes just flows better. Moreover, the opposing considerations don't amount to much. For example I doubt there's much need for pushing and popping DBR. (But I like your solution of an extra push of DBR as a way to consume an extra byte on stack!)

Here's one way a routine can use a pointer -- or multiple pointers -- passed to it via the stack. Notice that DBR is not involved. Also, there's not any need to convert a 4-byte pointer to 3-byte.
Code:
PHD        ;save DirectPage register
TSC        ;Copy stack pointer...
TCD        ; ... to the DirectPage register

           ;*** example instructions follow ***
LDA [3]    ;use "direct page indirect long" to LDA via the long pointer
( etc.)
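The direct-page offset used in such a routine depends on what sits on the stack between D and the pointer. A small Python sketch of that arithmetic follows; it assumes the pointer was the last thing the caller pushed, and the table and function names are made up for illustration.

```python
# Bytes of return address pushed by each way of reaching the routine.
RETURN_BYTES = {"inline": 0, "JSR": 2, "JSL": 3}


def pointer_offset(call="JSL", saved_d_bytes=2):
    """Direct-page offset of the first pointer byte after PHD / TSC / TCD."""
    # D equals S, and S points one byte below the last byte pushed, hence the +1.
    return 1 + saved_d_bytes + RETURN_BYTES[call]


for call in RETURN_BYTES:
    print(call, pointer_offset(call))
```

Under these assumptions the offset 3 matches code that runs inline after the pointer is pushed; a routine reached by JSR or JSL would need a larger offset to skip the return address.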


jds wrote:
I've found this interesting because the IIgs operating system is probably the largest body of 65C816 code that was written, so there are probably many things that could be learned from it.
I agree. Thanks for posting that PUSH3 macro, BTW.

BigDumbDinosaur wrote:
Something to ponder is when the IIgs was in production the 65C816 was very new and thanks(?) to the poor quality of documentation that came from WDC in those days, was not well understood, even by Apple's software engineers.
@BDD: I heartily agree that WDC has published a lot of poor quality doc! But would Apple's software engineers have relied on WDC doc?

In regard to the 816's upgraded programming model (compared to the '02), I kinda thought the creative direction came from Apple. IOW I thought Apple dictated to WDC how the '816 needed to work. I could be mistaken. But maybe it was more a case of WDC relying on Apple doc!

Does anyone have any sources regarding the degree of influence Apple had on the '816 design? At the very least I know WDC made a major revision because the initial '816 wasn't to Apple's liking. (IIRC this had to do with Emulation mode. There's a forum post somewhere -- I'll try to hunt it up.)

-- Jeff

PostPosted: Mon Mar 27, 2017 6:03 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8389
Location: Midwestern USA
Dr Jefyll wrote:
@BDD: I heartily agree that WDC has published a lot of poor quality doc! But would Apple's software engineers have relied on WDC doc?

I suppose Apple would have had to rely on what WDC claimed were the 65C816's capabilities. Of course, once Apple had a working piece of hardware, some enterprising experimentation would quickly reveal a lot.

Quote:
In regard to the 816's upgraded programming model (compared to the '02), I kinda thought the creative direction came from Apple. IOW I thought Apple dictated to WDC how the '816 needed to work. I could be mistaken. But maybe it was more a case of WDC relying on Apple doc!

The '816 came about after Bill Mensch had consulted with Apple on a "next generation" Apple ][. However, what I have learned about the genesis of the '816 over the years doesn't really reveal a lot about how much influence Apple had on the final product, other than the 16 bit features. Some of the 65C816's instructions were clearly borrowed from the MC68000 (PEA immediately comes to mind), which was already in use in the Macintosh. Again, whether Apple had anything to do with the inclusion of such instructions in the '816 is unclear.

Quote:
Does anyone have any sources regarding the degree of influence Apple had on the '816 design?

Over a period of time I have tried to ferret out such information, with little success.

Quote:
At the very least I know WDC made a major revision because the initial '816 wasn't to Apple's liking. (IIRC this had to do with Emulation mode. There's a forum post somewhere -- I'll try to hunt it up.)

You may be thinking about bus behavior associated with the generation of invalid addresses. I seem to recall that the Apple ][ floppy disk operation was dependent in some way on the bus behavior.

PostPosted: Mon Mar 27, 2017 9:07 am

Joined: Sun Apr 10, 2011 8:29 am
Posts: 597
Location: Norway/Japan
BigDumbDinosaur wrote:
Dr Jefyll wrote:
Quote:
At the very least I know WDC made a major revision because the initial '816 wasn't to Apple's liking. (IIRC this had to do with Emulation mode. There's a forum post somewhere -- I'll try to hunt it up.)

You may be thinking about bus behavior associated with the generation of invalid addresses. I seem to recall that the Apple ][ floppy disk operation was dependent in some way on the bus behavior.
I'm pretty sure that was the issue. WDC had to back out what they considered an improvement, because the Apple FD relied on that quirk to get the timing right.


PostPosted: Mon Mar 27, 2017 10:16 am

Joined: Thu Mar 10, 2016 4:33 am
Posts: 176
In my search for the aforementioned OS source code I did find a small excerpt. The thing that stood out for me was the 'Mensch' comments: it appears as if Bill Mensch wrote parts of the IIgs OS code.

Code:
* 2/25/89        Mensch
* Added a faster inline multiply just for grins. Speed this routine up
* quite a lot and seems to complement those new faster rectangles pretty
* well.
...
; so, we put back the old multiply routine so we can ship the damned disk!
; Mensch 6/13/89


and just out of interest, here is the multiply routine that was commented out for not working with large values:

Code:
;
; Fast multiply routine, just in case I need it for something!
;
;               ldx #0          ; init our carry over area
;               lda #0
;@dm0010        lsr Mult1       ; shift for first test
;               beq @dmLastOne  ; handle the last multiply
;               bcc @dm0020
;               clc
;               adc Mult2
;               bcc @dm0020     ; if carry clear we didn't role the bank yet..
;               inx             ; if carry set, role the bank!
;@dm0020        asl Mult2
;
; Bug fix here...
;               bcc @dm0010
;               inx
;               bra @dm0010
;@dmLastOne     bcc @DoMultDone
;               clc
;               adc Mult2
;               bcc @DoMultDone ; carry clear no bank role
;               inx
;olddoMultDone  phx             ; push the high word
;               pha             ; and the low word
@DoMultDone
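For readers unfamiliar with the technique, the general shift-and-add multiply that a routine like this is built on can be sketched in Python. This is not a transcription of the listing above and does not reproduce its carry handling; the names are illustrative, and the multiplicand is simply allowed to grow past 16 bits.

```python
def shift_add_multiply(mult1, mult2):
    """Multiply two unsigned 16-bit values, returning a 32-bit product."""
    product = 0
    multiplier = mult1 & 0xFFFF
    multiplicand = mult2 & 0xFFFF
    while multiplier:
        if multiplier & 1:          # the bit an LSR would shift into carry
            product += multiplicand
        multiplier >>= 1
        multiplicand <<= 1          # an ASL, widened past 16 bits in this model
    return product & 0xFFFFFFFF


print(hex(shift_add_multiply(0x1234, 0x00FF)))
```

The loop ends as soon as the multiplier runs out of set bits, so small multipliers finish quickly. On the '816 the hard part is tracking the bits that overflow the 16-bit registers, which this model sidesteps by using Python integers.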


PostPosted: Mon Mar 27, 2017 7:30 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8389
Location: Midwestern USA
Tor wrote:
BigDumbDinosaur wrote:
Dr Jefyll wrote:
At the very least I know WDC made a major revision because the initial '816 wasn't to Apple's liking. (IIRC this had to do with Emulation mode. There's a forum post somewhere -- I'll try to hunt it up.)
You may be thinking about bus behavior associated with the generation of invalid addresses. I seem to recall that the Apple ][ floppy disk operation was dependent in some way on the bus behavior.
I'm pretty sure that was the issue. WDC had to back out what they considered an improvement, because the Apple FD relied on that quirk to get the timing right.

As Jeff alluded, an area where Apple would have had some influence would have been the development of the 65C816's emulation mode. Behavior in emulation mode is "almost" like that of the 65C02, but has elements of NMOS 6502 behavior as well, one of the artifacts being the address bus behavior during intermediate cycles of some instructions. I believe this paradoxical combination would have been provided to assure proper Apple disk drive operation, as well as improve compatibility with older Apple ][ software written prior to the introduction of the 65C02.

During the period of time when I was running POC V1.0 in emulation mode I became convinced that emulation was a kludge meant to satisfy a specific set of requirements. In most cases, application software developed for the 6502 or 65C02 can be run unchanged on a 65C816 in native mode, as long as no undefined opcodes were used. System software, of course, would have to be somewhat different to account for the native mode hardware vectors and the 65C816's stack behavior when processing an interrupt. However, that would not affect user applications in the usual course of events.

PostPosted: Mon Mar 27, 2017 8:05 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8389
Location: Midwestern USA
jds wrote:
In my search for the aforementioned OS source code I did find a small excerpt, the thing that stood out for me was the 'mensch' comments. It appears as if Bill Mensch wrote parts of the IIgs OS code.

There is no definitive record of WDC having written software specifically for the ][gs. The ][gs was released for sale in September 1986; however, the 65C816 was released for sale in the summer of 1984. There would have been considerable pressure on WDC at the time of release to win some sales on the new MPU in order to recover the costs associated with developing it. It is known that samples were given to Apple, Atari and Acorn Computers, the last of which sold a machine called the Communicator that was powered by the '816. Ricoh may have also been given samples, given that the Super Nintendo game console used a Ricoh 5A22 MPU that was mostly a 65C816 in function. More likely, Bill Mensch and/or one of his employees wrote demonstration software to go along with the engineering samples they handed out in order to highlight the 65C816's capabilities. Apple probably incorporated some of that code into the finished ][gs operating system as a matter of convenience and expediency.

PostPosted: Mon Mar 27, 2017 9:40 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Hmm, if we're talking about how a caller routine ought to pass a long pointer to a callee then we need to consider how the callee will use it. IOW how do we do long accesses?

When I first started learning about the '816 it seemed to me DB (the Data Bank Register) was the key to long accesses. Later I realized manipulating DB isn't the only method -- and often it's far from being the best. Unless I'm missing something PLB is the only instruction capable of loading DB, and that's not necessarily convenient. A more serious problem is there's only one DB, which means you're faced with a real headache if the task at hand requires you to move or compare data in multiple banks.

The alternative is to forget about fussing with DB and instead use either Direct Indirect Long address mode or Direct Indirect Long Indexed mode. To use these you can copy the pointer(s) to Direct Page (if you don't mind non-reentrant and somewhat inefficient code). Or, as shown in the snippet I posted above, just tell the Direct Page to be where the pointers are!

Regarding Apple's influence on the '816 design...
BigDumbDinosaur wrote:
You may be thinking about bus behavior associated with the generation of invalid addresses. I seem to recall that the Apple ][ floppy disk operation was dependent in some way on the bus behavior.
Tor wrote:
I'm pretty sure that was the issue. WDC had to back out what they considered an improvement, because the Apple FD relied on that quirk to get the timing right.
Thanks, guys -- you're right about the FDC. I found the thread I was looking for, and it sounds as if the situation was pretty dire. :| Summary here.
