6502.org Forum  Projects  Code  Documents  Tools  Forum
All times are UTC




PostPosted: Mon May 14, 2018 3:01 am 

Joined: Wed Jun 26, 2013 9:06 pm
Posts: 56
Most people I know write faster code on the 6502 than they do on the 65816, even though the 65816 can run 6502 code. Why is this the case?


PostPosted: Mon May 14, 2018 4:40 am 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8407
Location: Midwestern USA
Aaendi wrote:
Most people I know write faster code on the 6502 than they do on the 65816, even though the 65816 can run 6502 code. Why is this the case?

Could it be they don't know the 65C816 instruction set well enough to write programs that take full advantage of the MPU, especially the 16-bit features? Also, are they running the '816 in emulation mode or native mode? In emulation mode, the '816 is going to mostly look like a 'C02, which means the '816 generally won't perform any better than a 'C02 at the same Ø2 clock rate.

Depending on what is being executed and how an algorithm has been structured, the 65C816 in native mode can easily out-perform the 65C02 at identical Ø2 rates, sometimes by a factor of two or three. Using 16-bit operations in native mode, the '816 can fetch and store data at more than twice the rate of a 'C02, assuming identical Ø2 rates. Integer math operations, especially those which produce 32- or 64-bit results, will be done in far fewer Ø2 cycles, mainly because fewer loop iterations are required when arithmetic and/or logical operations can be performed 16 bits at a time.
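
As a rough illustration of the 16-bit point, here is a sketch of the same 32-bit addition written both ways. The labels NUM1, NUM2, Add8 and Add16 and the direct page addresses are made up for the example, and the second routine assumes the '816 is already in native mode.

```asm
NUM1     =     $10        ; hypothetical 32-bit operand and result, low byte first
NUM2     =     $14        ; hypothetical 32-bit operand, low byte first

* 65C02 (or '816 with an 8-bit accumulator): NUM1 = NUM1 + NUM2, one byte per step
Add8     CLC
         LDA   NUM1
         ADC   NUM2
         STA   NUM1
         LDA   NUM1+1
         ADC   NUM2+1
         STA   NUM1+1
         LDA   NUM1+2
         ADC   NUM2+2
         STA   NUM1+2
         LDA   NUM1+3
         ADC   NUM2+3
         STA   NUM1+3
         RTS

* 65C816 native mode: the same addition, one 16-bit word per step
Add16    REP   #$20       ; m = 0: 16-bit accumulator and memory accesses
         CLC
         LDA   NUM1       ; bytes 0 and 1
         ADC   NUM2
         STA   NUM1
         LDA   NUM1+2     ; bytes 2 and 3, carry propagates from the low word
         ADC   NUM2+2
         STA   NUM1+2
         RTS              ; returns with the accumulator still 16 bits wide
```

The 8-bit version needs twelve load/add/store instructions and the 16-bit version six. Written as a loop over a longer number, the 16-bit version would likewise run half as many iterations.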

The 65C816's stack relative addressing greatly simplifies programming tasks in which multiple parameters must be passed into and out of a subroutine, often taking a fraction of the code needed with the 'C02 to accomplish the same task. The '816's ability to relocate direct (zero) page adds even more to its flexibility. Helping matters is the '816's 16-bit stack pointer, which greatly assists in using the stack as more than just a place to save return addresses and some registers.
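
A minimal sketch of what stack relative addressing looks like in practice, assuming native mode. The routine names and parameter values are invented for the example. The caller pushes two 16-bit parameters, and the subroutine reads them directly off the stack without disturbing X or Y.

```asm
* Caller: push two 16-bit parameters, call, then discard them
Caller   REP   #$30       ; 16-bit accumulator and index registers
         PEA   $1234      ; first parameter (hypothetical value)
         PEA   $0056      ; second parameter (hypothetical value)
         JSR   AddParms   ; sum comes back in A
         PLX              ; discard second parameter (2 bytes); A is untouched
         PLX              ; discard first parameter
         RTS

* Stack as seen inside AddParms:
*   1,S and 2,S   return address
*   3,S and 4,S   second parameter
*   5,S and 6,S   first parameter
AddParms LDA   3,S        ; fetch second parameter
         CLC
         ADC   5,S        ; add first parameter
         RTS
```

On the 'C02 the equivalent access would typically go through TSX followed by something like LDA $0103,X, which ties up the X register and only handles one byte at a time.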

I could go on and on, but will say that it's ultimately up to the programmer to know the 65C816 assembly language in detail and to know how to take advantage of the features that are specific to the '816.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Mon May 14, 2018 5:03 am 

Joined: Wed Jun 26, 2013 9:06 pm
Posts: 56
Mostly in native mode. 65816 coders typically put more register pushes and pulls at the beginning and ends of subroutines.


PostPosted: Mon May 14, 2018 5:55 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8521
Location: Southern California
Aaendi wrote:
Mostly in native mode. 65816 coders typically put more register pushes and pulls at the beginning and ends of subroutines.

There's no requirement to do that in good programming. In the times that it's beneficial to do it, like for stack frames, it's far more efficient to do it on the '816 than on the '02; and then you can get a lot of benefits, like re-entrance and local variables. It sounds like the programmers you're referring to might just be using the wrong approach.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Mon May 14, 2018 7:44 pm 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8407
Location: Midwestern USA
Aaendi wrote:
Mostly in native mode. 65816 coders typically put more register pushes and pulls at the beginning and ends of subroutines.

As Garth said, it all depends on what is being accomplished. If it is necessary to make a subroutine fully recursive and/or completely transparent to the caller then it will be necessary to completely save the machine state, which means pushing all registers and then later restoring them.

Speaking of stack activity, use of a stack frame to pass parameters to and from a subroutine is more of a 65C816 idiom than a 65C02 one, and is one I use when it's not possible to pass all required parameters in the registers alone. So yes, some cycles will also be expended in creating and destroying the stack frame. That is the price to be paid for gaining functionality and flexibility.

In any case, mindlessly pushing and pulling registers when there is no benefit to doing so suggests the programmer is not completely up to speed on best programming practices. My general rule has always been to not push any register that will not be touched inside of a subroutine. The 65C816 slightly complicates things because changing index register sizes can result in data truncation. If the index registers have to be switched from 16 to 8 bits inside a subroutine it may be necessary to preserve them, since the most significant bytes (MSBs) of the X- and Y-registers will be zeroed.
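
To make the truncation concrete, here is a small sketch assuming native mode. The label Index, its address and the value $1234 are placeholders for the example.

```asm
Index    =     $20        ; hypothetical 16-bit variable in direct page

         REP   #$10       ; x = 0: 16-bit index registers
         LDX   Index      ; X holds a full 16-bit value, say $1234
         PHX              ; save both bytes of X (2 bytes pushed)
         SEP   #$10       ; x = 1: X is now $34, its high byte forced to $00
*        ... 8-bit index work goes here ...
         REP   #$10       ; x = 0 again: the high byte of X stays $00
         PLX              ; pull 2 bytes: X is $1234 once more
```

Without the PHX/PLX pair, X would come back as $0034 (if the 8-bit code left it alone) rather than $1234. The accumulator behaves differently: switching it to 8 bits keeps its high byte in the hidden B half, so it does not need this treatment.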

Incidentally, one who writes software is a programmer, not a coder. :D Coding is something that happens at the hospital when someone goes into cardiac arrest. :shock:

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Mon May 14, 2018 8:32 pm 

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
I actually haven't found much 16b '816 source floating around, outside of just examples, and those simply aren't really representative.

I found this: https://github.com/Olde-Skuul/spaceaceiigs

It's the source of Space Ace for the Apple IIGS, and it's pretty much pure 16-bit. It uses the SP == DP method of parameter passing: stuff things onto the stack, then set DP to the top of the stack so you can refer to everything via its offset.

This is a snippet:
Code:
*
* Unpack an animation frame with bank crossing
*

UnpackAnimSlow
:RTSVal = 1
:DestPtr = 3
:UnpackPtr = 7
:EndDirect = 11

   TSC
   PHB
   PHD
   TCD

   SEP   #$20
   PEI   :DestPtr+1
   PLB
   PLB
   LDX   :DestPtr
   LDY   #0
]A   LDA   [:UnpackPtr],Y
   BNE   :NotPack
   INY
   LDA   [:UnpackPtr],Y
   STA   :DestPtr+2
   INY
   LDA   [:UnpackPtr],Y
   INY
]B   STA:   $0000,X
   INX
   DEC   :DestPtr+2
   BNE   ]B
   BRA   :Next

:NotPack   BMI   :NotTab
   REP   #$21
   AND   #$FF
   STA   :DestPtr+2
   TXA
   ADC   :DestPtr+2
   TAX
   SEP   #$20
   INY
   BRA   :Next

:NotTab   AND   #$7F
   STA   :DestPtr+2
   INY
]B   LDA   [:UnpackPtr],Y
   STA:   $0000,X
   INY
   INX
   DEC   :DestPtr+2
   BNE   ]B
:Next   CPX   #$9D00
   BLT   ]A
   REP   #$30
   PLD
   PLB
   PLA
   STA   8-1,S
   CLC
   TSC
   ADC   #8-2
   TCS
   RTS

I really like the style as presented here. Here, we see some local definitions that portray the offsets into the stack frame, then some stack maintenance. What I don't quite grok is the correction stuff at the end to clear the stack.
Code:
   PLA
   STA   8-1,S
   CLC
   TSC
   ADC   #8-2
   TCS
   RTS

I'm pretty sure this is getting the return value properly placed, and then reducing the stack frame. What I'm not sure about is where the "magic value" 8 comes from in this case, given there are at least 11 bytes consumed. But what's nice here is that in the end, each routine effectively has its own little piece of "zero page" that it can use, which should perform pretty well. You'll notice also that they don't save any of the work registers, just B and D. Everything else pretty much relies on the values in memory.
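
One possible reading of the 8, worked out only from the offsets in the snippet: :DestPtr and :UnpackPtr are each 4 bytes, so :EndDirect - :DestPtr = 11 - 3 = 8 bytes of parameters, and the PLA/STA pair moves the return address (rather than a return value) up over the top of them. The sketch below replays just the exit sequence with that bookkeeping spelled out. Caller, Callee and the pushed values are hypothetical, and the MX directive is assumed to be how Merlin is told that immediates are 16 bits wide.

```asm
         MX    %00        ; Merlin: assemble for 16-bit A and index registers

Caller   REP   #$30       ; 16-bit A, X and Y
         PEA   $00E1      ; :UnpackPtr high word (hypothetical value)
         PEA   $2000      ; :UnpackPtr low word
         PEA   $0001      ; :DestPtr high word
         PEA   $2000      ; :DestPtr low word
         JSR   Callee     ; callee removes all 8 parameter bytes
         RTS              ; S is back to its value before the first PEA

* S0 = stack pointer on entry to Callee (PHB/PHD already undone by PLD/PLB)
*   S0+1..S0+2    return address   (:RTSVal = 1)
*   S0+3..S0+6    :DestPtr         (4 bytes)
*   S0+7..S0+10   :UnpackPtr       (4 bytes)
Callee   PLA              ; A = return address, S = S0+2
         STA   8-1,S      ; copy it to S0+9..S0+10, the top two parameter bytes
         CLC
         TSC              ; A = S0+2
         ADC   #8-2       ; A = S0+8
         TCS              ; S = S0+8
         RTS              ; pulls S0+9..S0+10, leaving S = S0+10
```

On that reading the routine consumes 10 bytes in total (2 for the return address plus 8 of parameters), with 11 being the first offset past the frame because the offsets start at 1.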

I was hoping to find a disassembly of the IIGS ROMs somewhere to look at, but I haven't been able to find anything.

