6502.org Forum

All times are UTC




Posted: Tue Apr 25, 2017 4:35 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
There's a thread over on Stardot where, with some cooperative effort, a few hundred bytes are shaved off a 16K ROM program to make room for more features.

There are several examples given of specific changes, including:
- making use of 65C02 opcodes such as BRA and STZ
- changing JSR then RTS to a JMP
- changing JMP to BRA
- re-ordering routines to allow fall-through instead of JMP or BRA
- pulling common code into subroutines or jumping to common postambles
- inlining subroutines used only once
- rolling up repetitive code into loops
- reorganising loop code to use fewer operations
- removing unnecessary operations
- removing dead code
- making some not-so-important features conditionally included in the assembly

There may be nothing very new in there for experienced 6502 coders, but it might be worth a look. I know I enjoyed reading it through! If there are some more tactics not listed above which merit a mention, feel free to discuss them in this thread.

Also notable is the use of some automatic analysis to find opportunities - not quite automatic peephole optimisation but heading in that direction. (In this case a couple of Python scripts.)
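To give a flavour of what such a script might look like, here is a minimal Python sketch. It is not one of the scripts from the Stardot thread. The `Insn` record and the sample listing are made up for illustration, and it only looks for one pattern from the list above: a JSR immediately followed by an RTS that nothing else branches to.

```python
from typing import List, NamedTuple, Optional


class Insn(NamedTuple):
    """One line of a (hypothetical) parsed assembly listing."""
    label: Optional[str]  # label attached to this instruction, if any
    mnemonic: str
    operand: str


def find_tail_calls(code: List[Insn]) -> List[int]:
    """Return the index of every JSR directly followed by an unlabelled RTS.

    Such a pair can usually become a single JMP, saving one byte.
    A labelled RTS is skipped because other code may branch to it,
    so it could not be deleted.
    """
    hits = []
    for i in range(len(code) - 1):
        cur, nxt = code[i], code[i + 1]
        if cur.mnemonic == "JSR" and nxt.mnemonic == "RTS" and nxt.label is None:
            hits.append(i)
    return hits


# Made-up listing, purely for illustration.
listing = [
    Insn("routine", "PHA", ""),
    Insn(None, "JSR", "helper"),
    Insn(None, "PLA", ""),
    Insn(None, "JSR", "helper"),
    Insn(None, "RTS", ""),
]

candidates = find_tail_calls(listing)
for i in candidates:
    target = listing[i].operand
    print(f"index {i}: JSR {target} / RTS could become JMP {target}")
```

A human still has to review each hit: the rewrite is not safe if the called routine inspects its return address on the stack.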

Perhaps this is a good place to join the thread:
http://stardot.org.uk/forums/viewtopic. ... 44#p131544
There are four flavours of the ROM being built - it's a filesystem for the BBC Micro, with drivers for one or more of several devices: floppy, IDE drive, bit-banged SD Card.

Another possible way to view the changes is to review the commits made in git.
https://github.com/ZornsLemma/ADFS/commits/master
or possibly this branch:
https://github.com/ZornsLemma/ADFS/commits/scratch2
(You'll also see changes which improve the organisation, the labelling, or the commenting of the source, which is ultimately derived from a disassembly of an Acorn ROM.)

(The source format used is BeebAsm, which allows for structured assembly coding. It would be nice if we could refrain from discussing syntax preferences in this thread! You might need to know that & introduces a hex number in this dialect.)

Hat-tip to hoglet for bringing this thread to my attention.


Posted: Tue Apr 25, 2017 8:19 pm
Joined: Sat Dec 12, 2015 7:48 pm
Posts: 145
Location: Lake Tahoe
For certain types of code, something like Sweet16 can really save bytes. Nothing beats adding another layer of abstraction.


Posted: Tue Apr 25, 2017 8:49 pm
Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
Something like Sweet16 came to my mind as well :)

I once squeezed 4 characters into 3 bytes using caps-only 6-bit ASCII. But I haven't looked at the source code to see whether there are enough messages for this to pay off.
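For anyone who hasn't seen the idea, a rough Python sketch of packing 4 characters into 3 bytes follows. The mapping is an assumption on my part (ASCII 0x20 to 0x5F, stored as the code minus 0x20), and `pack4` / `unpack4` are just illustrative names. On a real 6502 the decoder itself costs bytes, which is why the number of messages matters.

```python
def pack4(text: str) -> bytes:
    """Pack exactly 4 characters in the range ' '..'_' into 3 bytes (6 bits each)."""
    if len(text) != 4:
        raise ValueError("need exactly 4 characters")
    value = 0
    for ch in text:
        code = ord(ch) - 0x20  # assumed mapping: 0x20..0x5F -> 0..63
        if not 0 <= code < 64:
            raise ValueError(f"character {ch!r} does not fit in 6 bits")
        value = (value << 6) | code
    return value.to_bytes(3, "big")


def unpack4(data: bytes) -> str:
    """Reverse pack4: expand 3 bytes back into 4 characters."""
    if len(data) != 3:
        raise ValueError("need exactly 3 bytes")
    value = int.from_bytes(data, "big")
    return "".join(chr(((value >> shift) & 0x3F) + 0x20) for shift in (18, 12, 6, 0))


packed = pack4("DISK")
restored = unpack4(packed)
```

Lower case is simply not representable here, so messages would have to be upper case throughout.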


Posted: Wed Apr 26, 2017 1:46 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8505
Location: Midwestern USA
GaBuZoMeu wrote:
I once squeezed 4 characters into 3 bytes using caps-only 6-bit ASCII. But I haven't looked at the source code to see whether there are enough messages for this to pay off.

3:2 encoding of 6502 mnemonics in machine language monitors has been used almost as long as the 6502 has existed. The technique reduces a three-character mnemonic into 15 bits. "Characters" in this case means letters of the alphabet, not numerals, punctuation and whitespace. Reversing the process yields the uppercase form of the mnemonic. I use this technique in Supermon 816. The technique could be extended to compress four characters into 20 bits, five characters into 25 bits, etc.
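As a rough illustration of the 15-bit idea, here is a Python sketch. It assumes a plain A=0 ... Z=25 mapping with 5 bits per letter; that mapping is my assumption for the example and is not necessarily the encoding any particular monitor uses.

```python
def pack_mnemonic(mnemonic: str) -> int:
    """Pack a three-letter mnemonic into a 15-bit integer (5 bits per letter)."""
    m = mnemonic.upper()
    if len(m) != 3 or any(not "A" <= c <= "Z" for c in m):
        raise ValueError("mnemonic must be exactly three letters A-Z")
    value = 0
    for ch in m:
        value = (value << 5) | (ord(ch) - ord("A"))  # assumed mapping: A=0 .. Z=25
    return value


def unpack_mnemonic(value: int) -> str:
    """Recover the upper-case mnemonic from its 15-bit packed form."""
    return "".join(chr(((value >> shift) & 0x1F) + ord("A")) for shift in (10, 5, 0))


packed_lda = pack_mnemonic("lda")
round_trip = unpack_mnemonic(packed_lda)
```

The result always fits in two bytes with one bit to spare, and decoding gives back the upper-case form whatever case went in.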

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Wed Apr 26, 2017 7:01 am
Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland
You might be interested in this (partly written by me): http://wiki.nesdev.com/w/index.php/6502_assembly_optimisations

You pretty much summed everything up, but be careful:

Quote:
- re-ordering routines to allow fall-through instead of JMP or BRA

Unfortunately this will likely end up as spaghetti code, as the different routines will not be arranged in a "human-friendly" order. Also you seem to assume 65C02 instructions are available. That's great if they are, but in NES development they're not, so I'm not used to this.

Also you forgot to mention an optimisation I use all the time: use ADCs and SBCs where you already know the state of the C flag, as much as possible. Normally you need C to be clear for addition and set for subtraction, but for immediate values this is not needed, as you can adjust the immediate to compensate if C is the other way around. I sometimes even move code so that an addition is done somewhere where the state of the C flag is known.
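The arithmetic behind the carry trick can be checked with a small Python model of binary-mode ADC and SBC. This is only a sketch for checking the sums (decimal mode and the V flag are ignored), and the function names are made up for the example.

```python
from typing import Tuple


def adc(a: int, operand: int, carry: int) -> Tuple[int, int]:
    """Model binary-mode ADC: return (8-bit result, carry out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8


def sbc(a: int, operand: int, carry: int) -> Tuple[int, int]:
    """Model binary-mode SBC, which is ADC of the operand's one's complement."""
    return adc(a, operand ^ 0xFF, carry)


# Adding 5: "CLC / ADC #5" versus "ADC #4" when C is already known to be set.
add_matches = all(adc(a, 5, 0) == adc(a, 4, 1) for a in range(256))

# Subtracting 5: "SEC / SBC #5" versus "SBC #4" when C is already known to be clear.
sub_matches = all(sbc(a, 5, 1) == sbc(a, 4, 0) for a in range(256))
```

In both cases the result and the carry out are identical for every value of A, so the CLC or SEC can be dropped, saving one byte and two cycles.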


Posted: Wed Apr 26, 2017 8:44 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Good point - knowing the state of one or more flags can be very helpful, especially on an '02 which doesn't have BRA.

As it happens, although the Beeb is an '02 machine, this project is for the Master (a successor machine), which is a 'C02 machine.

One thing which can go wrong, of course, is that in the process of reordering, some branches can go out of range and turn back into JMPs.
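For reference, the range rule can be written down as a small Python check. A relative branch holds a signed 8-bit offset measured from the address of the instruction after the two-byte branch. The function names are made up, and address wrap-around at the ends of memory is ignored.

```python
def branch_offset(branch_addr: int, target_addr: int) -> int:
    """Offset a relative branch at branch_addr would need to reach target_addr."""
    return target_addr - (branch_addr + 2)  # offset is relative to the next instruction


def in_branch_range(branch_addr: int, target_addr: int) -> bool:
    """True if the target is reachable with a signed 8-bit branch offset."""
    return -128 <= branch_offset(branch_addr, target_addr) <= 127


furthest_forward = in_branch_range(0x8000, 0x8081)   # offset +127
too_far_forward = in_branch_range(0x8000, 0x8082)    # offset +128
furthest_back = in_branch_range(0x8000, 0x7F82)      # offset -128
too_far_back = in_branch_range(0x8000, 0x7F81)       # offset -129
```

A size-squeezing script could run a check like this over every branch after each reordering to flag the ones that would have to become JMPs again.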

You'll notice at one point in the thread the BIT trick is used to get some bytes to do double-duty, but the consensus was that this is too tricky a trick to be retained!

Another tactic to note: once in a while the code is found no longer to be working, and so bisection in the change control history is used to find the breaking change.

Another thing to note is that just a few parts of the code are timing critical, and need to be not too slow, or even not too fast.


Last edited by BigEd on Fri May 05, 2017 6:23 pm, edited 1 time in total.

Posted: Wed Apr 26, 2017 11:29 am
Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
And another favorite, even when BRA is unavailable on NMOS platforms, is to convert JMPs to conditional branches, by analyzing known states of the various CPU flags. This sometimes involves reordering operations so that the last one leaves a flag in a known state.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Posted: Fri May 05, 2017 7:38 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
Bregalad, do any of the Nintendos use 65c02? If so, it would be good to add a section at the bottom of that page for CMOS. For example, on that forum (NesDev), I keep seeing CLC, ADC #1 where INA could be used which is 1/3 as many bytes and half as many cycles, and LDA #0, STA where STZ could be used. Another one to add even to the NMOS part is removing the CMP #0 after anything affecting the accumulator, or the CPY #0 after anything affecting Y, or the CPX #0 after anything affecting X.
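To put numbers on those savings, here is a small Python tally using the standard byte and cycle counts for the instructions involved. Absolute addressing is assumed for the store (my choice for the example), and the names are illustrative only.

```python
# (bytes, cycles) for each instruction and addressing mode used below
COST = {
    "CLC": (1, 2),
    "ADC #imm": (2, 2),
    "INA": (1, 2),       # 65C02 only (INC A)
    "LDA #imm": (2, 2),
    "STA abs": (3, 4),
    "STZ abs": (3, 4),   # 65C02 only
}


def total(sequence):
    """Sum (bytes, cycles) over a sequence of instruction names."""
    return (
        sum(COST[name][0] for name in sequence),
        sum(COST[name][1] for name in sequence),
    )


nmos_increment = total(["CLC", "ADC #imm"])   # (3, 4)
cmos_increment = total(["INA"])               # (1, 2)
nmos_clear = total(["LDA #imm", "STA abs"])   # (5, 6)
cmos_clear = total(["STZ abs"])               # (3, 4)
```

The replacements are not exact drop-ins: INA leaves C and V alone where ADC updates them, and STZ leaves A unchanged where LDA #0 clears it, so the surrounding code must not depend on those side effects.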

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Fri May 05, 2017 3:31 pm
Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
The original NES just had the NMOS instruction set, sans BCD support. It was second-sourced through Ricoh, and as part of a low-level console platform, they didn't muck about with the ISA in the various hardware revisions.

The Game Boy was z80-ish. The SNES had a 65816, so that would be the first one that would subsume the 65c02 instructions, unless there's some smaller platform or maybe expansion cart I missed.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Posted: Sat May 06, 2017 12:59 am
Joined: Thu Jan 21, 2016 7:33 pm
Posts: 282
Location: Placerville, CA
I believe the NEC PC-Engine/Turbografx-16 used a 65C02 derivative, though.


Posted: Sat May 06, 2017 6:02 am
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
commodorejohn wrote:
I believe the NEC PC-Engine/Turbografx-16 used a 65C02 derivative, though.

Yes -- the Hudson Soft HuC6280.

The Wikipedia article says, "The HuC6280 contains a 65C02 core which has several additional instructions and a few internal peripheral functions such as an interrupt controller, a memory management unit, a timer, an 8-bit parallel I/O port, and a programmable sound generator. [...] The HuC6280 has a 64 KB logical address space and a 2 MB physical address space."

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted: Sat May 06, 2017 8:24 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
We've mentioned the 6280 here before, although I'd forgotten it.

Quote:
The [PC-Engine] uses the Hudson Soft 6280, a 6502 variation which we have discussed before. It's got a few extra opcodes and a 'T' mode which allows for memory-memory operations.


Previously, Jeff, you were quite excited about the T prefix!

Quote:
This is a capability that really made me sit up and take notice! Sure, the other new features (eg, instructions such as SAX, SAY and BSR) are undeniably cool, but the T Flag is a fundamental improvement, IMHO. Much though we all love the 65xx architecture, the Accumulator has always represented somewhat of a bottleneck in that nearly all arithmetic and logical operands must pass through it -- ie, be loaded beforehand and stored afterward. I'm really intrigued and impressed at how the HuC6280's T Flag manages to transcend that barrier... and yet, the scheme doesn't "break" anything in the pre-existing 65xx architecture. Pretty darn innovative, I'd say!


There's an opcode matrix here.


Posted: Sat May 06, 2017 3:53 pm
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
BigEd wrote:
Previously, Jeff, you were quite excited about the T business!
Hah! I hadn't forgotten about the T prefix. But my bass and I did a bar gig last night -- fun, but nothing fancy -- and I was pooped by the time I (we) got home. :) That's why there's not much detail in my post. Thanks, Ed, for tracking down the supplementary info.

The Mitsubishi/Renesas 740 microcontroller family also features the 'T' mode for memory-memory operations. Dieter posted a Renesas PDF here.

The 740 family treats 'T' as a mode -- you use the SET and CLT instructions to turn it on and off. But with Hudson Soft's chip the T thing is transitory. The SET instruction turns it on, and it turns itself off automatically after the following instruction. So, SET is used as a prefix.

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted: Sat May 06, 2017 4:02 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Thanks for the pointer to that thread Jeff. I see the STM8 gets a mention there. I think it's quite interesting to see some concrete choices that various outfits have made, and actually taken to market. It's very easy to speculate about a wider stack pointer, or a prefix byte, but here we see these things actually being done. Even so, we can't tell how attractive the result is, without working with that device.


Posted: Mon May 08, 2017 7:22 am
Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland
GARTHWILSON wrote:
Bregalad, do any of the Nintendos use 65c02?

No.
Quote:
For example, on that forum (NesDev), I keep seeing CLC, ADC #1 where INA could be used which is 1/3 as many bytes and half as many cycles, and LDA #0, STA where STZ could be used.

That's because they couldn't (and can't) be used.

Quote:
Another one to add even to the NMOS part is removing the CMP #0 after anything affecting the accumulator, or the CPY #0 after anything affecting Y, or the CPX #0 after anything affecting X.

This is quite obvious, not even an "optimisation".

