
All times are UTC




Posted: Mon Aug 25, 2014 12:05 am

Joined: Wed Jun 26, 2013 9:06 pm
Posts: 56
I'm curious how the code density of different CPUs compares. I think the TLCS-900/H (Neo Geo Pocket) has some pretty compact code. It is similar to the 68000 in performance and architecture, except that instructions are byte-aligned instead of word-aligned.


Posted: Mon Aug 25, 2014 4:06 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
Someone posted a chart on that not long ago. Hopefully they or someone else will find it and give us the link to the topic. I can't find it at the moment.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Mon Aug 25, 2014 5:33 am

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
Weaver and McKee have done some comparisons, using their own benchmark. Your newer processor may or may not be on their list. I found it interesting that 6502 and ARM are not-too-distant neighbors in all of their charts on page 4.

http://web.eece.maine.edu/~vweaver/pape ... ensity.pdf

Mike


Posted: Mon Aug 25, 2014 8:23 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
That's probably the paper Garth has in mind. I posted a picture at
viewtopic.php?f=1&t=1888&p=15526#p15526
but unfortunately the image hosting company has dropped the ball. (Back then we didn't have image attachments available as an option.)

See also
viewtopic.php?p=15655#p15655

Edit: I resnapshotted the images from the pdf into that previous post.

Cheers
Ed


Last edited by BigEd on Sat Aug 30, 2014 10:56 am, edited 1 time in total.

Posted: Fri Aug 29, 2014 8:49 am

Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland
Oh, I remember I found this paper a while ago, and I was so happy to see the 6502 included in the comparison.

Also, it seems the 6502 is systematically the worst in its category when it comes to code density. Not such good news. Our only consolation is that the 6502 is way faster for a fixed clock rate.


Posted: Wed Sep 03, 2014 1:01 pm

Joined: Wed Jun 26, 2013 9:06 pm
Posts: 56
I'm not too surprised that the 6502 doesn't have the best code density. It takes two instructions to add to the accumulator, and four instructions to add to the index registers.
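
To make the instruction counts concrete, here is a minimal sketch of both cases. The constant 5 is an arbitrary example value, and the byte counts in the comments are the standard 6502 encodings for these addressing modes.

```asm
        ; Add a constant to the accumulator: 2 instructions, 3 bytes
        clc                 ; 1 byte: clear carry so ADC does a plain add
        adc     #5          ; 2 bytes: A = A + 5

        ; Add a constant to X: 4 instructions, 5 bytes (A is clobbered)
        txa                 ; 1 byte: copy X into A
        clc                 ; 1 byte
        adc     #5          ; 2 bytes: A = X + 5
        tax                 ; 1 byte: copy the sum back into X
```

If A must survive the second sequence, it has to be saved and restored around it (for example with PHA/PLA), which costs two more bytes.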


Posted: Wed Sep 03, 2014 6:42 pm

Joined: Mon Apr 16, 2012 8:45 pm
Posts: 60
Is the assembly code used in the comparison available? Whittling down a few bytes here and there could improve the score significantly. My understanding from the paper was that they hand-coded the programs. The skill of the programmer then becomes important.

There were many surprises in the paper, particularly that big endian is more compact than little endian, even if the difference is small. The correlation with the year the architecture was introduced was amusing.

CRIS did well, and looking through the documentation for it, I wonder exactly what makes it more compact and also why it seems to have been abandoned in favour of ARM.

An update of this paper would be interesting in view of BA2 and the processors from the recently de-cloaked Andes Technology, where I believe they have an 8-bit processor which then would compete with 6502. Also a comparison using SWEET16 would be intriguing.


Posted: Thu Sep 04, 2014 12:47 am

Joined: Sun Feb 22, 2004 9:01 pm
Posts: 108
barrym95838 wrote:
Weaver and McKee have done some comparisons, using their own benchmark. Your newer processor may or may not be on their list. I found it interesting that 6502 and ARM are not-too-distant neighbors in all of their charts on page 4.
http://web.eece.maine.edu/~vweaver/pape ... ensity.pdf
That sort of reflects my experience in coding the same application for different CPUs. The same BBC BASIC interpreter is about 16K in 6502, about 12K in Z80, and about 16K in 80x86. There are similar code size differences in my hand-crafted CRC code. It also suggests that my PDP-11 BBC BASIC interpreter should end up a comparable size to the Z80 one, which suggests I can look for some optimisation in it somewhere.

Edit: Ooo! The PDP-11 source cites me! :)

_________________
--
JGH - http://mdfs.net


Posted: Sun Jun 21, 2015 9:08 pm

Joined: Mon Apr 16, 2012 8:45 pm
Posts: 60
The paper is from 2009. I tried to find updates or source code with no success until I saw this: http://lwn.net/Articles/647636/#Comments with a link to an update. There is a Git repository.

Of course it is possible that the author is an authority on assembly programming, but the extract below suggests there is some room for improvement in coding style for the 6502:
Code:
     ; save zero page
     ; otherwise we can't return to BASIC

     ldx   #$e8                         ; we save $E8-$FF
     ldy   #0
     lda   #>zp_save
     sta   OUTPUTH
     lda   #<zp_save
     sta   OUTPUTL
save_zp_loop:
     lda   0,X
     sta   (OUTPUTL),Y
     inx
     iny
     cpy   #$17         ; save 16 bytes
     bne   save_zp_loop


And there is more like this. 6502 stands at 1130 bytes, Z80 is at 891 bytes. It does smell like a challenge, doesn't it?


Posted: Sun Jun 21, 2015 10:03 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
That's more than twice as long as it needs to be for the job.



Posted: Mon Jun 22, 2015 3:58 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Good find Alienthe, thanks!

As it's on github, it's very easy to make an edit and then make a pull request.
Here's the code: https://github.com/deater/ll_asm/blob/m ... 502.s#L158
The pencil icon at the top right of the text box makes a temporary fork.
(Of course, you'd need to be logged into github.)


Posted: Mon Jun 22, 2015 6:12 am

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
Mr. Weaver knows more assembly languages than I can name, but it appears that he didn't expend much effort on the 6502 version, for either size or speed. In fact, many of his coding techniques (like the zp_save stuff above) look like a crude translation from a different processor's source. I'm sure that optimization priorities differed from processor to processor, and I have a feeling that the 6502 wasn't the only victim here, but dang, that 6502 source is crammed full of sub-optimal code!

I believe that I could get identical results with identical inputs on an Apple 2-anything in under 800 (780!?!?!) bytes, but I have too much on my plate right now to prove that claim in a timely fashion. I'll just add it to my lengthy to-do list, and get to it when I can. Hopefully, someone awesome here can beat me to the punch, because this challenge should not go unanswered.

Mike B.

[Edit: Important footnote from here:
Quote:
* the 6502 results were adjusted to match the code present in other
architectures (i.e., not counting the graphical routines)

There is a lot of low-hanging fruit in the graphics routines, which wouldn't count, but I believe that there is considerable room for improvement elsewhere, which does count.
]


Posted: Mon Jun 22, 2015 11:48 pm

Joined: Wed Jan 08, 2014 3:31 pm
Posts: 578
I would think that a single index register would do the trick if you adjusted the value of zp_save by $e8. That way the index and the offset could be the same value. If you used a count down to zero you could get rid of the cpy #$17 at the bottom of the loop.
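
As a rough sketch of that idea (not code from the repository): X starts at $E8 and serves as both the zero-page address and the buffer offset, and the loop ends when X wraps from $FF to $00, so no compare is needed. This counts up to zero rather than down. It assumes `zp_save` is a 24-byte buffer located at $01E8 or higher, so that `zp_save-$e8` assembles as an absolute address rather than a zero-page one.

```asm
        ; save zero page $E8-$FF (24 bytes) using a single index register
        ; ASSUMPTION: zp_save >= $01E8, so zp_save-$e8 uses absolute,X
        ldx     #$e8            ; first zero-page address to save
save_zp_loop:
        lda     $00,x           ; read zero-page byte at address X
        sta     zp_save-$e8,x   ; store at zp_save + (X - $E8)
        inx                     ; next byte; $FF wraps to $00
        bne     save_zp_loop    ; Z is set on the wrap, which ends the loop
```

This comes to 10 bytes, against 22 for the quoted original.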


Posted: Tue Jun 23, 2015 4:00 am

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
Exactly, Martin ... you're on the right track. And that's just a small example in a much larger source.
Code:
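        ; save zero page
        ; otherwise we can't return to BASIC

   ldx   #$17                         ; we save $E8-$FF (24 bytes)
save_zp_loop:
   lda   $e8,x                        ; read zero-page byte $E8+X
   sta   zp_save,x                    ; store at the same offset in zp_save
   dex
   bpl   save_zp_loop                 ; loop until X goes below zero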

Mike B.


Posted: Tue Jun 23, 2015 7:20 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
I've raised an issue: https://github.com/deater/ll_asm/issues/5
(A pull request would have been slightly more friendly!)

