6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Tue Sep 17, 2024 3:12 am

All times are UTC




Post new topic Reply to topic  [ 39 posts ]  Go to page Previous  1, 2, 3  Next
Author Message
PostPosted: Fri Dec 29, 2017 11:23 am 
Offline

Joined: Thu Mar 03, 2011 5:56 pm
Posts: 284
jsii wrote:
rwiker wrote:
If you're going to cheat, why not move the
Code:
LDX #$09
out of the function, too? Further, if you inline this, you can get rid of the RTS (and everything else; the actual length of the subroutine becomes 0...)
I hear that bile is good around the liver. Anyway, I'm not sure what you mean by 'cheat', by 'function' or by 'inline' for that matter. I wonder how a 6502 feels about cheating, about "well behaved" code, about pretty comments or about nifty indenting altogether. Cheers.


By "cheat" I mean that you made the subroutine (or function) smaller by moving code to the caller. If you use the subroutine only once, the total code bytes is the same as for the previous version, and if you use it from several places with the same bit length, you actually use two code bytes more each time.

I think that 27 bytes for a 32x32->64 multiply is pretty good, so it will be interesting to see if you or one of the usual suspects around here can improve it further!


Top
 Profile  
Reply with quote  
PostPosted: Fri Dec 29, 2017 2:50 pm 
Offline

Joined: Wed Mar 16, 2016 8:19 pm
Posts: 38
rwiker wrote:
jsii wrote:
rwiker wrote:
If you're going to cheat, why not move the
Code:
LDX #$09
out of the function, too? Further, if you inline this, you can get rid of the RTS (and everything else; the actual length of the subroutine becomes 0...)
I hear that bile is good around the liver. Anyway, I'm not sure what you mean by 'cheat', by 'function' or by 'inline' for that matter. I wonder how a 6502 feels about cheating, about "well behaved" code, about pretty comments or about nifty indenting altogether. Cheers.
By "cheat" I mean that you made the subroutine (or function) smaller by moving code to the caller.
It got smaller that way because I'm trying to improve it (of course), but I only chose it that way because in fact it does add flexibility and strength to the routine, by allowing it to perform not only 32 bit operations, but at least in theory any number of bits operations from one same mechanism (at least from 1 bit right up to 32 as far as I see it yet), with an obvious gain in speed with each bit difference set. Otherwise most likely I wouldn't have 'moved the code' at all. About this 'variable bit' approach, I haven't tried it thoroughly myself yet, other than briefly and more or less only for the values I gave above; but I'm almost sure this routine can very easily perform 1 bit, 2 bit, 3 bit, n bits up to 32 bit multiplications quite correctly. I had a somewhat similar one back in the day and it worked like a charm; I don't see why this one wouldn't work now too. If such a routine is able to gain this flexibility only by 'moving out' two bytes (which don't need to be moved out at all if you think about it carefully, by the way), then I say that's a clear advantage and well worth 'the trouble'.

Quote:
If you use the subroutine only once, the total code bytes is the same as for the previous version, and if you use it from several places with the same bit length, you actually use two code bytes more each time.
Depends on how it's done. If you use it once and the total size is the same, then nothing is lost and the flexibility still remains (potential in ROM). If you use it with the same bit length from several different places and writing the same code each time, the (user) program in itself may be redundant. From ROM memory's point of view and for added flexibility and strength, this 25 byte approach may make sense. Since this is just a 'mind' exercise of sorts, anyway, I'm mostly trying to play around with the possibilities when looking to improve both byte count and speed, for a bit of 'fun', while also trying to see at the same time if something new can be learned along the way.

Quote:
I think that 27 bytes for a 32x32->64 multiply is pretty good, so it will be interesting to see if you or one of the usual suspects around here can improve it further!
'The usual suspects', the name of some movie isn't it? I'm quite certain this routine can be made better. I think a 21 byte improvement would be good. Cheers.


Top
 Profile  
Reply with quote  
PostPosted: Fri Dec 29, 2017 3:27 pm 
Offline

Joined: Sat Jul 28, 2012 11:41 am
Posts: 442
Location: Wiesbaden, Germany
jsii wrote:
I still say there must be better ways to do this yet both in terms of byte count and speed. 21 bytes? Cheers.
You may not have noticed, but shorter code contradicts faster code because of the overhead the additional loops have over linear code. Of course this is not true for unnecassary overhead like saving carry, rotating the multiplier independant of the result or rotating the multiplicand and having to add with the width of the result. But we have taken care of that, haven't we?

And yes, changing the rules (even if they are your own) is cheating.

_________________
6502 sources on GitHub: https://github.com/Klaus2m5


Top
 Profile  
Reply with quote  
PostPosted: Fri Dec 29, 2017 9:29 pm 
Offline

Joined: Wed Mar 16, 2016 8:19 pm
Posts: 38
Klaus2m5 wrote:
jsii wrote:
I still say there must be better ways to do this yet both in terms of byte count and speed. 21 bytes? Cheers.
You may not have noticed, but shorter code contradicts faster code because of the overhead the additional loops have over linear code.
Not necessarily.

Quote:
Of course this is not true for unnecassary overhead like saving carry, rotating the multiplier independant of the result or rotating the multiplicand and having to add with the width of the result. But we have taken care of that, haven't we?
I'm not sure who "we" is, but as far as I'm concerned you took care of all that yourself. It was you who let the cat out of the bag, after all, not I. All I did was write a bit of my own (inefficient) code here and there (without copy-pasting anything, mind you), while at the same time trying to enjoy the ride all along the way. I think I've done it too, also by seeing better derivations and improvements over the process from sharp posters like yourself.

Quote:
And yes, changing the rules (even if they are your own) is cheating.
'cheating' according to whom, and which rules are those exactly and where can I find them? I'd like to read them, thank you.

On a side note, if what I've just perceived here is correct, you guys should lighten up around here, it's healthier that way.
Dogs bite less, and it all kind of brings back to me why I tend to avoid forum posting altogether; kind of like avoiding (most-if-not-all) 00p cultists wherever I spot them. If I'd ever want this kind of response to the posting of a simple routine, I could always go to a c++ or java board, to gamedev boards or to (gasp) Usenet itself, where arrogance looms aplenty. That sort of thing is "the last" thing I'd expect from '6502 people' (or so I've always liked to think). Yeah well, maybe I've been wrong all along, no sweat. So if this is the way you guys have fun around here, I dread to see what it is like when you're all bored. But don't fret it. I don't take any of it personally, why would I anyway?

And speaking of solutions, I'd particularly like to see your 21 byte solution. I'm always interested to learn more how to count cycles, to add, to avoid saving carry and all that. And once again, "let's" 'all of you' keep it well in mind I'm just a beginner here, doing little more than hopefully trying to learn something good and new from all-you-the-best. I'm not a cool expert like yourselves, so don't take any of it personally against me. So I say point your guns elsewhere, why don't ya? Like in the 'right' direction, know what I mean? I think we're all facing the same kind of foe here, and this is just a bit of programming for me as far as I'm concerned, not some kind of weird wit or drama contest. So yeah, Cheers.

(And I do have a 21 byte solution already, by the way, but this being the mood around here, I'd like to wait until you or someone else takes the lead this time around and posts his solution first instead. Once you do it, though, be sure that, rules or no rules, I'll be pushing for a 19 byte solution next (of course)).


Top
 Profile  
Reply with quote  
PostPosted: Fri Dec 29, 2017 9:36 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
> "you guys"

Not a good idea to generalise - there are hundreds of people here on the forum, and even if you got into some kind of disagreement with half a dozen, it needn't mean that you have figured out the hive mind. There is no hive mind!

Best tactic if you perceive hostility is to ignore it. You might have misinterpreted, and it doesn't matter anyway.


Top
 Profile  
Reply with quote  
PostPosted: Fri Dec 29, 2017 9:53 pm 
Offline

Joined: Wed Mar 16, 2016 8:19 pm
Posts: 38
BigEd wrote:
> "you guys"

Not a good idea to generalise - there are hundreds of people here on the forum, and even if you got into some kind of disagreement with half a dozen, it needn't mean that you have figured out the hive mind. There is no hive mind!
Hive mind? It's not 'the borg' here, or is it? "You guys" is an informal way of addressing the posters in question here on this thread alone. The meaning of those words depends on how the words are read. My meaning is calm and clear, and yet those looking for negative meaning will always tend to see it that way not only in those words, but in many others. I know little or nothing about the group so no, I'm not generalizing outside of this thread. I'm learning quickly here, though.

Quote:
Best tactic if you perceive hostility is to ignore it. You might have misinterpreted, and it doesn't matter anyway.
My best tactic against any sort of hostility is to defend myself but once again, this is about programming, for me it's not personal, and I did read what I read.


Top
 Profile  
Reply with quote  
PostPosted: Sat Dec 30, 2017 2:37 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
Hostility has bitten us here in the past, several times, regrettably. It is somewhat difficult to "feel the tone" of a participant in plain text, especially in the case of English being a second or even third language for some of us. I am very comfortable here, and I think that it's due to a combination of a thick skin and a playful, lighthearted and helpful attitude, with just a pinch of mischief. They have served me well so far ...

Mike B.


Top
 Profile  
Reply with quote  
PostPosted: Sat Dec 30, 2017 6:32 pm 
Offline

Joined: Tue Nov 10, 2015 5:46 am
Posts: 228
Location: Kent, UK
Isn't it interesting that after 41 or so years, people still find fun in implementing MUL, and finding ways to optimize. Similar, I think, to the posts we see from time to time on a new Forth NEXT primitive, or a new sort function, or in the other thread, DEX2HEX. The 6502 is simple and constrained, yet still possesses lesser known behaviors in the dark corners, e.g. around decimal mode, that keep people engaged. It's great!

Happy New Year everyone. Keep hacking.


Top
 Profile  
Reply with quote  
PostPosted: Sat Dec 30, 2017 7:29 pm 
Offline
User avatar

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
sark02 wrote:
Isn't it interesting that after 41 or so years, people still find fun in implementing MUL, and finding ways to optimize.
Yup! :) And sometimes the "optimization" involves upgrading the code to eliminate incorrect results! :shock: :P If anyone's interested, see Garth's thread (below).

UM* (multiplication) bug in common 6502 Forths



Later in the thread Bruce aka dclxvi posts a speedup, and FWIW after that I posted an animation and a comparison of several versions including a speedup to the speedup. (Apologies for going slightly OT -- these are 32 bit multiplies, but not especially simple.)

Happy New Year, all!
Jeff

Image

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Top
 Profile  
Reply with quote  
PostPosted: Sat Dec 30, 2017 8:22 pm 
Offline

Joined: Wed Mar 16, 2016 8:19 pm
Posts: 38
barrym95838 wrote:
Hostility has bitten us here in the past, several times, regrettably. It is somewhat difficult to "feel the tone" of a participant in plain text, especially in the case of English being a second or even third language for some of us.
Quite true.

Quote:
I am very comfortable here, and I think that it's due to a combination of a thick skin and a playful, lighthearted and helpful attitude, with just a pinch of mischief. They have served me well so far ...

Mike B.
I like the sound of it, Mike, thank you for your comment. Sorry for the relative delay replying to your message... had been taking care of other things since. Cheers.


Top
 Profile  
Reply with quote  
PostPosted: Sat Dec 30, 2017 8:27 pm 
Offline

Joined: Wed Mar 16, 2016 8:19 pm
Posts: 38
sark02 wrote:
Isn't it interesting that after 41 or so years, people still find fun in implementing MUL, and finding ways to optimize. Similar, I think, to the posts we see from time to time on a new Forth NEXT primitive, or a new sort function, or in the other thread, DEX2HEX. The 6502 is simple and constrained, yet still possesses lesser known behaviors in the dark corners, e.g. around decimal mode, that keep people engaged. It's great!

Happy New Year everyone. Keep hacking.
Absolutely agree. Happy New Year to you too, thank you.
At least for me, 'fun' is a great part of it. Otherwise, why "waste" time doing something which has been done already over and over again, even with better results than we might hope to achieve? 'Fun', exploring, trying out new possibilities, I think those are great 'ingredients' and motivators for many people. Who knows what kind of interesting thing we may discover particularly on a 6502?

Thank you, Cheers!


Last edited by jsii on Sun Dec 31, 2017 8:10 am, edited 1 time in total.

Top
 Profile  
Reply with quote  
PostPosted: Sat Dec 30, 2017 8:59 pm 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8382
Location: Midwestern USA
I've been following this discussion and have to say it's been interesting.

That said, if you really want to see an improvement in performance as well as a reduction in code size, try writing the multiply algorithm for the 65C816 running in 16 bit native mode. :D You will be pleasantly amazed at what is gained by processing the operands a word at a time.

Code:
;imul: FACA = FACA × FACB  (64 bit integer multiplication)
;
;   —————————————————————————————————————————————————
;   Preparatory Ops : FACA: 32 bit multiplicand
;                     FACB: 32 bit multplier
;
;   Computed Returns: FACA: 64 bit product
;                     FACB: entry value
;
;   Register Usage  : .A: used
;                     .B: used
;                     .X: used
;                     .Y: truncated to 8 bits
;
;   MPU Flags: NVmxDIZC
;              ||||||||
;              |||||||+———> 0
;              ||||||+————> undefined
;              |||||+—————> entry value
;              ||+++——————> 0
;              ++—————————> undefined
;
;   Notes: 1) Product is undefined if either operand
;             is greater than $FFFFFFFF.
;          2) A call to CLRFACA must be made before
;             FACA is loaded with the multiplicand.
;   —————————————————————————————————————————————————
;
imul     longa                 ;16 bit accumulator
         shortx                ;8 bit index
         cld                   ;ensure binary mode
         ldx #s_bdword+1       ;bits to process +1
         clc
;
.0000010 ror faca+s_long+s_word;rotate...
         ror faca+s_long       ;out...
         ror faca+s_word       ;a...
         ror faca              ;bit
         bcc .0000020          ;0, skip ahead
;
         clc
         lda faca+s_long       ;partial product
         adc facb              ;multiplier
         sta faca+s_long       ;new partial product
         lda faca+s_long+s_word;ditto
         adc facb+s_word       ;ditto
         sta faca+s_long+s_word;ditto
;
.0000020 dex
         bne .0000010          ;next bit
;
         longx                 ;16 bit index
         rts

In the above, symbols such as S_WORD and S_LONG define data type sizes in bytes. S_BDWORD defines the number of bits in a double word (DWORD).

Code:
;idiv: FACA = FACA ÷ FACB  (64 bit integer division)
;
;   —————————————————————————————————————————————————
;   Preparatory Ops : FACA: 64 bit dividend
;                     FACB: 64 bit divisor
;
;   Computed Returns: FACA: 64 bit quotient
;                     FACB: entry value
;                     FACC: used
;
;   Register Returns: .A: used
;                     .B: used
;                     .X: remainder LSW
;                     .Y: remainder MSW
;
;   MPU Flags: NVmxDIZC
;              ||||||||
;              |||||||+———> 0: quotient valid
;              |||||||      1: division by zero
;              ||||||+————> 0: remainder != zero
;              ||||||       1: remainder == zero
;              |||||+—————> entry value
;              ||+++——————> 0
;              ++—————————> undefined
;
;   NOTES: 1) All values are in little-endian format.
;          2) The remainder will also be in _OVRFLO_.
;   —————————————————————————————————————————————————
;
idiv     longa                 ;16 bit accumulator
         shortx                ;8 bit index
         cld                   ;ensure binary mode
         jsr clrfacc           ;clear tertiary accumulator
         jsr clrovrfl          ;clear overflow
         ldx #s_dlong-s_word
;
;
;   perform divide-by-zero check...
;
.0000010 lda facb,x
         bne idiv01
;
         .rept s_word          ;decrement .X twice
           dex
         .endr
         bpl .0000010
;
         tax
         tay
         longx
         sec                   ;division by zero error
         rts
;
idiv01   ldy #s_blword
         clc
;
.0000010 phy
         ldx #0
         ldy #_loopct_
;
.0000020 rol faca,x
         .rept s_word
           inx
         .endr
         dey
         bne .0000020
;
         ldx #0
         ldy #_loopct_
;
.0000030 rol _ovrflo_,x
         .rept s_word
           inx
         .endr
         dey
         bpl .0000030
;
         ldx #0
         ldy #_loopct_
         sec
;
.0000040 lda _ovrflo_,x
         sbc facb,x
         sta facc,x
         .rept s_word
             inx
         .endr
         dey
         bne .0000040
;
         bcc .0000060
;         
         ldx #_loopct_
;
.0000050 lda facc,x
         sta _ovrflo_,x
         .rept s_word
           dex
         .endr
         bpl .0000050
;
.0000060 ply
         dey
         bne .0000010
;
         tyx
         ldy #_loopct_
;
.0000070 rol faca,x
         .rept s_word
           inx
         .endr
         dey
         bne .0000070
;
         longx
         ldx _ovrflo_
         ldy _ovrflo_+s_word
         txa
         ora _ovrflo_+s_word
         clc
         rts

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Top
 Profile  
Reply with quote  
PostPosted: Sat Dec 30, 2017 9:07 pm 
Offline

Joined: Sat May 02, 2015 6:59 pm
Posts: 134
jsii, just a suggestion for your 6502 toolbox if you haven't found it yet, and since reference to cycle counts are popping up in the thread.

Kowalski's 6502 Simulator.
It has a built in assembler and lots of debugging features, including cycle counting.

You'll find a version with recent bug fixes (thanks Daryl) here:
http://forum.6502.org/viewtopic.php?f=8&t=5011&p=57786


Top
 Profile  
Reply with quote  
PostPosted: Sat Dec 30, 2017 9:12 pm 
Offline

Joined: Wed Mar 16, 2016 8:19 pm
Posts: 38
Dr Jefyll wrote:
sark02 wrote:
Isn't it interesting that after 41 or so years, people still find fun in implementing MUL, and finding ways to optimize.
Yup! :) And sometimes the "optimization" involves upgrading the code to eliminate incorrect results! :shock: :P If anyone's interested, see Garth's thread (below).

UM* (multiplication) bug in common 6502 Forths



Later in the thread Bruce aka dclxvi posts a speedup, and FWIW after that I posted an animation and a comparison of several versions including a speedup to the speedup. (Apologies for going slightly OT -- these are 32 bit multiplies, but not especially simple.)

Happy New Year, all!
Jeff

Image
Quite true, you never know what kind of good thing you're going to find from correcting something done. Happy New Year to you too, thank you.

Think of the great motivation which had to happen in order to "get" you from simply reading a thread, to use your valuable time and effort in order to make your animation and comparison for everyone else to see. That kind of effort takes some kind of positive "something", a really good disposition to make something good happen. I think that kind of incredible disposition is what true achievement is made of.

In all truth, sometimes I wonder how come greater things haven't occurred yet here from these forums, where I see so many talented bright '6502 minds' come about.

Thank you, Cheers!


Last edited by jsii on Sun Dec 31, 2017 8:10 am, edited 1 time in total.

Top
 Profile  
Reply with quote  
PostPosted: Sat Dec 30, 2017 9:37 pm 
Offline

Joined: Wed Mar 16, 2016 8:19 pm
Posts: 38
Cray Ze wrote:
jsii, just a suggestion for your 6502 toolbox if you haven't found it yet, and since reference to cycle counts are popping up in the thread.

Kowalski's 6502 Simulator.
It has a built in assembler and lots of debugging features, including cycle counting.

You'll find a version with recent bug fixes (thanks Daryl) here:
http://forum.6502.org/viewtopic.php?f=8&t=5011&p=57786
Remarkable, I'll keep it well in mind, thank you. I'm kind of an "old style" cycle-counter-by-hand myself (I've always liked it that way), but an extra tool at least for verification purposes can be quite useful. The Kowalski's 6502 simulator you mention may be "a bit of an improvement" for me now (to say the least) over the windows notepad.exe I'm actually using for programming, as 'ridiculous' as it may sound.

Cheers and a Happy New Year!


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 39 posts ]  Go to page Previous  1, 2, 3  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 10 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
cron