Nice! Yup I imagine the zero checks and clc avoidance in mul8x8 aren’t worth it with uniform random inputs. One day if I have the energy I might try the 32x32 version where avoiding the extra multiply might pay off.
Yeah, I would certainly be very interested to see a full 32x32 version.
EDIT: I ...
Search found 37 matches
- Sun Jul 28, 2024 11:53 am
- Forum: Programming
- Topic: Comparing 6502 multiply routines
- Replies: 51
- Views: 26461
- Sun Jul 28, 2024 5:55 am
- Forum: Programming
- Topic: Comparing 6502 multiply routines
- Replies: 51
- Views: 26461
Re: Comparing 6502 multiply routines
Thank you for contributing the implementation pdragon. Very interesting to see a new algorithm being used. I had to convert your 65C02 to plain old 6502 for my test framework, but fortunately I managed to convert without any loss of cycles, and only taking an extra 4 bytes. I switched 2xbra->jmp, an ...
- Tue Jul 23, 2024 11:49 pm
- Forum: Programming
- Topic: Comparing 6502 multiply routines
- Replies: 51
- Views: 26461
Re: Comparing 6502 multiply routines
Thanks. It should be possible to account for a skew in the probability of input data, as long as that skew is properly defined. I've not attempted that.
I could imagine (a) the usual algorithms running over all possible inputs, but with each result weighted depending on the size of the inputs. OR ...
I could imagine (a) the usual algorithms running over all possible inputs, but with each result weighted depending on the size of the inputs. OR ...
- Mon Feb 26, 2024 8:36 am
- Forum: Programming
- Topic: Comparing 6502 multiply routines
- Replies: 51
- Views: 26461
Re: Comparing 6502 multiply routines
repose wrote:
I believe that is the fastest possible 8-bit signed multiply. I would point out a few things though...
I've used Y on input for smult11, a variant that saves two cycles by using another 256 byte lookup table.
- Mon Feb 19, 2024 1:46 pm
- Forum: Programming
- Topic: Comparing 6502 multiply routines
- Replies: 51
- Views: 26461
Re: Comparing 6502 multiply routines
Great! However, I would suggest
x0 = p_sqr_lo1 ; multiplier, 2 bytes
x1 = p_sqr_lo2
as it seems more logical to me (yes, I did lead you down the wrong path with my mistake).
Done.
I've also done a first pass of a signed 16x16.
Sounds good. I've just included a 8x8 signed multiply that is ...
x0 = p_sqr_lo1 ; multiplier, 2 bytes
x1 = p_sqr_lo2
as it seems more logical to me (yes, I did lead you down the wrong path with my mistake).
Done.
I've also done a first pass of a signed 16x16.
Sounds good. I've just included a 8x8 signed multiply that is ...
- Wed Feb 14, 2024 2:13 pm
- Forum: Programming
- Topic: Comparing 6502 multiply routines
- Replies: 51
- Views: 26461
Re: Comparing 6502 multiply routines
trick:
x1 = p_sqr_lo
umult16
; set multiplier as x1
lda x1
;sta p_sqr_lo
sta p_sqr_hi
eor #$ff
sta p_neg_sqr_lo
sta p_neg_sqr_hi
; set multiplier as x0
; *x1 is no longer preserved
To easily save 3 cycles and 1 byte zp.
I could also accept Y0 in Y, but that can place restrictions on the ...
x1 = p_sqr_lo
umult16
; set multiplier as x1
lda x1
;sta p_sqr_lo
sta p_sqr_hi
eor #$ff
sta p_neg_sqr_lo
sta p_neg_sqr_hi
; set multiplier as x0
; *x1 is no longer preserved
To easily save 3 cycles and 1 byte zp.
I could also accept Y0 in Y, but that can place restrictions on the ...
- Tue Feb 13, 2024 9:30 am
- Forum: Programming
- Topic: Comparing 6502 multiply routines
- Replies: 51
- Views: 26461
Re: Comparing 6502 multiply routines
I've properly published this now at https://codebase64.org/doku.php?id=base:fastest_multiplication_2023
Note, there's some changes. My post above was missing an important comment (at first?), in any case your version is missing it
; set multiplicand as y0
ldy y0
;x1y0l = low(x1*y0)
;x1y0h ...
Note, there's some changes. My post above was missing an important comment (at first?), in any case your version is missing it
; set multiplicand as y0
ldy y0
;x1y0l = low(x1*y0)
;x1y0h ...
- Mon Feb 12, 2024 12:05 pm
- Forum: Programming
- Topic: Comparing 6502 multiply routines
- Replies: 51
- Views: 26461
Re: Comparing 6502 multiply routines
Here you go
Very nice Repose! That takes the top spot for the fastest 16 bit multiply. I published your code with cosmetic changes and a couple of minor trivial tweaks: I put the z0-z3 in zero page and replaced "sta z1" with "tay" to return the value in a register instead of memory.
Published as ...
Very nice Repose! That takes the top spot for the fastest 16 bit multiply. I published your code with cosmetic changes and a couple of minor trivial tweaks: I put the z0-z3 in zero page and replaced "sta z1" with "tay" to return the value in a register instead of memory.
Published as ...
- Sun Jan 28, 2024 7:56 am
- Forum: Programming
- Topic: Comparing 6502 multiply routines
- Replies: 51
- Views: 26461
Re: Comparing 6502 multiply routines
Thanks for the update! I'm a bit worried though, as I did label my routine as updated in 2023, which is true, but only for clarity. My actual 188.1 routine is still not published, so I'm hoping you didn't think my existing 2023 update was the fast one, though I'm glad a more clear version was ...
- Sun Jan 07, 2024 2:06 pm
- Forum: Programming
- Topic: Comparing 6502 multiply routines
- Replies: 51
- Views: 26461
Re: Comparing 6502 multiply routines
I've updated the results with repose's improved code.
I do have a suggestion though, I'd like to see the results sorted by time, size ascending, and highlight the optimal ones as yellow like you do in the graphs.
Sadly github's markdown doesn't allow for colour, nor dynamically sorting tables by ...
I do have a suggestion though, I'd like to see the results sorted by time, size ascending, and highlight the optimal ones as yellow like you do in the graphs.
Sadly github's markdown doesn't allow for colour, nor dynamically sorting tables by ...
- Thu Oct 12, 2023 7:19 am
- Forum: Programming
- Topic: Mini-challenge: Sort three bytes
- Replies: 22
- Views: 12349
Re: Mini-challenge: Sort three bytes
dmsc wrote:
Hi!
And here, 6 bytes less:
And here, 6 bytes less:
- Wed Oct 11, 2023 5:37 pm
- Forum: Programming
- Topic: Mini-challenge: Sort three bytes
- Replies: 22
- Views: 12349
Re: Mini-challenge: Sort three bytes
Cool. I did wonder whether starting with loading z1 might be more efficient somehow. In case it needs moving, we learn that quickly, and otherwise we are well placed to immediately compare it with the other candidate. I'm not sure though it would help or hinder overall but the flow at least would ...
- Wed Oct 11, 2023 3:51 pm
- Forum: Programming
- Topic: Mini-challenge: Sort three bytes
- Replies: 22
- Views: 12349
Re: Mini-challenge: Sort three bytes
Edit: And here's a first pass speed-optimised version - just writing out all the comparisons and swaps that are required, without attempting to share any code between branches.
A slight improvement to gfoot's speedy version: I managed to shave 3 bytes off with the same cycle count as before just ...
- Tue Oct 10, 2023 8:01 pm
- Forum: Programming
- Topic: Mini-challenge: Sort three bytes
- Replies: 22
- Views: 12349
Re: Mini-challenge: Sort three bytes
This thread totally nerd-sniped me. I decided to go for time efficiency and implement a solution similar to Chromatix's idea of a decision-based algorithm that picks one of 8 cases (2 of which are invalid).
I was tempted to do some self-modifying code magic ( opcodes of "STY/STA/STX zp" and "LDY ...
I was tempted to do some self-modifying code magic ( opcodes of "STY/STA/STX zp" and "LDY ...
- Mon Oct 09, 2023 7:12 pm
- Forum: Programming
- Topic: Mini-challenge: Sort three bytes
- Replies: 22
- Views: 12349
Re: Mini-challenge: Sort three bytes
Ooh I didn't think it could get that small. Very nice.