Search found 37 matches

by TobyLobster
Sun Jul 28, 2024 11:53 am
Forum: Programming
Topic: Comparing 6502 multiply routines
Replies: 51
Views: 26461

Re: Comparing 6502 multiply routines

Nice! Yup I imagine the zero checks and clc avoidance in mul8x8 aren’t worth it with uniform random inputs. One day if I have the energy I might try the 32x32 version where avoiding the extra multiply might pay off.

Yeah, I would certainly be very interested to see a full 32x32 version.
EDIT: I ...
by TobyLobster
Sun Jul 28, 2024 5:55 am
Forum: Programming
Topic: Comparing 6502 multiply routines
Replies: 51
Views: 26461

Re: Comparing 6502 multiply routines

Thank you for contributing the implementation pdragon. Very interesting to see a new algorithm being used. I had to convert your 65C02 to plain old 6502 for my test framework, but fortunately I managed to convert without any loss of cycles, and only taking an extra 4 bytes. I switched 2xbra->jmp, an ...
by TobyLobster
Tue Jul 23, 2024 11:49 pm
Forum: Programming
Topic: Comparing 6502 multiply routines
Replies: 51
Views: 26461

Re: Comparing 6502 multiply routines

Thanks. It should be possible to account for a skew in the probability of input data, as long as that skew is properly defined. I've not attempted that.

I could imagine (a) the usual algorithms running over all possible inputs, but with each result weighted depending on the size of the inputs. OR ...
by TobyLobster
Mon Feb 26, 2024 8:36 am
Forum: Programming
Topic: Comparing 6502 multiply routines
Replies: 51
Views: 26461

Re: Comparing 6502 multiply routines

repose wrote:
I believe that is the fastest possible 8-bit signed multiply. I would point out a few things though...
I have amended to show the unused bytes.
I've used Y on input for smult11, a variant that saves two cycles by using another 256 byte lookup table.
by TobyLobster
Mon Feb 19, 2024 1:46 pm
Forum: Programming
Topic: Comparing 6502 multiply routines
Replies: 51
Views: 26461

Re: Comparing 6502 multiply routines

Great! However, I would suggest
x0 = p_sqr_lo1 ; multiplier, 2 bytes
x1 = p_sqr_lo2
as it seems more logical to me (yes, I did lead you down the wrong path with my mistake).

Done.


I've also done a first pass of a signed 16x16.

Sounds good. I've just included a 8x8 signed multiply that is ...
by TobyLobster
Wed Feb 14, 2024 2:13 pm
Forum: Programming
Topic: Comparing 6502 multiply routines
Replies: 51
Views: 26461

Re: Comparing 6502 multiply routines

trick:

x1 = p_sqr_lo

umult16
; set multiplier as x1
lda x1
;sta p_sqr_lo
sta p_sqr_hi
eor #$ff
sta p_neg_sqr_lo
sta p_neg_sqr_hi

; set multiplier as x0
; *x1 is no longer preserved

To easily save 3 cycles and 1 byte zp.

I could also accept Y0 in Y, but that can place restrictions on the ...
by TobyLobster
Tue Feb 13, 2024 9:30 am
Forum: Programming
Topic: Comparing 6502 multiply routines
Replies: 51
Views: 26461

Re: Comparing 6502 multiply routines

I've properly published this now at https://codebase64.org/doku.php?id=base:fastest_multiplication_2023

Note, there's some changes. My post above was missing an important comment (at first?), in any case your version is missing it
; set multiplicand as y0
ldy y0

;x1y0l = low(x1*y0)
;x1y0h ...
by TobyLobster
Mon Feb 12, 2024 12:05 pm
Forum: Programming
Topic: Comparing 6502 multiply routines
Replies: 51
Views: 26461

Re: Comparing 6502 multiply routines

Here you go

Very nice Repose! That takes the top spot for the fastest 16 bit multiply. I published your code with cosmetic changes and a couple of minor trivial tweaks: I put the z0-z3 in zero page and replaced "sta z1" with "tay" to return the value in a register instead of memory.

Published as ...
by TobyLobster
Sun Jan 28, 2024 7:56 am
Forum: Programming
Topic: Comparing 6502 multiply routines
Replies: 51
Views: 26461

Re: Comparing 6502 multiply routines

Thanks for the update! I'm a bit worried though, as I did label my routine as updated in 2023, which is true, but only for clarity. My actual 188.1 routine is still not published, so I'm hoping you didn't think my existing 2023 update was the fast one, though I'm glad a more clear version was ...
by TobyLobster
Sun Jan 07, 2024 2:06 pm
Forum: Programming
Topic: Comparing 6502 multiply routines
Replies: 51
Views: 26461

Re: Comparing 6502 multiply routines

I've updated the results with repose's improved code.

I do have a suggestion though, I'd like to see the results sorted by time, size ascending, and highlight the optimal ones as yellow like you do in the graphs.
Sadly github's markdown doesn't allow for colour, nor dynamically sorting tables by ...
by TobyLobster
Thu Oct 12, 2023 7:19 am
Forum: Programming
Topic: Mini-challenge: Sort three bytes
Replies: 22
Views: 12349

Re: Mini-challenge: Sort three bytes

dmsc wrote:
Hi!
And here, 6 bytes less:
Excellent! Every time I think "surely that's the limit. Can't get any better than that.", but I always seem to be wrong!
by TobyLobster
Wed Oct 11, 2023 5:37 pm
Forum: Programming
Topic: Mini-challenge: Sort three bytes
Replies: 22
Views: 12349

Re: Mini-challenge: Sort three bytes

Cool. I did wonder whether starting with loading z1 might be more efficient somehow. In case it needs moving, we learn that quickly, and otherwise we are well placed to immediately compare it with the other candidate. I'm not sure though it would help or hinder overall but the flow at least would ...
by TobyLobster
Wed Oct 11, 2023 3:51 pm
Forum: Programming
Topic: Mini-challenge: Sort three bytes
Replies: 22
Views: 12349

Re: Mini-challenge: Sort three bytes


Edit: And here's a first pass speed-optimised version - just writing out all the comparisons and swaps that are required, without attempting to share any code between branches.

A slight improvement to gfoot's speedy version: I managed to shave 3 bytes off with the same cycle count as before just ...
by TobyLobster
Tue Oct 10, 2023 8:01 pm
Forum: Programming
Topic: Mini-challenge: Sort three bytes
Replies: 22
Views: 12349

Re: Mini-challenge: Sort three bytes

This thread totally nerd-sniped me. I decided to go for time efficiency and implement a solution similar to Chromatix's idea of a decision-based algorithm that picks one of 8 cases (2 of which are invalid).
I was tempted to do some self-modifying code magic ( opcodes of "STY/STA/STX zp" and "LDY ...
by TobyLobster
Mon Oct 09, 2023 7:12 pm
Forum: Programming
Topic: Mini-challenge: Sort three bytes
Replies: 22
Views: 12349

Re: Mini-challenge: Sort three bytes

Ooh I didn't think it could get that small. Very nice.