6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Sun Sep 29, 2024 1:23 pm

All times are UTC




Post new topic Reply to topic  [ 3 posts ] 
Author Message
PostPosted: Wed Oct 10, 2018 3:57 am 
Offline
User avatar

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Well I've been doing embedded systems development of one kind or another for 35+ years, and this is the first division routine that I've ever actually written. I've recently implemented a Goldschmidt square root and inverse square function, and used the square of the inverse square root to perform division by multiplication with the inverse of the divisor. Since the Goldschmidt algorithm uses multipliers to iteratively arrive at its results, I don't really consider that an implementation of a division algorithm per se.

Therefore, the following source contains my first SW division routine. I've written it to be part of the run-time library of my Pascal compiler, so it finds its operands on the system stack. The stack-relative addressing mode and the register stack has been put to use in constructing the routine. The return address is at offset 1, the divisor is at offset 3, and the dividend is at offset 5. When finished, the ATOS register holds the quotient, and the ANOS register holds the remainder. At initialization, ATOS holds the dividend, and ANOS is initialized to 0. A non-restoring algorithm is used for the implementation. The file has been assembled with the PyAsm65 assembler, and I loaded and tested the routine in Py65.

Code:
(   1)                  ; ;   unsigned division 16 x 16
(   2)                  ;         .cod 512
(   3)                  ; _idiv   .proc
(   4) 0200 A900        ;         lda #0
(   5) 0202 0B          ;         dup a
(   6) 0203 CBB505      ;         lda.w 5,S
(   7) 0206 A010        ;         ldy #16
(   8)                  ; _idiv_Lp   
(   9) 0208 18          ;         clc
(  10) 0209 AB0A        ;         asl.w a
(  11) 020B 1B          ;         swp a
(  12) 020C AB2A        ;         rol.w a
(  13) 020E B006        ;         bcs _idiv_Plus
(  14)                  ; _idiv_Minus
(  15) 0210 38          ;         sec
(  16) 0211 CBF503      ;         sbc.w 3,S
(  17) 0214 8004        ;         bra _idiv_Next
(  18)                  ; _idiv_Plus
(  19) 0216 18          ;         clc
(  20) 0217 CB7503      ;         adc.w 3,S
(  21)                  ; _idiv_Next   
(  22) 021A 1B          ;         swp a
(  23) 021B 3002        ;         bmi _idiv_Dec
(  24) 021D AB1A        ;         inc.w a
(  25)                  ; _idiv_Dec
(  26) 021F 88          ;         dey
(  27) 0220 D0E6        ;         bne _idiv_Lp
(  28)                  ; ;
(  29)                  ; _idiv_Exit
(  30) 0222 1B          ;         swp a
(  31) 0223 AB090000    ;         ora.w #0
(  32) 0227 1004        ;         bpl _idiv_Finish
(  33) 0229 18          ;         clc
(  34) 022A CB7503      ;         adc.w 3,S
(  35)                  ; _idiv_Finish
(  36) 022D 1B          ;         swp a
(  37) 022E 60          ;         rts
(  38)                  ;         .endp
(  39)                  ;         .end


For 32767 / 10 = 3276 rem 7, the routine requires 378 cycles. For 32767 / 0 = 65535 rem 32767, the routine requires 410 cycles. For 1 / 1 = 1 rem 0, the routine require 350 cycles. I've not exhaustively tested the routine, but it looks like the execution time ranges from 410 to 350 cycles. If the architecture supported register to register addition/subtraction, some more cycles could be saved. As it is, with the quotient and remainder maintained in the A register stack, a considerable number of memory cycles were saved. For the routine, the average CPI is 1.88 cycles, and the average instruction length is 1.77 bytes. I found these results interesting given that the stack-relative addition and subtraction instructions are 3 bytes long and require an additional 2 cycles to read the divisor from the stack: a total of 5 cycles.

Would like to gauge my implementation with my M65C02A architecture against a similar division routine implemented on the 65816. Since the '816 has a more 16-bit like architecture and does not rely on prefix codes like the M65C02A architecture, I suspect that there will be some advantage to the '816 over the M65C02A. Maybe the register stack that is part of the M65C02A A, X, and Y registers may offer some advantage to close the performance gap.

_________________
Michael A.


Top
 Profile  
Reply with quote  
PostPosted: Thu Nov 29, 2018 8:07 pm 
Offline

Joined: Tue Nov 27, 2018 7:32 pm
Posts: 4
Nice work! Implementing something like this is non-trivial. You have my respect. Liddicoat and Flynn would be proud.

I don't recognize all of the opcodes as 65C02, like "swp a" and "dup a". The 65816 had an XBA, which swapped the byte order of the accumulator. Where is the 65C02A documentation


Top
 Profile  
Reply with quote  
PostPosted: Thu Nov 29, 2018 8:50 pm 
Offline
User avatar

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Welcome, ghedger42. There's an entire thread on the M65C02A, and I think the link in this post will lead you to the information you seek. :)

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 3 posts ] 

All times are UTC


Who is online

Users browsing this forum: No registered users and 19 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: