6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Mon Sep 16, 2024 7:34 pm

All times are UTC




Post new topic Reply to topic  [ 8 posts ] 
Author Message
PostPosted: Thu Jan 21, 2016 8:02 pm 
Offline

Joined: Thu Jan 21, 2016 7:33 pm
Posts: 274
Location: Placerville, CA
I've been scheming up a project for a while now for a small portable computer (think Tandy Model 100) based around the 65C816 and incorporating 64KB of MRAM I got as a free sample (thank you, Everspin!) Since it's not necessarily going to incorporate any other form of on-board storage (I might work in some flash memory I've got on hand as a kind of notional hard disk, but we'll see - I'm trying to keep it simple,) I'd like to keep code size to a minimum so as to leave more room for whatever data I'm working on. To that end, I've done a 16-bit adaptation of a bytecode interpreter I'm using in another project, which I think should give pretty good code density by comparison to 16-bit '816 assembler (which is already pretty good, but I'd like to squeeze it down even further.) I figure I'll start a thread about it, partly to show off the fruit of my labors and partly to see if anybody can help me improve it. (Full code dump will come a little later, when I've gotten it into a satisfactory state of feature-completeness.)

In particular, at the moment I've just finished the division routine. Overall, I'm pretty happy with it, but it seems a little big to me, and I'm curious if anybody has any thoughts on cutting it down a bit. Here's the code as it stands:
Code:
divide:               ; Common division code - divides #1 by #0 and gives the
   dex               ; quotient in #1 and the remainder in #0 (signed)
   dex               ; Decrement the data stack pointer for a work location
   phy               ; Save the instruction pointer to a temporary location

   stz <$00,x         ; Clear the sign-correct count
   ldy <$00,x         ; Clear the pre-shift count in Y

   lda <$02,x         ; Get the divisor from #0
   beq .z            ; If it's zero, implode
   clc               ; Clear the carry for the upcoming shift
   bpl .pa            ; If it's positive, don't negate it
   eor #$ffff         ; Otherwise, negate it
   inc
   sta <$02,x         ; And store it back
   sec               ; Set the carry for the upcoming shift
.pa   rol <$00,x         ; Update the sign-correct flags

   lda <$04,x         ; Get the dividend from #1
   bpl .pb            ; If it's positive, don't negate it
   eor #$ffff         ; Otherwise, do that
   inc
   sec               ; Set the carry (already cleared) for the upcoming shift
.pb   rol <$00,x         ; Update the sign-correct flags

   stz <$04,x         ; Clear #1 for use as the quotient

.ps   bit <$02,x         ; Check the MSB on the divisor
   bmi .sd            ; If it's 1, don't pre-shift any further
   asl <$02,x         ; Otherwise, shift it left one place
   iny               ; And increment our pre-shift count
   bra .ps            ; And see if we're done with the pre-shift

.sd   ; The divisor is now pre-shifted as far left as it will go - begin!
   ; A contains the working remainder
   cmp <$02,x         ; Is the shifted divisor larger than the remainder?
   bcc .ns            ; If so, don't subtract it from the remainder
   ; Otherwise, do so
   php            ; Save the flag state first      
   sec
   sbc <$02,x
   plp            ; And restore it

.ns   rol <$04,x         ; Whatever we do, shift the carry into the quotient
   lsr <$02,x         ; And shift the divisor right one place
   dey               ; And decrement the pre-shift count
   bpl .sd            ; If it hasn't underflowed, continue

   sta <$02,x         ; Store the remainder into #0
   ply               ; Restore the instruction pointer

   lda <$04,x         ; Load the quotient into A for sign-correction
   lsr <$00,x         ; Check - was the dividend negative?
   bcc .cd            ; If not, skip this bit
   eor #$ffff         ; Otherwise, negate the quotient
   inc
   pha               ; And save it to a temporary location
   lda <$02,x         ; And get the remainder
   eor #$ffff         ; And negate it
   inc
   sta <$02,x         ; And store it back
   pla               ; And retrieve the quotient

.cd   lsr <$00,x         ; Check - was the divisor negative?
   bcc .d            ; If not, skip this bit
   eor #$ffff         ; Otherwise, negate the quotient
   inc
.d   sta <$04,x         ; And store the quotient back

   inx               ; Increment the data stack pointer (drop work location)
   inx
   rts               ; Return to the calling routine

A few words of explanation: as you may be able to tell, the bytecode VM is stack-based, so the operands are addressed as DP,X with X being the stack pointer. I'm trying to do all work on the stack and avoid hard-coded temporary locations, for tidiness. The routine itself is binary long division on signed 16-bit operands, and the routine is careful to do the right thing with the quotient and remainder depending on whether the dividend and/or divisor are negative. (Specifically, that if the dividend is negative then the remainder is negative, so that quotient * divisor + remainder always equals the dividend.)

As far as trying to optimize it further, my priorities are:
  • First, that it's as small as possible while still being decently performant.
  • Second, that it's as performant as possible while still being small.
I want to avoid tables entirely (in order to keep it small,) and I want to preserve the end-result behavior (signed 16/16=16 division giving the quotient and remainder and handling negative operands correctly.) Beyond that, I'm open to input: can anybody improve on this?


Top
 Profile  
Reply with quote  
PostPosted: Thu Jan 21, 2016 8:50 pm 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
Welcome, John. Have you checked out my size-optimized (27-byte) UM/MOD here?

viewtopic.php?f=9&t=1652&start=45#p39595

It is unsigned, with a 32-bit dividend and 16-bit divisor, quotient and remainder for a Forth-style implementation, but it looks like it wouldn't be too difficult to adapt it for your use. I might be able to give it a try for you, but that would be tonight at the absolute earliest.

Mike B.


Top
 Profile  
Reply with quote  
PostPosted: Thu Jan 21, 2016 9:14 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8508
Location: Southern California
Welcome. Be sure to look at what's in the source code repository at this site, including the 32-bit/16-bit division at http://6502.org/source/integers/ummodfix/ummodfix.htm which uses only 38 instructions for the '816 (versus your 57). It takes in a 32-bit dividend and 16-bit divisor, and produces a 16-bit quotient and 16-bit remainder. That one is unsigned only, and returns FFFF, FFFF if there was overflow or /0, which serves as an error message without interrupting everything if it is acceptable to go with the maximum representable output. It cannot be confused as negative, because it is unsigned only. An unsigned division routine such as this becomes a building block in other, shorter routines that handle the signs and/or other precisions.

See also the division and multiplication topic at viewtopic.php?f=9&t=1652 (I see Mike referred to a post in that topic while I was writing), and the 6502.org wiki at http://6502org.wikidot.com/ .

I might encourage you to enjoy today's low memory prices to get more memory. 512Kx8 SRAMs are about $5 each. (I get them for half that much since I buy in quantity for the 4megabyte memory module I provide.) It can be battery-backed.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Thu Jan 21, 2016 9:36 pm 
Offline

Joined: Thu Jan 21, 2016 7:33 pm
Posts: 274
Location: Placerville, CA
Interesting. I'll have to take a look at some of these.

GARTHWILSON wrote:
I might encourage you to enjoy today's low memory prices to get more memory. 512Kx8 SRAMs are about $5 each. (I get them for half that much since I buy in quantity for the 4megabyte memory module I provide.) It can be battery-backed.

While it is indeed astonishingly cheap to get more memory these days, that's kind of not the point of the project. I'm aiming for small and simple; sticking with 64KB (plus a switchable 2KB boot ROM) means I don't have to worry about the multiplexed A16-A23 and still gives me a fair bit of room to work, plus sticking with the parts I have means I don't have to do battery backup in order to have persistent memory. I might look at doing more memory in a later project, but for now I'm sticking with small.


Top
 Profile  
Reply with quote  
PostPosted: Fri Jan 22, 2016 8:12 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
I gave a stab at it, but I'm too tired to test it. I'm just gonna throw it out (up?) here, and correct it later if one or more coding errors are discovered:
Code:
divide:                 ; Common division code - divides #1 by #0 and gives the
                        ;   quotient in #1 and the remainder in #0 (signed)
    lda  <0,x           ;
    beq  implode        ; if /0 then implode
    cmp  #$8000         ; save divisor sign in 'C' flag
    bcc  .div2          ;
    jsr  .div6a         ; divisor = |divisor|
.div2:                  ;
    lda  <2,x           ;
    php                 ; save dividend sign in 'N' flag
    bpl  .div3          ;
    jsr  .div9          ; dividend = |dividend|
.div3:                  ;
    lda  <0,x           ; get the divisor from #0
    phy                 ; save the instruction pointer
    ldy  #16            ; loop counter for unsigned divide
.div4:                  ;
    asl  <2,x           ;   quotient gradually replaces the dividend
    rol                 ;   remainder gradually accumulates in A
    bcs  .div5          ;   don't forget the 17th bit!
    cmp  <0,x           ;   if (partial remainder >= divisor) then
    bcc  .div6          ;
.div5:                  ;     update the partial remainder
    sbc  <0,x           ;
    inc  <2,x           ;     and set low bit in quotient
.div6:                  ;
    dey                 ;
    bne  .div4          ; loop until done dividing
    ply                 ; retrieve the instruction pointer
    sta  <0,x           ; store the remainder
    plp                 ; retrieve the divisor sign (C) and dividend sign (N)
    bpl  .div7          ;
.div6a:                 ;
    eor  #$ffff         ; negate the remainder if dividend < 0
    inc                 ;
    sta  <0,x           ;
    bcc  .div8          ;   and negate the quotient if divisor >= 0
    rts                 ; return to the caller
.div7:                  ;
    bcc  .div99         ; done if dividend and divisor were the same sign
.div8:                  ;
    lda  <2,x           ;
.div9:                  ;
    eor  #$ffff         ; negate #1 (dividend or quotient)
    inc                 ;
    sta  <2,x           ;
.div99:                 ;
    rts                 ; return to the caller


Mike B.


Last edited by barrym95838 on Mon Feb 29, 2016 9:08 am, edited 2 times in total.

Top
 Profile  
Reply with quote  
PostPosted: Sat Jan 23, 2016 4:09 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
John,

I was trying to reverse-engineer the middle of your routine that does the unsigned divide, because it looks really fast (except for the unnecessary PHP SEC ... PLP instructions), but I can't figure out what magic you're using. Have you tested it thoroughly for correct operation on a wide range of inputs? Did you write it yourself, or base it on previous art? Links and/or details, please?

Mike B.


Top
 Profile  
Reply with quote  
PostPosted: Mon Jan 25, 2016 6:35 pm 
Offline

Joined: Thu Jan 21, 2016 7:33 pm
Posts: 274
Location: Placerville, CA
It's mine; as I mentioned, it's basically just binary long division. Start with the divisor in the leftest place it can possibly be, subtract it if possible (and put a 1 in the appropriate place in the quotient,) then shift it down to the next place and repeat. I haven't given any of the code a thorough testing yet (as I mentioned, I'm trying to get to a place of basic feature-completeness before assembly and testing/optimization,) but I did sit down and work through the algorithm on paper (well, in a text editor) and it looks like it should be correct.

You're right about the PHP/PLP; it was meant to preserve the carry across the subtract operation so that it can be shifted into the quotient, but it's simpler just to use a SEC after the subtract since it's already known to be 1 if we're subtracting in the first place; guess I coded a little too defensively for my own good.

(Edit: come think, even the SEC is unnecessary since the carry should remain set, since we won't be subtracting in any situation that will perform a borrow.)

I also saw one of the methods linked in an earlier post doing something starting from the rightmost position and working left instead; I need to try to wrap my head around that because if I'm understanding that correctly then I could do that and dispense with the pre-shift loop altogether...


Last edited by commodorejohn on Mon Jan 25, 2016 9:22 pm, edited 1 time in total.

Top
 Profile  
Reply with quote  
PostPosted: Mon Jan 25, 2016 7:36 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
(There's a division routine in Bruce's mystery "program for today" - see
viewtopic.php?p=15277#p15277
)
(Not really a mystery if you read the whole thread, but I'll avoid spoiling the puzzle)


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 8 posts ] 

All times are UTC


Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: