6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Tue Nov 12, 2024 12:12 am

All times are UTC




Post new topic Reply to topic  [ 163 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6, 7 ... 11  Next
Author Message
PostPosted: Sat Oct 10, 2020 7:48 pm 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
In this thread, viewtopic.php?f=4&t=6315 hatsugai reported some outstanding numbers for executing recursive, compiled code with his special hardware featuring sliding stack windows and a custom compiler.

I just had to see how well hand-written code compares to his results.

Just the thought of writing a 6502 version makes my head hurt, so I'll leave it as an exercise for the superstar programmers here to do it...

As expected, the 6809 does very well:

Code:
 0100                             00021          org    $100
                                  00022
 0100                             00023 fib
 0100 1083 0002               [5] 00024          cmpd   #2
 0104 25 16 (011C)            [3] 00025          blo    Return
                                  00026
 0106 34 06                   [7] 00027          pshs   D         ; Save original n
                                  00028
 0108 C3 FFFF                 [4] 00029          addd   #-1       ; Do fib(n - 1)
 010B BD 0100                 [8] 00030          jsr    fib
                                  00031
 010E 34 06                   [7] 00032          pshs   D         ; Save fib(n - 1)
                                  00033
 0110 EC 62                   [6] 00034          ldd    2,S       ; Retrieve original n
                                  00035
 0112 C3 FFFE                 [4] 00036          addd   #-2       ; Do fib(n - 2)
 0115 BD 0100                 [8] 00037          jsr    fib
                                  00038
 0118 E3 E4                   [6] 00039          addd   ,S        ; Add them together
                                  00040
 011A 32 64                   [5] 00041          leas   4,S       ; Clean the stack
                                  00042
 011C                             00043 Return
 011C 39                      [5] 00044          rts
                                  00045
 011D                             00046 Test
 011D CC 0017                 [3] 00047          ldd    #23
 0120 BD 0100                 [8] 00048          jsr    fib
                                  00049
 011D                             00050          end    Test


3,755,751 cycles




The 8080 does not fair too poorly. This is one of the few times the XTHL instruction (swap HL contents with the word at the top of the stack) is useful.

Code:
 0100                     00021         org     100h
                          00022
 0100                     00023 fib:
 0100 7C              [5] 00024         mov     A,H
 0101 FE 00           [7] 00025         cpi     0
 0103 C2 010A        [10] 00026         jnz     2f
                          00027
 0106 7D              [5] 00028         mov     A,L
 0107 FE 02           [7] 00029         cpi     2
 0109 D8           [5/11] 00030         rc
                          00031
 010A                     00032 2:
 010A E5             [11] 00033         push    H               ; Save original n
                          00034
 010B 11 FFFF        [10] 00035         lxi     D,-1            ; Do fib(n - 1)
 010E 19             [10] 00036         dad     D
 010F CD 0100        [17] 00037         call    fib
                          00038
 0112 E3             [18] 00039         xthl                    ; Save fib(n - 1) and retrieve original n
                          00040
 0113 11 FFFE        [10] 00041         lxi     D,-2            ; Do fib(n - 2)
 0116 19             [10] 00042         dad     D
 0117 CD 0100        [17] 00043         call    fib
                          00044
 011A D1             [10] 00045         pop     D               ; Add them together
 011B 19             [10] 00046         dad     D
                          00047
 011C C9             [10] 00048         ret
                          00049
 011D                     00050 Test:
 011D 21 0017        [10] 00051         lxi     H,23
 0120 CD 0100        [17] 00052         call    fib
                          00053
 011D                     00054         end     Test


10,061,711 cycles



I had to see just how "inferior" the 6800 is...somewhat similar to the 8080:

Code:
 0100                     00021          org    $100
                          00022
 0100                     00023 fib
 0100 4D              [2] 00024          tsta
 0101 26 04 (0107)    [4] 00025          bne    2f
                          00026
 0103 C1 02           [2] 00027          cmpb   #2
 0105 25 20 (0127)    [4] 00028          blo    Return
                          00029
 0107                     00030 2
 0107 37              [4] 00031          pshb             ; Save original n
 0108 36              [4] 00032          psha
                          00033
 0109 C0 01           [2] 00034          subb   #1        ; Do fib(n - 1)
 010B 82 00           [2] 00035          sbca   #0
 010D BD 0100         [9] 00036          jsr    fib
                          00037
 0110 37              [4] 00038          pshb             ; Save fib(n - 1)
 0111 36              [4] 00039          psha
                          00040
 0112 30              [4] 00041          tsx              ; Address the stack
                          00042
 0113 E6 03           [5] 00043          ldab   3,X       ; Retrieve original n
 0115 A6 02           [5] 00044          ldaa   2,X
                          00045
 0117 C0 02           [2] 00046          subb   #2        ; Do fib(n - 2)
 0119 82 00           [2] 00047          sbca   #0
 011B BD 0100         [9] 00048          jsr    fib
                          00049
 011E 30              [4] 00050          tsx              ; Address the stack
                          00051
 011F EB 01           [5] 00052          addb   1,X       ; Add them together
 0121 A9 00           [5] 00053          adca   ,X
                          00054
 0123 31              [4] 00055          ins              ; Clean the stack
 0124 31              [4] 00056          ins
 0125 31              [4] 00057          ins
 0126 31              [4] 00058          ins
                          00059
 0127                     00060 Return
 0127 39              [5] 00061          rts
                          00062
 0128                     00063 Test
 0128 4F              [2] 00064          clra
 0129 C6 17           [2] 00065          ldab   #23
 012B BD 0100         [9] 00066          jsr    fib
                          00067
 0128                     00068          end    Test


5,564,070 cycles


Sometime in the future, I shall revisit this to see how well my compilers do.


Top
 Profile  
Reply with quote  
PostPosted: Sat Oct 10, 2020 8:08 pm 
Offline
User avatar

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1485
Location: Scotland
BillG wrote:
In this thread, viewtopic.php?f=4&t=6315 hatsugai reported some outstanding numbers for executing recursive, compiled code with his special hardware featuring sliding stack windows and a custom compiler.

I just had to see how well hand-written code compares to his results.

Just the thought of writing a 6502 version makes my head hurt, so I'll leave it as an exercise for the superstar programmers here to do it...


I 'cheated' and did it in BCPL on my Ruby 816 running at 16Mhz:

Code:
GET "libhdr.h"
GET "sys.h"

LET fib (n) = VALOF
{
  IF n < 2 RESULTIS n

  RESULTIS fib (n - 1) + fib (n - 2)
}

AND start () = VALOF
{
  LET tStart, tEnd = ?,?

  tStart := sys (Sys_cputime)
  writef ("%d *n", fib (23))
  tEnd   := sys (Sys_cputime)

  writef ("Time: %d *n", tEnd - tStart)

  RESULTIS 0
}


Run:

Code:
% fib
28657
Time: 3734


That's 3.7 seconds.

Which I have to say I'm happy enough with (and variables are 32-bits too)

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Top
 Profile  
Reply with quote  
PostPosted: Sat Oct 10, 2020 8:47 pm 
Offline

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Before tackling the unextended 6502 directly, I'd just like to note that the 65816 is well-suited to this sort of code. Not only does it have a zero-page offset register (which could easily be used to implement zero-page windows), but it also has a 16-bit stack pointer and stack-relative addressing for ALU ops. You can implement recursive fib() without touching the index registers or zero page at all.
Code:
fib:  ; Enter with A in 16-bit mode, argument in A; returns answer in A
  CMP ##2
  BMI :+   ; if(arg<2) return arg;
  DEC A
  PHA
  DEC A
  JSR fib    ; fib(arg-2)
  PHA
  LDA 3,S
  JSR fib    ; fib(arg-1)
  CLC
  ADC 1,S
  STA 3,S    ; fib(arg-2) + fib(arg-1)
  PLA
  PLA        ; stack restored and result in A
: RTS
There's probably a way to optimise that further. Anyone want to test how fast it is?

Of course, the bigger optimisation would be to convert it into an iterative function that starts from {0,1} and merely extends the sequence in linear time.


Top
 Profile  
Reply with quote  
PostPosted: Sat Oct 10, 2020 9:58 pm 
Offline

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
An alternative calling convention has the function accumulate the result into a variable on the the top of stack, preserving A:
Code:
fib:  ; Enter with A & X in 16-bit mode, argument in A; returns by adding to the top of stack, A is preserved, X clobbered
  CMP ##2
  BPL :+    ; if(arg<2) return arg;
  TAX
  CLC
  ADC 3,S
  STA 3,S
  TXA
  RTS
: PEA 0
  DEC A
  JSR fib    ; 0 + fib(arg-1)
  DEC A
  JSR fib    ; … + fib(arg-2)
  INC A
  INC A
  TAX
  CLC
  PLA        ; stack is now restored
  ADC 3,S
  STA 3,S
  TXA
  RTS
The root call should be preceded by PEA 0 to insert the blank accumulator into the stack.


Top
 Profile  
Reply with quote  
PostPosted: Sat Oct 10, 2020 10:57 pm 
Offline

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Now, the obvious workaround for the lack of stack addressing on the 6502 is to use X as a substitute, loading it with the TSX instruction. But we can also use this to implement a "user stack" in some other page of memory, even zero page for an approximation of the "windowed" scheme, leaving the system stack for return addresses. On the 6502 we also don't have the luxury of a 16-bit accumulator, but we can at least assume that the argument is 8-bit if the answer will fit in 16.
Code:
fib:  ; Enter with argument in A, X pointing to the beginning of a clear area in ZP.  Returns with A,X preserved, X pointing to result in ZP.
  CMP #2
  BPL :+
  STA 0,X
  STZ 1,X
  RTS
: PHA
  DEC A
  JSR fib
  INX
  INX
  DEC A
  JSR fib
  DEX
  DEX
  CLC
  LDA 2,X
  ADC 0,X
  STA 0,X
  LDA 3,X
  ADC 1,X
  STA 1,X
  PLA
  RTS
In the above, I've assumed a 65C02. It shouldn't be too difficult to adapt to an NMOS version if required.


Top
 Profile  
Reply with quote  
PostPosted: Sun Oct 11, 2020 3:18 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
Chromatix wrote:
In the above, I've assumed a 65C02. It shouldn't be too difficult to adapt to an NMOS version if required.

That would be most in line with the original spirit of this thread. I haven't had a chance to see if your samples produce the anticipated results, but I must congratulate you and BillG for some very tidy-looking creations!

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Top
 Profile  
Reply with quote  
PostPosted: Sun Oct 11, 2020 3:38 am 
Offline

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Okay, for an NMOS version we need to substitute the STZ and DEC A instructions. STZ is easily done by LDA #0 : STA, assuming A can be clobbered. To permit that and also to work around DEC A efficiently, we pass the argument in the Y register and use A only for arithmetic.
Code:
fib:  ; Enter with argument in Y, X pointing to the beginning of a clear area in ZP.  Returns with Y,X preserved, X pointing to result in ZP, A clobbered.
  CPY #2
  BPL :+
  STY 0,X
  LDA #0
  STA 1,X
  RTS
: DEY
  JSR fib
  INX
  INX
  DEY
  JSR fib
  DEX
  DEX
  CLC
  LDA 2,X
  ADC 0,X
  STA 0,X
  LDA 3,X
  ADC 1,X
  STA 1,X
  INY
  INY
  RTS
This has the side benefit of reducing system stack usage to 2 bytes per recursion level, rather than 3 - but there is slightly more code and it will be a little slower.


Top
 Profile  
Reply with quote  
PostPosted: Sun Oct 11, 2020 4:04 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
Thank you for your efforts, Chromatix.
BillG wrote:
The 8080 does not fair too poorly. This is one of the few times the XTHL instruction (swap HL contents with the word at the top of the stack) is useful.
Bill Gates utilized that instruction to great benefit for his 4K BASIC. It's a neat "diamond in the rough" from my outsider's perspective.
Quote:
Sometime in the future, I shall revisit this to see how well my compilers do.
If they can get to within a binary order of magnitude for size and speed, I would call that an impressive win.

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Top
 Profile  
Reply with quote  
PostPosted: Sun Oct 11, 2020 5:14 am 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
drogon wrote:
I 'cheated' and did it in BCPL on my Ruby 816 running at 16Mhz:

That's 3.7 seconds.

Which I have to say I'm happy enough with (and variables are 32-bits too)


That compiles to interpreted bytecode, right?


Top
 Profile  
Reply with quote  
PostPosted: Sun Oct 11, 2020 5:16 am 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
Chromatix wrote:
In the above, I've assumed a 65C02. It shouldn't be too difficult to adapt to an NMOS version if required.


Thanks for all of the good work.

When I get a chance, I will run the 6502 timing. My tools are currently NMOS version only.


Top
 Profile  
Reply with quote  
PostPosted: Sun Oct 11, 2020 5:21 am 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
barrym95838 wrote:
BillG wrote:
The 8080 does not fair too poorly. This is one of the few times the XTHL instruction (swap HL contents with the word at the top of the stack) is useful.
Bill Gates utilized that instruction to great benefit for his 4K BASIC. It's a neat "diamond in the rough" from my outsider's perspective.


I find that XCHG (swap contents of the DE and HL register pairs) is the other magical instruction at only 4 cycles. I wish there was an equivalent to swap BC with HL.


Top
 Profile  
Reply with quote  
PostPosted: Sun Oct 11, 2020 7:39 am 
Offline
User avatar

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1485
Location: Scotland
BillG wrote:
drogon wrote:
I 'cheated' and did it in BCPL on my Ruby 816 running at 16Mhz:

That's 3.7 seconds.

Which I have to say I'm happy enough with (and variables are 32-bits too)


That compiles to interpreted bytecode, right?


Yes. (124 bytes in this instance) I wrote the bytecode interpreter in '816 assembler. It's somewhat slow in that it's running a 32-bit virtual machine (although that's no worse than handling 16-bit data on the 65C02, the 8-bit data bus not withstanding) and I have extra code to make the 64K banks of RAM transparent to the code, so allocating and iterating over arrays > 64KB "just works" (at the expense of speed) I edited and compiled the code directly on my Ruby board.

Cheers,

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Top
 Profile  
Reply with quote  
PostPosted: Sun Oct 11, 2020 1:23 pm 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
Chromatix wrote:
Okay, for an NMOS version we need to substitute the STZ and DEC A instructions. STZ is easily done by LDA #0 : STA, assuming A can be clobbered. To permit that and also to work around DEC A efficiently, we pass the argument in the Y register and use A only for arithmetic.
Code:
fib:  ; Enter with argument in Y, X pointing to the beginning of a clear area in ZP.  Returns with Y,X preserved, X pointing to result in ZP, A clobbered.


Very nicely done!

Code:
 0200                     00023          org    $200
                          00024
 0200                     00025 fib:
 0200 C0 02           [2] 00026          cpy    #2
 0202 10 07 (020B)  [2/3] 00027          bpl    2f
                          00028
 0204 94 00           [4] 00029          sty    0,X
 0206 A9 00           [2] 00030          lda    #0
 0208 95 01           [4] 00031          sta    1,X
 020A 60              [6] 00032          rts
                          00033
 020B                     00034 2:
 020B 88              [2] 00035          dey
 020C 20 0200         [6] 00036          jsr    fib
 020F E8              [2] 00037          inx
 0210 E8              [2] 00038          inx
 0211 88              [2] 00039          dey
 0212 20 0200         [6] 00040          jsr    fib
 0215 CA              [2] 00041          dex
 0216 CA              [2] 00042          dex
 0217 18              [2] 00043          clc
 0218 B5 02           [4] 00044          lda    2,X
 021A 75 00           [4] 00045          adc    0,X
 021C 95 00           [4] 00046          sta    0,X
 021E B5 03           [4] 00047          lda    3,X
 0220 75 01           [4] 00048          adc    1,X
 0222 95 01           [4] 00049          sta    1,X
 0224 C8              [2] 00050          iny
 0225 C8              [2] 00051          iny
 0226 60              [6] 00052          rts
                          00053
 0227                     00054 Test:
 0227 A0 17           [2] 00055          ldy    #23
 0229 A2 00           [2] 00056          ldx    #0
 022B 20 0200         [6] 00057          jsr    fib
                          00058
 0227                     00059          end    Test


3,941,225 cycles


With the adoption of a couple of your techniques, n is always less than 256 and keeping n in a register along with extensive use of the depessimizer between the ears, the code for the other processors got better.

The 6809 saw some improvement, but it is clearly in diminishing returns territory:

Code:
 0100                             00023          org    $100
                                  00024
 0100                             00025 fib
 0100 8C 0002                 [4] 00026          cmpx   #2        ; Is n < 2?
 0103 24 03 (0108)            [3] 00027          bhs    2f        ; No
                                  00028
 0105 1F 10                   [6] 00029          tfr    X,D       ; Return n
                                  00030
 0107 39                      [5] 00031          rts
                                  00032
 0108                             00033 2
 0108 30 1F                   [5] 00034          dex              ; Do fib(n - 1)
 010A 8D F4 (0100)            [7] 00035          bsr    fib
                                  00036
 010C 34 06                   [7] 00037          pshs   D         ; Save fib(n - 1)
                                  00038
 010E 30 1F                   [5] 00039          dex              ; Do fib(n - 2)
 0110 8D EE (0100)            [7] 00040          bsr    fib
                                  00041
 0112 E3 E1                   [9] 00042          addd   ,S++      ; Add them together
                                  00043
 0114 30 02                   [5] 00044          leax   2,X       ; Restore original n
                                  00045
 0116 39                      [5] 00046          rts
                                  00047
 0117                             00048 Test
 0117 8E 0017                 [3] 00049          ldx    #23
 011A 8D E4 (0100)            [7] 00050          bsr    fib
                                  00051
 0117                             00052          end    Test


3,523,921 cycles



The 8080 showed a substantial gain:

Code:
 0100                     00023         org     100h
                          00024
 0100                     00025 fib:
 0100 FE 02           [7] 00026         cpi     2               ; n < 2?
 0102 D2 0108        [10] 00027         jnc     2f              ; No
                          00028
 0105 6F              [5] 00029         mov     L,A             ; Return n
 0106 60              [5] 00030         mov     H,B
                          00031
 0107 C9             [10] 00032         ret
                          00033
 0108                     00034 2:
 0108 3D              [5] 00035         dcr     A               ; Do fib(n - 1)
 0109 CD 0100        [17] 00036         call    fib
                          00037
 010C E5             [11] 00038         push    H               ; Save fib(n - 1)
                          00039
 010D 3D              [5] 00040         dcr     A               ; Do fib(n - 2)
 010E CD 0100        [17] 00041         call    fib
                          00042
 0111 D1             [10] 00043         pop     D               ; Add them together
 0112 19             [10] 00044         dad     D
                          00045
 0113 81              [4] 00046         add     C               ; Restore original n
                          00047
 0114 C9             [10] 00048         ret
                          00049
 0115                     00050 Test:
 0115 3E 17           [7] 00051         mvi     A,23
 0117 01 0002        [10] 00052         lxi     B,2             ; Keep 0 and 2 handy in spare registers
 011A CD 0100        [17] 00053         call    fib
                          00054
 0115                     00055         end     Test


6,630,552 cycles



The lowly 6800 also saw major improvement, almost catching up to the 6502:

Code:
 0000                     00023 Tmp      rmb    1
                          00024
 0100                     00025          org    $100
                          00026
 0100                     00027 fib
 0100 C1 02           [2] 00028          cmpb   #2        ; Is n < 2
 0102 24 02 (0106)    [4] 00029          bhs    2f        ; No
                          00030
 0104 4F              [2] 00031          clra             ; Return n
                          00032
 0105 39              [5] 00033          rts
                          00034
 0106                     00035 2
 0106 37              [4] 00036          pshb             ; Save original n
                          00037
 0107 5A              [2] 00038          decb             ; Do fib(n - 1)
 0108 8D F6 (0100)    [8] 00039          bsr    fib
                          00040
 010A D7 00           [4] 00041          stab   Tmp       ; Stash low byte of result
                          00042
 010C 33              [4] 00043          pulb             ; Retrieve original n
                          00044
 010D 36              [4] 00045          psha             ; Save fib(n - 1)
 010E 96 00           [3] 00046          ldaa   Tmp
 0110 36              [4] 00047          psha
                          00048
 0111 C0 02           [2] 00049          subb   #2        ; Do fib(n - 2)
 0113 8D EB (0100)    [8] 00050          bsr    fib
                          00051
 0115 30              [4] 00052          tsx              ; Address the stack
                          00053
 0116 EB 00           [5] 00054          addb   ,X        ; Add them together
 0118 A9 01           [5] 00055          adca   1,X
                          00056
 011A 31              [4] 00057          ins              ; Clean the stack
 011B 31              [4] 00058          ins
                          00059
 011C 39              [5] 00060          rts
                          00061
 011D                     00062 Test
 011D C6 17           [2] 00063          ldab   #23
 011F 8D DF (0100)    [8] 00064          bsr    fib
                          00065
 011D                     00066          end    Test


4,126,686 cycles


Top
 Profile  
Reply with quote  
PostPosted: Tue Oct 13, 2020 6:45 pm 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
barrym95838 wrote:
Quote:
Sometime in the future, I shall revisit this to see how well my compilers do.
If they can get to within a binary order of magnitude for size and speed, I would call that an impressive win.


If I can beat the other compilers, I will be very happy.

This is what I am guessing is the best a compiler may hope to do with this Pascal program. The result is almost twice the speed quoted for gcc6809.

Code:
                                  00001 *
                                  00002 * program Test;
                                  00003 *
                                  00004 *   function fib(n : byte)
                                  00005 *                  : integer;
                                  00006 *
                                  00007 *     begin
                                  00008 *       if n < 2
                                  00009 *         then
                                  00010 *           fib := n
                                  00011 *         else
                                  00012 *           fib := fib(n - 1) + fib(n - 2)
                                  00013 *
                                  00014 *     end;  { fib }
                                  00015 *
                                  00016 *   begin
                                  00017 *     fib(23)
                                  00018 *   end.  { Test }
                                  00019 *
                                  00020 * 4,914,946 cycles
                                  00021 *
                                  00022
 0100                             00023          org    $100
                                  00024
 0100                             00025 fib
 0100 A6 62                   [5] 00026          lda    2,S       ; Get n
 0102 81 02                   [2] 00027          cmpa   #2        ; Is n < 2?
 0104 24 07 (010D)            [3] 00028          bhs    2f        ; No
                                  00029
 0106 35 20                   [7] 00030          puls   Y         ; Get return address
 0108 35 04                   [6] 00031          puls   B         ; Get n
 010A 4F                      [2] 00032          clra             ; Return in D
                                  00033
 010B 6E A4                   [3] 00034          jmp    ,Y        ; Return
                                  00035
 010D                             00036 2
 010D 4A                      [2] 00037          deca             ; Do fib(n - 1)
 010E 34 02                   [6] 00038          pshs   A
 0110 BD 0100                 [8] 00039          jsr    fib
                                  00040
 0113 34 06                   [7] 00041          pshs   D         ; Save fib(n - 1)
                                  00042
 0115 A6 64                   [5] 00043          lda    4,S       ; Retrieve original n
                                  00044
 0117 80 02                   [2] 00045          suba   #2        ; Do fib(n - 2)
 0119 34 02                   [6] 00046          pshs   A
 011B BD 0100                 [8] 00047          jsr    fib
                                  00048
 011E E3 E1                   [9] 00049          addd   ,S++      ; Add them together
                                  00050
 0120 35 20                   [7] 00051          puls   Y         ; Get return address
                                  00052
 0122 32 61                   [5] 00053          leas   1,S       ; Clean the stack
                                  00054
 0124 6E A4                   [3] 00055          jmp    ,Y        ; Return
                                  00056
 0126                             00057 Test
 0126 86 17                   [2] 00058          lda    #23
 0128 34 02                   [6] 00059          pshs   A
 012A BD 0100                 [8] 00060          jsr    fib
                                  00061
 0126                             00062          end    Test


My feeling at this point is that the code for the 6502, 8080 and 6800 will all be very ugly. The Z80 may not be much better.


Top
 Profile  
Reply with quote  
PostPosted: Tue Oct 13, 2020 6:52 pm 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
If I had a good peephole optimizer, I may be able to replace the JSR instructions with BSR for a slight improvement.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 163 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6, 7 ... 11  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 8 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
cron