 Post subject: Re: Announce: Acheron VM
PostPosted: Wed Oct 09, 2019 7:40 pm 
Offline

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
dmsc wrote:
For the editor this is not easy, as it would mean instrumenting it to do some particular task. But for the sieve benchmark, calculating first 1899 primes, those are the runtime statistics:
Are you using emulator trace logs to measure this? I think you should be able to manually bring the editor to some state, issue commands, and filter the trace logs to measure execution between fenceposts, say something that requires a full-screen refresh or making an insertion that moves a lot of internal data. But yeah, a full batch benchmark is certainly easier to measure, but isn't quite as "real world".

It would also be interesting to see if you would get better sieve performance by hand-writing fastbasic VM operations instead of compiling from its BASIC form. That would be more in line with how Acheron code is currently written, but if it would basically be what BASIC generates anyway, then it might not be worth the bother.

Also, does your cycle tally include the work done during printing? Though I guess with hundreds of millions of cycles executed, that doesn't really affect the percentage much.

Quote:
That would be great. The editor does not have many hardware dependencies: it uses PRINT to output to the screen, and only relies on being able to write control codes (cursor movement, insert line, delete line) and read the current cursor position.

The C64 has control codes for cursor movement, but not insert/delete lines, so a bit of fiddling will be required there.


Here's a first, untested pass of converting the high-level FastBasic sieve
Code:
? "Starting!"
NumIter = 10
sTime = TIME
' Arrays are initialized to 0
DIM A(8190) Byte
FOR Iter= 1 TO NumIter
  MSET Adr(A), 8190, 0
  Count = 0
  FOR I = 0 TO 8190
    IF NOT A(I)
      Prime = I + I + 3
      FOR K = I + Prime TO 8190 STEP Prime
        A(K) = 1
      NEXT K
      INC Count
    ENDIF
  NEXT I
NEXT Iter

eTime = TIME
? "End."
? "Elapsed time: "; eTime-sTime; " in "; NumIter; " iterations."
? "Found "; Count; " primes."

to low-level Acheron:
Code:
 grow 8
 regnames array, iter, i, k, prime, stime, etime, const1

 ; constants in the assembler
 arrayLoc = $8000 ; using a fixed memory buffer, as opposed to DIM allocation
 numIter = 9  ; 10 iterations
 size = 8190

 with stime
 gettime ; TODO

 with const1
 setp 1

 ; for iter = numIter (down to 0)
 with iter
 setp numIter
iterLoop:

  ; Initialize the array to zero
  with array
  setp arrayLoc
  clrmn size   ; clear mem[rP to rP+size-1]

  ; count = 0
  with count
  clrp

  ; for i=0
  with i
  clrp
loopI:  ; rP = i

   ; prime = membyte[i + array], reusing this var temporarily
   ldmbr prime, array
   bnz nextI
    ; Array entry was zero, this is a prime
    ; prime = 3 + (i<<1)
    setp 3
    addea2 i  ; addea2 = add effective address offset, 2 bytes each

    ; for k = prime+i
    movep k
    add i
loopK:
     ; mem[array + k] = 1
     with const1
     stmbr array, k

     ; step prime
     ; This part is weaker than BASIC, without specific FOR/NEXT instructions
     with k
     add prime
     cmpi16 size  ; need to do a relative test, as we can overshoot the limit
     bnc loopK

    with count
    incp

nextI:
   with i
   incp
   case size, loopI ; can do an equality test, as we're incrementing by 1

 ; next iter
 with iter
 decloop iterLoop

 with etime
 gettime
 sub stime

; ~59 bytes to here?

 printlit
  .byte "End.",13,"Elapsed time: ",0
 with etime
 printdec
 printlit
  .byte " in ",0
 with iter
 setp numIter
 incp
 printdec
 printlit
  .byte " iterations.",13,"Found ",0
 with count
 printdec
 printlit
  .byte " primes.",13,0

 shrink 8


Main section without comments or assemble-time constants, and packed the 'with' codes together, to align more to its actually dispatched opcodes:
Code:
 grow 8
 regnames array, iter, i, k, prime, stime, etime, const1
 gettime_with stime
 setp_with const1, 1
 setp_with iter, 9
iterLoop:
  setp_with array, $8000
  clrmn 8190
  clrp_with count
  clrp_with i
loopI:
   ldmbr prime, array
   bnz nextI
    setp 3
    addea2 i
    movep k
    add i
loopK:
     stmbr_with const1, array, k
     add_with k, prime
     cmpi16 8190
     bnc loopK
    incp_with count
nextI:
  incp_with i
  case 8190, loopI
 decloop_with iter, iterLoop
 gettime_with etime
 sub stime


To my count, it's 59 bytes in 24 instructions for the main non-printing portion. (regnames is a build-time naming macro, with no runtime opcode) This uses some instructions I've designed but haven't written yet, including time & printing. In comparison to your BASIC tokens it's probably fair to add those comparable instructions for this test. There's certainly some things that will be faster or slower in printing & not using DIM, but the main loop should still dominate.

This also uses 16 bytes of zp space for registers. Certainly I could reduce that by taking iter/stime/etime out of regs and into the global page, and using a single temp register for those and const1 instead. That would make the code a bit bigger but friendlier. But if this is the only thing running, and for benchmark purposes, might as well keep it optimal.
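
Since the Acheron listing above is untested, a host-side reference for the expected prime count can help when it is first run. The following is a minimal Python sketch of the same loop structure as the FastBasic source; the function name `sieve_count` is made up for illustration and is not part of either project.

```python
def sieve_count(size=8190):
    """Model of the FastBasic sieve: returns the value of Count after one pass."""
    flags = bytearray(size + 1)        # DIM A(8190) Byte: indices 0..size, all zero
    count = 0
    for i in range(size + 1):          # FOR I = 0 TO 8190
        if not flags[i]:               # IF NOT A(I)
            prime = i + i + 3          # index i stands for the odd number 2*i + 3
            # FOR K = I + Prime TO 8190 STEP Prime: mark odd multiples of prime
            for k in range(i + prime, size + 1, prime):
                flags[k] = 1
            count += 1
    return count


if __name__ == "__main__":
    print(sieve_count())
```

With the default size this gives 1899, matching the "first 1899 primes" figure quoted at the top of this post, so either VM version can be checked against it.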

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


 Post subject: Re: Announce: Acheron VM
PostPosted: Fri Oct 11, 2019 12:55 pm 
Offline

Joined: Mon Sep 17, 2018 2:39 am
Posts: 138
Hi!

White Flame wrote:
dmsc wrote:
For the editor this is not easy, as it would mean instrumenting it to do some particular task. But for the sieve benchmark, calculating first 1899 primes, those are the runtime statistics:
Are you using emulator trace logs to measure this?
Yes, using a minimal simulator that I wrote (see https://github.com/dmsc/mini65-sim , also used in the FastBasic testsuite). This emulates the full 6502 CPU and traps I/O in the Atari OS to avoid having to simulate the hardware. It can also print profiles of executed instructions, trace with labels, detect extra cycles in branches and indirect instructions, etc.
Quote:
I think you should be able to manually bring the editor to some state, issue commands, and filter the trace logs to measure execution between fenceposts, say something that requires a full-screen refresh or making an insertion that moves a lot of internal data. But yeah, a full batch benchmark is certainly easier to measure, but isn't quite as "real world".
Yes, but currently the editor is mainly limited by the speed of the OS output routines, so I have not bothered.
Quote:
It would also be interesting to see if you would get better sieve performance by hand-writing fastbasic VM operations instead of compiling from its BASIC form. That would be more in line with how Acheron code is currently written, but if it would basically be what BASIC generates anyway, then it might not be worth the bother.

Also, does your cycle tally include the work done during printing? Though I guess with hundreds of millions of cycles executed, that doesn't really affect the percentage much.
It includes cycles in the FastBasic PRINT routine, preparing the registers and calling the OS, but does not include the actual OS routines. On the Atari, printing one byte using the OS is not as straightforward, because you need to output through an IOCB (input-output control block), same as any other file.
Quote:


Here's a first, untested pass of converting the high-level FastBasic sieve ... to low-level Acheron,

Main section without comments or assemble-time constants, and packed the 'with' codes together, to align more to its actually dispatched opcodes:
Code:
 grow 8
 regnames array, iter, i, k, prime, stime, etime, const1
 gettime_with stime
 setp_with const1, 1
 setp_with iter, 9
iterLoop:
  setp_with array, $8000
  clrmn 8190
  clrp_with count
  clrp_with i
loopI:
   ldmbr prime, array
   bnz nextI
    setp 3
    addea2 i
    movep k
    add i
loopK:
     stmbr_with const1, array, k
     add_with k, prime
     cmpi16 8190
     bnc loopK
    incp_with count
nextI:
  incp_with i
  case 8190, loopI
 decloop_with iter, iterLoop
 gettime_with etime
 sub stime


To my count, it's 59 bytes in 24 instructions for the main non-printing portion. (regnames is a build-time naming macro, with no runtime opcode) This uses some instructions I've designed but haven't written yet, including time & printing. In comparison to your BASIC tokens it's probably fair to add those comparable instructions for this test. There's certainly some things that will be faster or slower in printing & not using DIM, but the main loop should still dominate.
This is very impressive!
Quote:
This also uses 16 bytes of zp space for registers. Certainly I could reduce that by taking iter/stime/etime out of regs and into the global page, and using a single temp register for those and const1 instead. That would make the code a bit bigger but friendlier. But if this is the only thing running, and for benchmark purposes, might as well keep it optimal.


With my current compiler (that is not as clever as I would like), this is what FastBasic produces:

Code:
; Variables
NUM_VARS = 9
fb_var_NUMITER   = heap_start + 0   ; Word variable
fb_var_STIME   = heap_start + 2   ; Word variable
fb_var_A   = heap_start + 4   ; Byte Array variable
fb_var_ITER   = heap_start + 6   ; Word variable
fb_var_COUNT   = heap_start + 8   ; Word variable
fb_var_I   = heap_start + 10   ; Word variable
fb_var_PRIME   = heap_start + 12   ; Word variable
fb_var_K   = heap_start + 14   ; Word variable
fb_var_ETIME   = heap_start + 16   ; Word variable
;-----------------------------
; Bytecode
bytecode_start:
; LINE 1
  TOK_CSTRING  10, "Starting!", 155
  TOK_PRINT_STR
; LINE 2
  TOK_BYTE  #10
  TOK_VAR_STORE  0
; LINE 3
  TOK_TIME
  TOK_VAR_STORE  1
; LINE 5
  TOK_NUM  #8191
  TOK_DIM  2
; LINE 6
  TOK_VAR_SADDR  3
  TOK_PUSH_1
  TOK_DPOKE
  TOK_VAR_LOAD  0
  TOK_PUSH_1
  TOK_FOR
  TOK_CNJUMP  jump_lbl_1
jump_lbl_2:
; LINE 7
  TOK_VAR_LOAD  2
  TOK_PUSH_NUM  #8190
  TOK_PUSH_0
  TOK_MSET
; LINE 8
  TOK_0
  TOK_VAR_STORE  4
; LINE 9
  TOK_VAR_SADDR  5
  TOK_PUSH_0
  TOK_DPOKE
  TOK_NUM  #8190
  TOK_PUSH_1
  TOK_FOR
  TOK_CNJUMP  jump_lbl_3
jump_lbl_4:
; LINE 10
  TOK_VAR_LOAD  2
  TOK_PUSH_VAR_LOAD  5
  TOK_ADD
  TOK_PEEK
  TOK_COMP_0
  TOK_CNJUMP  jump_lbl_5
; LINE 11
  TOK_VAR_LOAD  5
  TOK_USHL
  TOK_PUSH_BYTE  #3
  TOK_ADD
  TOK_VAR_STORE  6
; LINE 12
  TOK_VAR_SADDR  7
  TOK_PUSH_VAR_LOAD  5
  TOK_PUSH_VAR_LOAD  6
  TOK_ADD
  TOK_DPOKE
  TOK_NUM  #8190
  TOK_PUSH_VAR_LOAD  6
  TOK_FOR
  TOK_CNJUMP  jump_lbl_6
jump_lbl_7:
; LINE 13
  TOK_VAR_LOAD  2
  TOK_PUSH_VAR_LOAD  7
  TOK_ADD
  TOK_SADDR
  TOK_1
  TOK_POKE
; LINE 14
  TOK_FOR_NEXT
  TOK_CJUMP  jump_lbl_7
jump_lbl_6:
  TOK_FOR_EXIT
; LINE 15
  TOK_INCVAR  4
; LINE 16
jump_lbl_5:
; LINE 17
  TOK_FOR_NEXT
  TOK_CJUMP  jump_lbl_4
jump_lbl_3:
  TOK_FOR_EXIT
; LINE 18
  TOK_FOR_NEXT
  TOK_CJUMP  jump_lbl_2
jump_lbl_1:
  TOK_FOR_EXIT
; LINE 20
  TOK_TIME
  TOK_VAR_STORE  8
; LINE 21
  TOK_CSTRING  19, "End.", 155, "Elapsed time: "
  TOK_PRINT_STR
; LINE 22
  TOK_VAR_LOAD  8
  TOK_PUSH_VAR_LOAD  1
  TOK_SUB
  TOK_INT_STR
  TOK_PRINT_STR
  TOK_CSTRING  4, " in "
  TOK_PRINT_STR
  TOK_VAR_LOAD  0
  TOK_INT_STR
  TOK_PRINT_STR
  TOK_CSTRING  19, " iterations.", 155, "Found "
  TOK_PRINT_STR
; LINE 23
  TOK_VAR_LOAD  4
  TOK_INT_STR
  TOK_PRINT_STR
  TOK_CSTRING  9, " primes.", 155
  TOK_PRINT_STR
  TOK_END


The main part, without the PRINTs, is 107 bytes. It could be made shorter by rewriting using "REPEAT/UNTIL" instead of FOR in the outer loops (FOR loops are complicated, as they need to store the variable address, the STEP, and the LIMIT, since you can use arbitrary expressions for those).

Have Fun!


 Post subject: Re: Announce: Acheron VM
PostPosted: Sun Oct 13, 2019 2:23 am 
Offline

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
I pulled the trigger on the big github push. The new source code as well as a good amount of documentation is up. If you have an older checkout, you might need to blow it away and re-clone.

Feedback on the docs and/or remaining brokenness in the code is absolutely appreciated.

The link should be in my sig.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


 Post subject: Re: Announce: Acheron VM
PostPosted: Sun Oct 13, 2019 8:33 am 
Online
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Ah, you've erased the previous git history? That's a radical step!
Link, in case your signature changes: https://github.com/AcheronVM/acheronvm/


 Post subject: Re: Announce: Acheron VM
PostPosted: Sun Oct 13, 2019 11:57 pm 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
I am relieved to see that you corrected the fig-Forth style division bug, per our private discussion a few years ago. Are you the least bit interested in any help making your new divide routine look slightly less "kludgy"?

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


 Post subject: Re: Announce: Acheron VM
PostPosted: Mon Oct 14, 2019 12:35 am 
Offline

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
I certainly noted the issue right away in the code, but corrected it when I ported the instructions to the newest model. I didn't make any of these changes until I fully understood the problem and why/where it was caused, which took a while.

For my coding style, if some event is rare then I don't have a problem branching out and jumping back in. The vastly common path is then a cycle faster, and this is part of the inner loop of the division. But if you have a version that's faster or cleaner or smaller, yeah I'll take a look.

BigEd: The main thing is that I changed the license, but some attempted git surgery also kind of made this way the easiest to resolve. ;) Good call on having the link in the actual post, though; I didn't consider that consequence.

 Post subject: Re: Announce: Acheron VM
PostPosted: Mon Oct 14, 2019 8:08 am 
Online
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
I know I've got into a tangle once or twice in git - it's a fine tool so long as you stay strictly on the path!

Edit: and thanks for the open source license!


 Post subject: Re: Announce: Acheron VM
PostPosted: Mon Oct 14, 2019 11:09 am 
Offline

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
I've poked around in the code a bit, but not actually tried anything yet. With so many opcodes already in use, customising the instruction set by removing unnecessary ones and adding more useful ones, especially for larger data types, seems like it will be common. But the core dispatcher logic is indeed very neat and seems worth the effort on its own.

One thing I did notice was some potential optimisations for 65C02. There is a ZP indirect mode without index, which avoids several instructions to preserve the X or Y register in certain cases, particularly for ldm/stm instructions. Even further, on the '816 you could take advantage of the 16-bit index registers to streamline various things, though there are then other things to take care of such as entry from a non-zero program bank (since VM bytecodes are loaded in data mode). It might be worth exploring these possibilities.

Looking at the opcode map, I see that much opcode space is taken up by immediate-mode versions of instructions that *could* be expressed as "load tmp, imm : op tmp", though I'm sure the use of two instructions like that would be slower as well as larger. It makes me wonder whether a reasonably quick shortcut can be implemented for this which doesn't cause such a combinatorial explosion, and thus leaves more space for extended functionality without the need to drop the "with" optimisation (or implement an escape prefix byte). Otherwise, those immediate-mode instructions would be prime candidates to drop when you want to extend functionality, since they are easy to emulate RISC-style.

Alternatively, of course, prefix bytes will probably be fairly efficient to implement. The existing dispatcher will treat them as instructions in their own right, but their length and functionality is determined by passing the second opcode byte through an additional jump table. It would be appropriate to use this mechanism for instructions that are relatively expensive to execute anyway, such as multiply/divide, floating-point, fixed-point, and memory management operations.


 Post subject: Re: Announce: Acheron VM
PostPosted: Tue Oct 15, 2019 3:08 pm 
Offline

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
Chromatix wrote:
Looking at the opcode map, I see that much opcode space is taken up by immediate-mode versions of instructions that *could* be expressed as "load tmp, imm : op tmp", though I'm sure the use of two instructions like that would be slower as well as larger.

It's not just code size and speed, but having to use temporary registers like that puts more memory pressure on zp space. But yeah, I'm going to quickly get to the point where the default instruction set is >128 instructions, and one of the first things to do is pick & choose the ISA. If a user is more concerned about flexibility than speed, an ISA with fewer immediate operations can be appropriate. But really, combining operations into fewer instructions, especially for inner loop work, is I believe where a lot of speed & size advantage comes from.

There is also the 256-instruction byte dispatcher, which I've tested a bit but not really used yet. That dispatcher is basically the same speed as WBIT, and you can pick & choose which instructions should have an additional integrated-'with' version. It could also be that some instructions always set rP, and don't have a non-with version so they still only take 1 opcode. We'll have to see how the practicalities pan out.


Quote:
Alternatively, of course, prefix bytes will probably be fairly efficient to implement. The existing dispatcher will treat them as instructions in their own right, but their length and functionality is determined by passing the second opcode byte through an additional jump table. It would be appropriate to use this mechanism for instructions that are relatively expensive to execute anyway, such as multiply/divide, floating-point, fixed-point, and memory management operations.

Yeah, I could imagine something like %w0000000 being a prefix to an additional 128 or even 256 instructions, dispatching on the 2nd byte, keeping the 'with' bit intact. Actually, a prefix byte of 0 followed by another with+7-bit byte would probably make the carry bit easier to juggle. By handling the prefix in the dispatcher itself, it wouldn't have to pay the cost of 2 dispatches.

However, right now there's no way to manipulate the order of the instructions in the opcode table, which would let you control which instructions would be prefixed or not. They simply allocate in order of declaration. One way to do this would be to have OP vs OP_HIGH or something to declare which instructions should be up in the prefixed area. But this also gets into the notion of having an actual external program involved in the build, which can manage things better than just keeping everything in ca65 macros. It'd be a big step to take, but I think it will have to expand in that direction.
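
To make the prefix idea concrete, here is a rough Python sketch of the decode step only. It is not Acheron's dispatcher: the constants `PREFIX` and `WITH_BIT`, the table layout, and the function `decode` are all assumptions for illustration, and operand bytes are ignored entirely.

```python
PREFIX = 0x00     # assumed: a byte of 0 selects the extended table
WITH_BIT = 0x80   # assumed: the 'with' flag lives in the top bit of the opcode byte


def decode(code, primary, extended):
    """Return a list of (table_name, instruction_name, with_flag) tuples.

    `primary` and `extended` map a 7-bit opcode number to an instruction name.
    """
    decoded = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if op == PREFIX:
            # The prefix is handled inside the dispatcher itself: fetch the
            # real opcode right away instead of running a second full dispatch.
            op = code[pc]
            pc += 1
            table, table_name = extended, "extended"
        else:
            table, table_name = primary, "primary"
        # The 'with' bit stays intact on the second byte, as described above.
        decoded.append((table_name, table[op & 0x7F], bool(op & WITH_BIT)))
    return decoded


if __name__ == "__main__":
    primary = {1: "add"}        # hypothetical opcode assignments
    extended = {1: "ldiv"}
    for entry in decode(bytes([0x01, 0x81, 0x00, 0x01, 0x00, 0x81]), primary, extended):
        print(entry)
```

The cost of this layout is that opcode 0 without 'with' is no longer available in the primary table, which is the trade for doubling the opcode space.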

 Post subject: Re: Announce: Acheron VM
PostPosted: Wed Oct 16, 2019 3:55 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
White Flame wrote:
... if you have a version that's faster or cleaner or smaller, yeah I'll take a look.

Understood. So, I'll pull off my shirt and dive in ... we're a mostly friendly bunch here, right? I can't tell you whether or not my initial attempt is any "faster" because it's untested and dependent on the number of '1' bits involved during the calculation. "Cleaner" is a subjective quality, but at the risk of embarrassing myself, here it is (untested!):
Code:
OP ldiv, ra, math, "rP := quotient, r[P+1] := remainder, of rP:r[P+1]/rA."
  get_ra_y
_div:
  lda 0,y ; put rA (divisor) into zptemp
  sta zptemp
  lda 1,y
  sta zptemp+1
  ldy #16  ; loop counter
:   asl 0,x  ; update rP (num:L -- quotient)
    rol 1,x
    rol 2,x  ; update r[P+1] (num:H -- remainder)
    rol 3,x
    bcs :+   ; "17th bit" set?
    lda 2,x    ; no:
    cmp zptemp ; remainder >= divisor?
    lda 3,x
    sbc zptemp+1
    bcc :++
:     lda 2,x    ; yes (to either above condition):
      sbc zptemp ; update r[P+1] (remainder) ...
      sta 2,x   
      lda 3,x
      sbc zptemp+1
      sta 3,x
      inc 0,x    ; ... and set low bit in rP (quotient)
:   dey
  bne :---   ; loop 16 times
  jmp mainLoopRestoreY
I'm pretty sure I'm eight bytes "smaller", but the significance of that is also subjective (and also highly dependent on whether or not it provides correct results)! Corrections or other comments are welcome ... but please don't make fun of my "trucker's tan".

P.S. My version quietly produces incorrect results for a divisor of 0 or any quotient that won't fit in 16 bits, but I'm pretty sure that yours is no "better" in that regard.
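
For checking the routine on a host machine before assembling it, the loop above can be mirrored in a few lines of Python. This is only a model of the algorithm as written (shift the 32-bit numerator left, subtract when the "17th bit" is set or the remainder is at least the divisor); the name `ldiv_model` and its argument order are made up for this sketch.

```python
def ldiv_model(num_lo, num_hi, divisor):
    """Model of the 32/16 shift-and-subtract loop; returns (quotient, remainder).

    num_lo plays the role of rP and num_hi the role of r[P+1]. As in the
    assembly, results are only meaningful when divisor != 0 and the quotient
    fits in 16 bits (i.e. num_hi < divisor).
    """
    lo = num_lo & 0xFFFF
    hi = num_hi & 0xFFFF
    for _ in range(16):
        carry = hi >> 15                        # bit shifted out by the final rol
        hi = ((hi << 1) | (lo >> 15)) & 0xFFFF  # rol 2,x / rol 3,x
        lo = (lo << 1) & 0xFFFF                 # asl 0,x / rol 1,x
        if carry or hi >= divisor:              # bcs, or the cmp/sbc comparison
            hi = (hi - divisor) & 0xFFFF        # update remainder
            lo |= 1                             # inc 0,x sets the quotient bit
    return lo, hi


if __name__ == "__main__":
    n = 4294836225                              # 65535 * 65535
    print(ldiv_model(n & 0xFFFF, n >> 16, 65535))
    print(ldiv_model(123456 & 0xFFFF, 123456 >> 16, 1234))
```

The first call exercises the carry-out path and should give quotient 65535 with remainder 0; the second should give 100 remainder 56.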

 Post subject: Re: Announce: Acheron VM
PostPosted: Thu Oct 17, 2019 2:22 am 
Offline

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
I haven't had a chance to work the code, but it looks good. For formality's sake, you're fine with me incorporating (a working version of) it into the project under its license, with credit?

barrym95838 wrote:
I'm pretty sure I'm eight bytes "smaller", but the significance of that is also subjective (and also highly dependent on whether or not it provides correct results)!

My version patched in the error point to branch to fixup code and jumped back in. Written to include that case from the start, your version doesn't have special code for only that situation, so that's an expected benefit, which is why I asked for it. ;)

Quote:
P.S. My version quietly produces incorrect results for a divisor of 0 or any quotient that won't fit in 16 bits, but I'm pretty sure that yours is no "better" in that regard.

Correct. That's simply undefined behavior in Acheron, though I didn't actually document it. If the user cares, and having a 0 divisor is a possibility in a situation, they should check for 0 first. It's basically the same amount of error checking anyway. Or, an optional div-by-zero exception might be added to Acheron.



Last edited by White Flame on Thu Oct 17, 2019 2:25 am, edited 1 time in total.

 Post subject: Re: Announce: Acheron VM
PostPosted: Thu Oct 17, 2019 4:58 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
White Flame wrote:
I haven't had a chance to work the code, but it looks good. For formality's sake, you're fine with me incorporating (a working version of) it into the project under its license, with credit?
I would be honored, sir. Yes.

I really want to throw some inputs at mine to see if it's buggy, but in the meantime I did some pencil and paper cycle counts (which may even be accurate). Our best and worst cases for the actual division loop occur with similar sets of inputs: best is 767 cycles for mine vs. 928 cycles for yours (numerator<divisor), worst is 1199 cycles for mine vs. 1047 cycles for yours (65535/1). Yours is actually 1047 cycles for any quotient == 65535, while mine is slightly better in many of those cases (as low as 959 cycles for 4294836225/65535). With a random mix of inputs, there's probably very little overall difference between the two.

I'm going to try to assemble mine this week and machine-simulate it, then get back to you.

 Post subject: Re: Announce: Acheron VM
PostPosted: Thu Oct 17, 2019 10:49 am 
Offline

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
Btw, I didn't mention it before, but hopefully scope-indented asm is something that sticks. ;)

Still haven't incorporated your code, but I did write up a little div test harness. It runs this
Code:
.proc test

 mgrow 3
 regnames numlo, numhi, denom

 setp32_w numlo, 4294836225
 setp16_w denom, 65535
 call dodiv

 setp32_w numlo, 123456
 setp16_w denom, 1234
 call dodiv

 retm
.endproc

to output this:
[Attachment: acheron-div-1.png, 2.68 KiB]

Of course, I made a setp32 instruction just for this, to load a full 32-bit immediate across 2 regs. Because that's how we roll in Acheron-land. ;) I also added the "_w" suffix for a one-line 'with' version. Not sure if the more verbose "_with" would be better, or some other syntax, but when it's used a lot I'm finding it more comfortable to use the combined form directly.

Here's the rest of the support code, which makes native calls to perform the actual character output:
Code:
 ;;; Prints the division inputs & results to the screen
.proc dodiv
 mgrow 1
 regnames temp, numlo, numhi, denom
 regnames , quotient, remainder    ; multiple names for the same regs is fine
 call_w numhi, printnum
 call_w numlo, printnum
 setp8_w temp, '/'
 call printchar
 call_w denom, printnum
 setp8_w temp, '='
 call printchar
 ldiv_w numlo, denom
 call printnum
 setp8_w temp, 'R'
 call printchar
 call_w remainder, printnum
 calln newline
 retm
.endproc


 ;;; Print the low byte of rP, surrounded in spaces
printchar:
 lobyte
 ori $2000           ; space into the high byte
 calln print_rP_hi   ; just the space
 calln print_rP      ; char + space
 ret

 ;;; Print a 4-digit hex number from rP
.proc printnum
 mgrow 1
 movep r1 ; grab the value of rP, from wherever it was

 ; high byte
 movep r0 ; another copy, since this is destructive
 bswap    ; byte-swap the word
 tohex
 calln print_rP

 ; low byte
 tohex_w r1
 calln print_rP
 retm
.endproc


; Some native routines directly called from Acheron

print_rP:
 lda 0,x
 jsr CHROUT
print_rP_hi:
 lda 1,x
 jmp CHROUT

newline:
 lda #13
 jmp CHROUT

 Post subject: Re: Announce: Acheron VM
PostPosted: Thu Oct 17, 2019 7:23 pm 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
White Flame wrote:
Btw, I didn't mention it before, but hopefully scope-indented asm is something that sticks. ;)

Hey, when in Rome, et cetera ... :)

 Post subject: Re: Announce: Acheron VM
PostPosted: Fri Oct 18, 2019 6:18 am 
Offline

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
By the way, do you have a list of numbers to divide to test out the various edge cases?
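
In the meantime, one way to get such a list is to generate it on the host. The sketch below is only a guess at which cases are interesting: it builds numerators from chosen quotients and remainders so every case stays inside the defined range (non-zero divisor, quotient fits in 16 bits), and it leans on divisors at and above $8000, where the shifted remainder can spill into the "17th bit". The function name `edge_cases` and the tuple layout are made up here.

```python
def edge_cases():
    """Return (num_lo, num_hi, divisor, quotient, remainder) test vectors."""
    divisors = [1, 2, 3, 0x00FF, 0x0100, 0x7FFF, 0x8000, 0x8001, 0xFFFE, 0xFFFF]
    quotients = [0, 1, 0x7FFF, 0x8000, 0xFFFF]
    cases = []
    for d in divisors:
        for q in quotients:
            # Smallest, middle and largest legal remainder for this divisor.
            for r in sorted({0, d // 2, d - 1}):
                n = q * d + r                  # always below 2**32 here
                cases.append((n & 0xFFFF, n >> 16, d, q, r))
    return cases


if __name__ == "__main__":
    for lo, hi, d, q, r in edge_cases():
        print(f"{hi:04X}{lo:04X} / {d:04X} = {q:04X} R {r:04X}")
```

The list includes 4294836225/65535 from the earlier cycle counts, and each line can be typed into the test harness as a setp32/setp16 pair.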
