dmsc wrote:
For the editor this is not easy, as it would mean instrumenting it to do some particular task. But for the sieve benchmark, calculating first 1899 primes, those are the runtime statistics:
Are you using emulator trace logs to measure this? I think you should be able to manually bring the editor to some state, issue commands, and filter the trace logs to measure execution between fenceposts, say something that requires a full-screen refresh or an insertion that moves a lot of internal data. But yeah, a full batch benchmark is certainly easier to measure, though it isn't quite as "real world".
It would also be interesting to see if you would get better sieve performance by hand-writing fastbasic VM operations instead of compiling from its BASIC form. That would be more in line with how Acheron code is currently written, but if it would basically be what BASIC generates anyway, then it might not be worth the bother.
Also, does your cycle tally include the work done during printing? Though I guess with hundreds of millions of cycles executed, that doesn't really affect the percentage much.
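To make the fencepost idea concrete, here is a rough Python sketch of filtering a trace log between two addresses. The log format is purely hypothetical (one line per executed instruction: hex PC, then that instruction's cycle count); real emulator traces look different, so the parsing would need adapting. The function name and the example addresses are likewise made up for illustration.

```python
def cycles_between(trace_lines, start_pc, end_pc):
    """Sum cycles from the first hit of start_pc up to, but not
    including, the next hit of end_pc.

    Assumes a HYPOTHETICAL trace format: "<hex PC> <cycles>" per line.
    Returns None if the start/end pair is never completed.
    """
    total = 0
    counting = False
    for line in trace_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        pc = int(fields[0], 16)
        cycles = int(fields[1])
        if not counting:
            if pc != start_pc:
                continue
            counting = True  # first fencepost reached
        elif pc == end_pc:
            return total  # second fencepost reached
        total += cycles
    return None


# Made-up trace: five instructions with their cycle counts.
trace = ["C000 2", "C002 3", "C005 4", "C008 2", "C00A 6"]
print(cycles_between(trace, 0xC002, 0xC00A))
```

Placing the two fenceposts around the editor's command handler (for example its entry point and its return to the input loop) would give a per-command cycle figure without instrumenting the editor itself.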
Quote:
That would be great. The editor does not have too much hardware dependencies, it uses PRINT to output to the screen, only relies on be able to write control codes (cursor movement, insert line, delete line) and read the current cursor position.
The C64 has control codes for cursor movement, but not insert/delete lines, so a bit of fiddling will be required there.
Here's a first, untested pass of converting the high-level FastBasic sieve
Code:
? "Starting!"
NumIter = 10
sTime = TIME
' Arrays are initialized to 0
DIM A(8190) Byte
FOR Iter= 1 TO NumIter
MSET Adr(A), 8190, 0
Count = 0
FOR I = 0 TO 8190
IF NOT A(I)
Prime = I + I + 3
FOR K = I + Prime TO 8190 STEP Prime
A(K) = 1
NEXT K
INC Count
ENDIF
NEXT I
NEXT Iter
eTime = TIME
? "End."
? "Elapsed time: "; eTime-sTime; " in "; NumIter; " iterations."
? "Found "; Count; " primes."
to low-level Acheron:
Code:
grow 8
regnames array, iter, i, k, prime, stime, etime, const1
; constants in the assembler
arrayLoc = $8000 ; using a fixed memory buffer, as opposed to DIM allocation
numIter = 9 ; 10 iterations
size = 8190
with stime
gettime ; TODO
with const1
setp 1
; for iter = numIter (down to 0)
with iter
setp numIter
iterLoop:
; Initialize the array to zero
with array
setp arrayLoc
clrmn size ; clear mem[rP to rP+size-1]
; count = 0
with count
clrp
; for i=0
with i
clrp
loopI: ; rP = i
; prime = membyte[i + array], reusing this var temporarily
ldmbr prime, array
bnz nextI
; Array entry was zero, this is a prime
; prime = 3 + (i<<1)
setp 3
addea2 i ; addea2 = add effective address offset, 2 bytes each
; for k = prime+i
movep k
add i
loopK:
; mem[array + k] = 1
with const1
stmbr array, k
; step prime
; This part is weaker than BASIC, without specific FOR/NEXT instructions
with k
add prime
cmpi16 size ; need to do a relative test, as we can overshoot the limit
bnc loopK
with count
incp
nextI:
with i
incp
case size, loopI ; can do an equality test, as we're incrementing by 1
; next iter
with iter
decloop iterLoop
with etime
gettime
sub stime
; ~59 bytes to here?
printlit
.byte "End.",13,"Elapsed time: ",0
with etime
printdec
printlit
.byte " in ",0
with iter
setp numIter
incp
printdec
printlit
.byte " iterations.",13,"Found ",0
with count
printdec
printlit
.byte " primes.",13,0
shrink 8
Here's the main section without comments or assemble-time constants, and with the 'with' codes packed together, to align more closely with the opcodes actually dispatched:
Code:
grow 8
regnames array, iter, i, k, prime, stime, etime, const1
gettime_with stime
setp_with const1, 1
setp_with iter, 9
iterLoop:
setp_with array, $8000
clrmn 8190
clrp_with count
clrp_with i
loopI:
ldmbr prime, array
bnz nextI
setp 3
addea2 i
movep k
add i
loopK:
stmbr_with const1, array, k
add_with k, prime
cmpi16 8190
bnc loopK
incp_with count
nextI:
incp_with i
case 8190, loopI
decloop_with iter, iterLoop
gettime_with etime
sub stime
By my count, it's 59 bytes in 24 instructions for the main non-printing portion. (regnames is a build-time naming macro, with no runtime opcode.) This uses some instructions I've designed but haven't written yet, including time & printing. In comparison to your BASIC tokens it's probably fair to add those comparable instructions for this test. There are certainly some things that will be faster or slower in printing & not using DIM, but the main loop should still dominate.
This also uses 16 bytes of zp space for registers. Certainly I could reduce that by taking iter/stime/etime out of regs and into the global page, and using a single temp register for those and const1 instead. That would make the code a bit bigger but friendlier. But if this is the only thing running, and for benchmark purposes, might as well keep it optimal.
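Since the Acheron listing is untested, a small Python model can serve as a reference for what either version should print, and for one difference in loop shape. This is a sketch under stated assumptions, not Acheron or FastBasic code: it assumes the BASIC `FOR K` loop tests its limit before the first pass, and it models the `loopK` section as written, where the store comes before the limit test. Both variants use the same exit condition here so that only the ordering differs.

```python
SIZE = 8190


def sieve(size, store_before_test):
    """Model one sieve iteration.

    Returns (count, highest_index_written). The buffer is oversized
    on purpose so writes past `size` can be observed, not rejected.
    """
    mem = bytearray(4 * size + 4)
    highest = -1
    count = 0
    for i in range(size + 1):
        if mem[i]:
            continue  # already marked composite
        prime = i + i + 3
        k = i + prime
        if store_before_test:
            # Store first, test afterwards (the loopK shape).
            while True:
                mem[k] = 1
                highest = max(highest, k)
                k += prime
                if k > size:
                    break
        else:
            # Test first, then store (assumed FOR/NEXT behaviour).
            while k <= size:
                mem[k] = 1
                highest = max(highest, k)
                k += prime
        count += 1
    return count, highest


for label, flag in (("test-first", False), ("store-first", True)):
    count, highest = sieve(SIZE, flag)
    print(label, "primes:", count, "highest index written:", highest)
```

Both variants find the same number of primes, but in this model the store-first shape writes past index 8190 whenever `i + prime` already exceeds the limit. With the buffer fixed at $8000, that may be worth a guard before `loopK` if those semantics hold on the real VM.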