VTL02 (VTL-2 for the 6502)

BigEd · Post by **BigEd** » Wed Oct 21, 2015 9:05 am

I'm away from my computer, but I was thinkng that a push and pop would allow for rapid loops and perhaps also recursive subroutines. So
{=#
Would push the address of the current line onto a stack, and
#=}
Would pull an address and jump the interpreter to it. Line numbers not needed?

I don't suppose this will fit in three bytes...

barrym95838 · Post by **barrym95838** » Thu Oct 29, 2015 5:52 am

I finally got VTL02C up and ready for prime-time:

It's 1008 bytes in Apple ][ trim, and its execution speed is roughly equivalent to the original VTL02, using Klaus' 1000 prime spaghetti-fest (delicious, BTW, and very similar in coding style to the interpreter itself). When I crank AppleWin up to 2 MHz to normalize it with the recently reported times, it finds 1000 primes in 88 seconds. Replacing the output statements with dummy assignment statements reduces the time to 78 seconds.

vtl02ca2.asm: (36.32 KiB) Downloaded 612 times

Enjoy!

Mike B.

Klaus2m5 · Post by **Klaus2m5** » Thu Oct 29, 2015 6:28 pm

Did a quick test with 1000 primes on my emulator: 73 seconds
Replaced the 4 conditional branch multiply operations with the new then and else operator: 68 seconds

Congratulations Mike, new features and backward compatibility with better performance. Well done!

VTL02C for the Kowalski simulator, the Kingswood AS65 assembler and a copy of the original version can be found on my GitHub repository: http://github.com/Klaus2m5/VTL02

Klaus2m5 · Post by **Klaus2m5** » Thu Nov 05, 2015 5:23 pm

I am making some progress with a custom version of VTL02C, codename "Speedy Gonzales".

Code: Select all

7879
7883
7901
7907
7919
Execution time: 46.51 seconds

OK

The prime program was condensed while VTL has grown by about 300 bytes. Here is what I used for the latest test:

Code: Select all

&=1024
10 #=1000
100 N=N+2;#=200
120 N=N+4
150 !=100
200 C=!;#=N<U[Q;Q=300;V=V+2;U=V*V
300 D=5
310 A=N/D;#=%]C;D=D+2;#=D>V[400;A=N/D;#=%]C;D=D+4;#=D<V[310
400 ?=N;?=""
420 X=X-1;#=X[C
435 ?="Execution time: ";
445 ?=//100;$=46;#=%>10[465;?=0
465 ?=%;?=" seconds"
480 #=9999
500 #=200
510 N=N+1;#=200
530 N=N+2;#=150
1000 /=0;Q=400;V=5;U=25;X=1000;N=2;#=500
#=1

Changes so far:

Added a configurable timer variable {/} with 10ms increments and the required atomic variable fetch & store.
Replaced some jsr calls with inline code for skpbyte:, getbyte:, plus:, minus:.
Replaced cvbin: calls to mul: & plus: with custom inline multiply by 10 & digit adder.
Removed simulation from startup of eval:.
The mainloop uses inline code to advance to next sequential program line. find: is now only used for true branches.
Added a configurable statement delimiter {;} allowing multi statement lines. Branch to same line is now allowed.

barrym95838 · Post by **barrym95838** » Thu Nov 05, 2015 8:05 pm

Nice, Klaus. Have you thought about the possibility of converting decimal constants directly to binary in inln: ? I was thinking that you could insert a tag with the high bit set when an initial digit is entered, and accumulate a 16-bit constant that would be stored after the tag when the number is complete. All constants would then occupy 3 bytes, rather than 1 to 5+, and cvbin: would not be necessary at run-time. If the user pressed backspace in the middle of entering a number, you could just abort the line and make him/her start over, because allowing a number to be edited mid-entry could be a bit tricky.

I think that cvbin: and find: are the two biggest cycle hogs in the current versions.

Mike B.

P.S. prstr: would have to be modified to recognize the high-bit tag and jsr to prnum: before proceeding (otherwise, list wouldn't work properly).

Also, there are several places where 'c02 instructions like phy/ply would fit nicely, as well as (dp) addressing.

Skipping the {0+expression} kludge for every expression (and sub-expression) was something I should have noticed myself, but I missed it.

Another possibility could be to maintain a small cache for taken branches in the lower part of page $01. Before find: starts a search, it could check for the #= entry and its associated address in the cache, and if it's not there, do the sequential search and add the result to the cache before returning. The only recursion in VTL02 occurs in sub-expression evaluation, and it uses very little of page 1 for that.

Klaus2m5 · Post by **Klaus2m5** » Fri Nov 06, 2015 9:29 am

Thanks Mike for your valuable comments. I have made another enhancement to cvbin: now skipping the multiply and add for the first digit. I took some measurements with the original (non multi statement) version of the prime number generator:

replacing all numbers with variables: 53.09s
leave all numbers (1 & 3 digit): 53.59s
replace only the 3 digit numbers with variables: 52.19s

of course getting the value of a variable would always be slower then just copying a binary from the program stream. I am thinking of using the following scheme:

numbers up to 2 digits convert to 1 byte $80-$E3 ($80 + binary number)
numbers > 2 digits convert to 3 bytes $FF $0000-$FFFF

Regarding find:, the implementation of a branch cache could easily make small programs much slower while only improving support for very large programs. The cache management simply adds a lot of cycles while the simple find over a few lines may even be faster. So I have another idea.

The line numbers representing a branch target could be marked up with a character $60-$7F (lowercase & some symbols). On startup of program mode an additional variable array will be populated with the physical memory address of the marked lines. Whenever a character $60-$7F is referenced in a #= statement the find: is substituted by simply copying the physical address. The variable {!} also needs to be changed. It may become necessary to be able to save {!} to the additional variable array.

By the way, I have made a successfull attempt to implement a user stack, but it wasn't as usefull as I thought. For example, a conditional statement can not get its target directly from the stack, because the variable is popped off the stack wether the condition is met or not. So the user stack is currently not implemented.

barrym95838 · Post by **barrym95838** » Sat Nov 07, 2015 5:41 am

Klaus2m5 wrote:

... The line numbers representing a branch target could be marked up with a character $60-$7F (lowercase & some symbols). On startup of program mode an additional variable array will be populated with the physical memory address of the marked lines. Whenever a character $60-$7F is referenced in a #= statement the find: is substituted by simply copying the physical address. The variable {!} also needs to be changed. It may become necessary to be able to save {!} to the additional variable array ...

It will be interesting to see how well this works, in view of conditional branches and computed branches like the following in line 8910:

Code: Select all

8893 )
8895 ) PRINT DEVICE NAME FROM Y
8900 .=!
8910 #=Y*20+8920
8920 ?="ENGINE CONTROL ... ";
8930 #=.
8940 ?="S. RANGE SENSORS . ";
8950 #=.
8960 ?="L. RANGE SENSORS . ";
8970 #=.
8980 ?="PHASER CONTROL ... ";
8990 #=.
9000 ?="PHOTON TORPEDOS .. ";
9010 #=.
9020 ?="DAMAGE CONTROL ... ";
9030 #=.
9040 ?="SHIELD CONTROL ... ";
9050 #=.
9060 ?="LIBRARY COMPUTER . ";
9070 #=.

Mike B.

Klaus2m5 · Post by **Klaus2m5** » Sat Nov 07, 2015 7:38 am

Of course a computational branch wouldn't work with a physical memory address. However, the example above would also be a killer for a small branch cache as it would get trashed before you come back to the same selection. A larger cache takes too much time to search, so you end up being faster with the traditional find forward.

It would be up to the programmer to properly decide where to use the physical memory address of a targeted line. It is kind of a bypass to find:. Conditional branches alone should not be a problem as a flag can be set whenever a physical address variable is used on the right side of an equation.

Klaus2m5 · Post by **Klaus2m5** » Sat Nov 07, 2015 7:00 pm

With constant conversion on line input (+~250 bytes):

Code: Select all

7879
7883
7901
7907
7919
Execution time: 38.95 seconds

OK

I had to work arround null bytes marking the input buffer end. So the scheme is now:

Numbers up to 123 convert to 1 byte $80-$FB ($80 + binary number)
Numbers > 123 convert to 3 bytes $FC-$FF $0101-$FFFF
The marker's ($FC-$FF) bit0 = 0 = low byte $00, bit1 = 0 = high byte $00

Still need to fix print string to not use the conversion. The bug causes loss of leading zeros and overflow above 65535 in strings.

barrym95838 · Post by **barrym95838** » Sun Nov 08, 2015 2:12 am

Be careful, Klaus! If you get too addicted to execution speed, you may end up writing a resident JIT compiler for us!

Mike B.

P.S.

Quote:

Still need to fix print string to not use the conversion. The bug causes loss of leading zeros and overflow above 65535 in strings.

I believe that the 8-bit Commodore products had a "quote mode" which suppressed [edit: postponed] the action of cursor control keys, allowing them to be stored in strings from the command line and viewed as special characters in program listings. Maybe it's an idea you could adapt for your little problem. Of course, it would be slightly complicated by the fact that {"} is a valid user variable, so quote mode should only be activated if the third non-space statement character is {"}!

BTW, in your quest for cycle efficiency, do you find yourself slightly regretting your support of the idea to demote the space character? Checking for spaces (to skip) steals a lot of cycles ... but it wouldn't be an issue if you compiled each statement (or just the "easy" ones) as they were entered and tacked the raw machine code to the end of each line, with a flag near the beginning for interpret/jmp! List wouldn't have to see it, because it would be hiding after the null.

Klaus2m5 · Post by **Klaus2m5** » Mon Nov 09, 2015 3:37 pm

Code: Select all

Execution time: 33.68 seconds

Found some more cycles.

Regarding the string theory: I went for your idea keeping a counter from the start of a new statement (not just a new line, but also a statement delimiter). The statement delimiter is also a valid variable, but the count to the next {"} would always be even (variable to variable). So 3 should always work. A statement delimiter after a string or an umatched closing quote (comment) is not allowed anyway. The line ends. Period!

Regarding space characters: My first idea was to simply suppress space characters on line input except in a string or a comment. But removing the spaces inline is not such a big cycle stealer. Most of the time I get away with just adding CMP #' ' BEQ loop. The LDA (at),y and INY are required anyway. If you want a fast program, you simply don't put space characters in it.

I have an intermediate version of my changes on GitHub. But beware: It is work in progress!

Klaus2m5 · Post by **Klaus2m5** » Mon Nov 16, 2015 9:29 pm

I finally implemented my solution to improve long range goto and gosub.

The variable {=} is now asigned to gosub and return. {==...} is a gosub and pushes the physical address of the next program line on a 16 deep word stack. {#==} is a return. The {=} on the right hand side of the equation does not actually pop the return address of the stack, but it inserts a special line number (65280+'=') into the equation. The actual pop is only done, when the line number is to be stored to {#}. The return stack is located in the hardware stack area from $140 to $15F.

A label array for 31 line addresses is located at $100-$13D. Lower case characters and symbols $60-$7E address the array. Same as with {=} only a special line number of 65280 + ASCII value is inserted. You could even create a switch case with it as {#=a+2} would actually generate the same line number as {#=c}. The lable array is cleared on reset and when a program line is added, replaced or deleted. The label array is populated when a direct statement is executed. A character >$60 in column 1 signals the line whose address is to be loaded to the corresponding location in the label array.

So a revised version of my prime number program looks like this:

Code: Select all

&=1024
 10 /=0;Q=d;V=5;U=25;X=1000
 20 N=2;==b
 30 N=N+1;==b
 40 N=N+2;==b
a100 N=N+2;==b
 120 N=N+4;==b
 150 #=a
b200 #=N<U[Q;Q=c;V=V+2;U=V*V
c300 D=5
 310 A=N/D;#=%]=;D=D+2;#=D>V[d;A=N/D;#=%]=;D=D+4;#=D<V[310
d400 ?=N;?=""
 420 X=X-1;#=X[=
 435 ?="Execution time: ";
 445 ?=//100;$=46;#=%>10[465;?=0
 465 ?=%;?=" seconds"
#=1

There is only a small gain in performance for a small program. The label array should be used for far away (many lines to be skipped) and frequently used lines to be most effective. It is actually faster to use the line number instead of the label array, if you want to go to the beginning of the same line again.

I have now almost filled 2k. The only thing left to do for me is a bit of cleanup and better documentation.

Klaus2m5 · Post by **Klaus2m5** » Thu Dec 10, 2015 12:01 pm

I managed to add program load & save as supported in my emulator, error messages and syntax checking with message. Syntax is checked during line entry and the offending character is marked up in the error message. The example below shows an error 245 (missing closing parenthesis) and points to the unmatched opening parenthesis:

Code: Select all

10 A=(1+(2+(3))
     ^
Error 245 in line 10

The error code is stored with the line and is also shown during LIST (0) and if you try to run a program with syntax errors. The error can be cleared by entering a correct (replace) or empty program line (delete) or by issuing a NEW command (&=672).

The syntax check is also used to strip space characters, convert decimal numbers to binary constants and to expand statements like N+1 to N=N+1.

oscarv · Post by **oscarv** » Sat Jul 09, 2016 1:05 am

Hi,

Just for the record, I added VTL-02 to my KIM Uno (a pocket-sized fake KIM-1). Thanks for a true 6502 coding gem!
Here is a pointer to the project and the VTL-02 firmware update:

http://obsolescence.wix.com/obsolescenc ... mary/c1uuh
https://hackaday.io/project/4802-kim-un ... -1-replica

Kind regards,

Oscar.

barrym95838 · Post by **barrym95838** » Sun Jul 10, 2016 4:00 am

Very nice, Oscar! Are you using version C? It doesn't have all of the speed optimizations explored by Klaus, but it is smaller than "speedy gonzales", faster than version B, and more capable than versions A and B.

BTW, you speak German, right? Why do you think that German speakers seem to like VTL-02 more than anyone else here?

Mike B.

VTL02 (VTL-2 for the 6502)

Re: VTL02 (VTL-2 for the 6502)

Re: VTL02 (VTL-2 for the 6502)

Re: VTL02 (VTL-2 for the 6502)

Re: VTL02 (VTL-2 for the 6502)

Re: VTL02 (VTL-2 for the 6502)

Re: VTL02 (VTL-2 for the 6502)

Re: VTL02 (VTL-2 for the 6502)

Re: VTL02 (VTL-2 for the 6502)

Re: VTL02 (VTL-2 for the 6502)

Re: VTL02 (VTL-2 for the 6502)

Re: VTL02 (VTL-2 for the 6502)

Re: VTL02 (VTL-2 for the 6502)

Re: VTL02 (VTL-2 for the 6502)

Re: VTL02 (VTL-2 for the 6502)

Re: VTL02 (VTL-2 for the 6502)