Proxy wrote:
i mean is that AND really that bad? those few cycles will add up sure, but it could be worse!
You're right - it's not that bad, but like the extra cycle on the LDA, it sort of makes me feel a bit "Grr", thinking it should have been better. Not just that, but removing the AND will save 3 bytes - and yes, some will point out that I've said very recently that you should use that RAM, but in this case for reasons of "architecture" I want my BCPL Bytecode VM to live entirely inside a 16KB region of RAM. Right now I have about 100 bytes free... And because this code is in-lined with every one of those 255 opcodes (sometimes twice), saving 3 bytes here will save me at least 765 bytes overall (3 x 255), which I can use to speed something else up...
It's also annoying when some of the opcodes take fewer cycles than the dispatcher.... Here is an example:
Code:
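.proc ccA1
        .a16
        .i16
        inc     regA+0          ; Add 1 - ie. increment. Special case to go faster
        beq     :+              ; Low word wrapped to zero, so carry into the high word
        nextOpcode
:       inc     regA+2          ; Increment the high 16 bits of the 32-bit register
        nextOpcode
.endproc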
The "add 1" opcode is used in almost every single FOR loop...
However, it did help me achieve my objective of letting me run a high level language on the '816 and also run the compiler directly on it too. Even if it is a shade slow. But how slow? It might be nice one day to run the benchmarks I recently ran under Basic and BCPL with a C compiler for the '816 ...
Quote:
look at my own bytecode VM for the 65816 (SW32VM) for example...
because i'm dumb and wanted to make the whole thing relocatable (which it isn't anymore anyways) i didn't want to make use of indirect jumps and instead instructions are chosen by just going through a chain of DEC A instructions, checking if A is 0, if not continue to the next, if it is execute the selected instruction.
it's not fast at all and means instructions further down take longer to execute by default. i really need to rewrite that someday.
That's quite something!
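For anyone following along, a DEC A chain of the kind described in the quote might look roughly like the sketch below. This is not taken from SW32VM: the labels (dispatch, op_0, op_1, op_2), the opcode numbering from 0 and the 8-bit accumulator are all assumptions made for illustration.

```asm
        .p816
        .a8                     ; assumption: 8-bit accumulator (M flag set)

; Hypothetical dispatcher. Entry: A = opcode byte, numbered 0, 1, 2, ...
dispatch:
        cmp     #0              ; set Z if this is opcode 0
        bne     :+
        jmp     op_0
:       dec     a               ; was it opcode 1?
        bne     :+
        jmp     op_1
:       dec     a               ; was it opcode 2?
        bne     :+
        jmp     op_2
:       rts                     ; unknown opcode in this sketch: just return

; Stub handlers, here only so the sketch assembles on its own.
op_0:   rts
op_1:   rts
op_2:   rts
```

With this layout every opcode that is skipped costs roughly five cycles (a DEC A plus a taken BNE), which is why instructions further down the chain take longer to reach.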
Quote:
but on the other hand i'm also not really making use of indirect addressing in general. because the Index Registers are 16-bit wide, i've opted to use X as the lower 16-bits of the PC, and the Data Bank Register as the upper 8-bits.
this means i load instructions by doing LDA a:$0000,X and incrementing the PC is just INX followed by a BNE in case it rolled over and the Data bank needs to be adjusted. though i still keep a copy of the PC in memory in case X gets muddled.
in hindsight, it might've been much easier and faster to just use indirect addressing.
again the entire thing is ripe for an almost complete rewrite...
but maybe there is something in there that could help you, or make you realize that it probably isn't worth trying to hyper optimize every last line of code...
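The fetch described in the quote (X as the low 16 bits of the PC, the Data Bank Register as the top 8 bits) could be sketched along these lines. The label names and the PHB/PLA/INC A/PHA/PLB sequence used to bump the bank are assumptions for illustration, not SW32VM's actual code; it also assumes native mode with an 8-bit accumulator and 16-bit index registers already selected by the caller.

```asm
        .p816
        .a8                     ; assumption: 8-bit accumulator
        .i16                    ; assumption: 16-bit index registers

; Hypothetical fetch routine.
; Entry: X = low 16 bits of the VM PC, DBR = top 8 bits of the VM PC.
; Exit:  A = byte fetched, X (and DBR if needed) advanced by one.
fetch:
        lda     a:$0000,x       ; absolute,X reads from bank DBR, offset X
        inx                     ; advance the low 16 bits of the PC
        bne     done            ; no wrap: the bank is unchanged
        pha                     ; X wrapped to 0: save the fetched byte
        phb                     ; push the current data bank
        pla                     ; A = old bank
        inc     a               ; next bank
        pha
        plb                     ; DBR = old bank + 1
        pla                     ; restore the fetched byte
done:
        rts
```

The common path is just the LDA, INX and a taken BNE; the bank adjustment only runs once every 64KB of bytecode.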
I will have a deeper look, thanks. I have also been guilty of the "premature optimisation" scenario too - details here:
https://projects.drogon.net/ruby816-premature-optimisation-and-all-that/
Trying to get back on-topic.. One reason I went down the path of BCPL was because I think we have lost all the C compilers that once did run directly on the 6502 (and 816). And by lost I mean no public source code available to let us port them to new systems. I used Aztec C on the Apple II back then. The Apple IIgs has a Pascal compiler and a C compiler (written in Pascal) that run on the system. There are a few that run on the BBC Micro, but what else?
We have to cross compile now. Not always what I want to do at least.
So BCPL lets me edit, compile and run programs directly on my system without using BASIC. If only it were a shade faster...
Cheers,
-Gordon
_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here:
https://projects.drogon.net/ruby/