All times are UTC




Posted: Fri Mar 15, 2019 10:44 pm
Joined: Wed Aug 03, 2016 9:22 am
Posts: 5
Hi, everyone.
I wrote a SWEET16-like virtual microprocessor (is it correct to call it that?) in 6502 assembly. Its name is VIRTUAL16. For anyone who is interested, it's located here: https://github.com/NullMember/VIRTUAL16
It's a two-week spare-time project. I have never used or seen any microcomputer containing a 6502 before, so it could contain bugs and non-optimized code. I made it for fun, but I'm always trying to improve it. I would be glad if someone could try it and give feedback about optimizations, improvements and bugs. I want to keep it under 1 kB (currently around 1040 bytes), so that could be a barrier for some features.


Posted: Fri Mar 15, 2019 11:00 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Welcome, NullMember, and thanks for sharing your project. A 1k target for the VM is quite an interesting constraint - it should keep the machine relatively simple. Here's a helpful link to your example programs.


Posted: Sat Mar 16, 2019 5:25 pm

Joined: Wed Aug 03, 2016 9:22 am
Posts: 5
BigEd wrote:
Welcome, NullMember, and thanks for sharing your project. A 1k target for the VM is quite an interesting constraint - it should keep the machine relatively simple. Here's a helpful link to your example programs.


Thanks for your attention @BigEd. I've removed the CLR (CLear Register) and ASL (Arithmetic Shift Left) instructions today, because the 16-bit immediate MOV instruction can do what CLR does and LSL does the same thing as ASL. I've added two new instructions: CMP (CoMPare) and RJSR (Real Jump SubRoutine). CMP compares two registers and writes the result to R13. RJSR can execute real 6502 instructions without returning from VIRTUAL16 mode. I think the instruction set is finalized with these instructions. VIRTUAL16 is now 1023 bytes. If anyone wants to look at the instruction set, here is a link


Posted: Sat Mar 16, 2019 5:38 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Looks nice. I like 16 bit machines with 16 registers! (see OPC5 and OPC6)

So, is CMP just like SUB but it puts the result in a fixed place?

I think I would have CMP use the LSB destination of *MUL instructions, to minimise the number of fixed-use registers. But maybe there's a reason not to?


Posted: Sat Mar 16, 2019 6:20 pm

Joined: Wed Aug 03, 2016 9:22 am
Posts: 5
Yes, CMP works like that. There is no special reason for it. Actually your way is better, but what about putting the MUL result in R12 (LSB)-R13 (MSB) and the CMP result in R13? I think it makes more sense because PC and SR-SP are ordered from the rightmost register to the left.


Posted: Sat Mar 16, 2019 6:37 pm

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
Good project.

I've recently been doing a bit of sweet16 coding - but I started by re-implementing Woz's interpreter to speed it up a little, take advantage of some 65C02 opcodes and remove the dependency on the memory position thing. I did toy with expanding it, but other than using an unused opcode as a debug print I found I didn't need to.

Do you have plans to write something big in it? I was using it because I had a task that involved a lot of 16-bit manipulations and nothing more than add/subtract, so it worked well for that. (I need a malloc, free and merge library for a project I'm working on and started it in 65c02, but decided it would be easier in sweet16 - and it was!)

Cheers,

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Posted: Sat Mar 16, 2019 7:15 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
TBH, the way I'd allocate the registers would be a bit like we did for OPC: high numbers for fixed purposes, low numbers for scratch, then the middle ones get used as locals or parameters. So, I'd use R0 for CMP, and R0 and R1 for *MUL, because that would be minimally disruptive.


Posted: Sat Mar 16, 2019 7:23 pm

Joined: Wed Aug 03, 2016 9:22 am
Posts: 5
Thanks Gordon. If you have any advice for speeding things up (especially fetch-decode), please let me know. But I want to keep it in 6502 assembly and under 1k, which may be a problem.
VIRTUAL16 doesn't have memory restrictions. You can place it anywhere in memory. Only ~48 bytes must be in zero page.

As hardware I'm using a KIM-1 simulator based on Arduino. It's called KIMUno. It's not a full emulator because the RIOT chips are not emulated. The KIM-1 has a DISP routine for showing the contents of 0xF9-0xFA-0xFB on the display. That's why I implemented the RJSR instruction today. If you want to use low-level 6502 routines, it's possible to call them without returning from VIRTUAL16 mode.

VIRTUAL16 has its own stack pointer. Today I found that with a little hack you can move the VIRTUAL16 stack pointer anywhere in memory. Maybe I can try to implement multitasking in it. But it could only be a proof of concept because I don't have real 6502 hardware and I don't know how interrupts work on the 6502.

BigEd wrote:
TBH, the way I'd allocate the registers would be a bit like we did for OPC: high numbers for fixed purposes, low numbers for scratch, then the middle ones get used as locals or parameters. So, I'd use R0 for CMP, and R0 and R1 for *MUL, because that would be minimally disruptive.


I'll take this into account before changing anything. Your feedback is really helpful. Thanks for your advice.


Posted: Sat Mar 16, 2019 8:04 pm

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
NullMember wrote:
Thanks Gordon. If you have any advice for speeding things up (especially fetch-decode), please let me know. But I want to keep it in 6502 assembly and under 1k, which may be a problem.
VIRTUAL16 doesn't have memory restrictions. You can place it anywhere in memory. Only ~48 bytes must be in zero page.


In my implementation of sweet16, I sacrificed RAM for speed (because I have relatively plenty of RAM). Woz has a single byte-wide table of the function addresses (low byte), the high byte is fixed which is why all the functions need to be in the same page (any page, but all in the same page). I used 2 tables, so an extra 16 bytes - actually double as I did the same thing for the branch opcodes too. I also used self-modifying code (I have no ROM in my 6502 SBC), so I fetch the addresses and stuff them into the 2 bytes after a jmp instruction rather than push on the stack and rts...

Code:
        lda     jumpTableL,y
        sta     wooly+1
        lda     jumpTableH,y
        sta     wooly+2

wooly:  jmp     $FFFF   ; Modified


I could save 2 bytes there and thus make it a few cycles faster if I assembled the dispatch loop into zero page...
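
For anyone who wants to see the two-table idea outside of assembly, here is a rough Python model. It compares a dispatch table split into separate low-byte and high-byte arrays with a Woz-style single byte-wide table where the high byte is fixed. The handler addresses and the names `HANDLERS`, `dispatch_split` and `dispatch_single_page` are made up for illustration and are not taken from either interpreter.

```python
# Model of two ways to look up a 16-bit handler address from an opcode index.
# All addresses below are hypothetical.

HANDLERS = [0x0310, 0x0342, 0x0405, 0x04F0]   # handlers spread over two pages

# Split tables: one byte-wide table per address half (costs 2 bytes per entry).
JUMP_TABLE_L = [addr & 0xFF for addr in HANDLERS]
JUMP_TABLE_H = [addr >> 8 for addr in HANDLERS]


def dispatch_split(index):
    """Mirror of: lda jumpTableL,y / sta wooly+1 / lda jumpTableH,y / sta wooly+2."""
    return JUMP_TABLE_L[index] | (JUMP_TABLE_H[index] << 8)


def dispatch_single_page(index, fixed_high=0x03):
    """Single byte-wide table: the high byte is a constant, so every
    handler has to live in page `fixed_high`."""
    return JUMP_TABLE_L[index] | (fixed_high << 8)


if __name__ == "__main__":
    for i in range(len(HANDLERS)):
        print(i, hex(dispatch_split(i)), hex(dispatch_single_page(i)))
```

With the split tables every handler address comes back intact, while the single-table lookup lands in the wrong page for the handlers at 0x04xx, which is why the original needs all handlers in one page.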

NullMember wrote:
As hardware I'm using a KIM-1 simulator based on Arduino. It's called KIMUno. It's not a full emulator because the RIOT chips are not emulated. The KIM-1 has a DISP routine for showing the contents of 0xF9-0xFA-0xFB on the display. That's why I implemented the RJSR instruction today. If you want to use low-level 6502 routines, it's possible to call them without returning from VIRTUAL16 mode.

VIRTUAL16 has its own stack pointer. Today I found that with a little hack you can move the VIRTUAL16 stack pointer anywhere in memory. Maybe I can try to implement multitasking in it. But it could only be a proof of concept because I don't have real 6502 hardware and I don't know how interrupts work on the 6502.


You could implement multitasking without interrupts - simply run for a number of cycles, then swap registers/pc/stack for the next process, and so on. You could keep what's essentially a linked list of processes in RAM, each process having space to store its own register set in, although the overhead of copying from ZP to store, and back again, might be too high. This is more or less how the Transputer did it in its microcode engine. It would count cycles, then at the next jump instruction (or a channel read/write) it would task switch. It didn't save registers though, so after a jmp, you could not guarantee register contents - which was mostly OK due to the way it worked as a 3-deep stack.
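
A rough sketch of that run-then-swap idea in Python, just to show the control flow. It is a model under assumptions, not how VIRTUAL16 or the Transputer does it: the `Task` class, the `quantum` parameter and the use of a `deque` in place of a linked list are all invented for illustration, and each task simply keeps its own register list instead of copying to and from zero page.

```python
from collections import deque


class Task:
    """One virtual process: its own register file (16 x 16-bit) and a program."""

    def __init__(self, name, program):
        self.name = name
        self.regs = [0] * 16        # saved register set for this task
        self.program = program      # callables standing in for VM instructions
        self.pc = 0

    def done(self):
        return self.pc >= len(self.program)


def run_round_robin(tasks, quantum):
    """Run each task for at most `quantum` instructions, then switch."""
    ready = deque(tasks)
    trace = []
    while ready:
        task = ready.popleft()
        for _ in range(quantum):
            if task.done():
                break
            task.program[task.pc](task.regs)   # execute one "instruction"
            task.pc += 1
            trace.append(task.name)
        if not task.done():
            ready.append(task)                 # back of the queue for its next turn
    return trace


def inc_r0(regs):
    """Stand-in instruction: increment R0 with 16-bit wraparound."""
    regs[0] = (regs[0] + 1) & 0xFFFF


if __name__ == "__main__":
    a = Task("A", [inc_r0] * 3)
    b = Task("B", [inc_r0] * 2)
    print(run_round_robin([a, b], quantum=2))
```

On a real 6502 the expensive part is what this model hides: either copying the register block in and out of zero page at every switch, or giving each task its own zero-page window.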

NullMember wrote:
BigEd wrote:
TBH, the way I'd allocate the registers would be a bit like we did for OPC: high numbers for fixed purposes, low numbers for scratch, then the middle ones get used as locals or parameters. So, I'd use R0 for CMP, and R0 and R1 for *MUL, because that would be minimally disruptive.


I'll take this into account before changing anything. Your feedback is really helpful. Thanks for your advice.


Cheers,

-Gordon



Posted: Sat Mar 16, 2019 11:13 pm

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
Nice, I've recently been working on getting the revised version of my own VM released as well, but it's really bloaty, approaching a whopping 2k! ;)

Regarding the actual instruction dispatch, I compiled as many strategies as I could come up with here. I'm using one based off the 128-instruction 9-cycle version, where opcodes are even numbers:

Code:
  sta :+ +1
: jmp (table)
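
To make the two-instruction dispatch above a bit more concrete, here is a small Python model of it. It assumes the table is page-aligned and holds 128 little-endian handler addresses, so an even opcode byte can be stored straight into the low byte of the `jmp (table)` operand. The table address `TABLE`, the handler addresses and the function names are hypothetical. A 4-cycle absolute `sta` plus a 5-cycle indirect `jmp` is presumably where the 9 cycles come from.

```python
TABLE = 0x2000  # hypothetical, page-aligned table address


def build_memory(handlers):
    """Place up to 128 little-endian handler addresses at TABLE."""
    mem = bytearray(0x10000)
    for n, addr in enumerate(handlers):
        mem[TABLE + 2 * n] = addr & 0xFF
        mem[TABLE + 2 * n + 1] = addr >> 8
    return mem


def jmp_indirect(mem, ptr):
    """NMOS 6502 JMP (ptr): the pointer's high byte is not incremented
    when the low byte wraps, so JMP ($xxFF) reads its high byte from $xx00."""
    lo = mem[ptr]
    hi = mem[(ptr & 0xFF00) | ((ptr + 1) & 0x00FF)]
    return lo | (hi << 8)


def dispatch(mem, opcode):
    """Model of storing the opcode into the operand's low byte, then jmp (table)."""
    assert opcode % 2 == 0, "opcodes are even in this scheme"
    ptr = (TABLE & 0xFF00) | opcode   # the STA patches only the low byte
    return jmp_indirect(mem, ptr)
```

A side effect of even opcodes is that the pointer's low byte is never 0xFF, so the NMOS indirect-jump page-wrap quirk modelled in `jmp_indirect` can't be triggered.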


As a purely linguistic point of feedback, when you jsr into 6502 mode, I'd suggest calling it "native" instead of "real".

Welcome!

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Posted: Mon Mar 18, 2019 9:41 am

Joined: Sat Nov 11, 2017 1:08 pm
Posts: 33
A few optimisation options if you are interested.

Inline JSR SAVE and RESTORE as they are only used once. Saves 8 bytes

Code:
INCPC2:
    INC PC          ;OUR VIRTUAL16 USES TWO-BYTE INSTRUCTIONS
    BNE NEWINSTR ;**** change to BNE FETCH saves cycles in the common case
    INC PC+1        ;IF PAGE CROSSED
NEWINSTR:
    JMP FETCH       ;IF EXECUTION ENDED FETCH NEW INSTRUCTION


In the decode of branching there is no need for the TAY if you use CPY instead. Saves 1 byte.
You could put LDA LB,X before BCS INSBNM1, as it is a common expression (saves 6 bytes).

At the end of the branch decoding, make the code fall into INSBPL (saves 2 bytes).


In the branch decoding for INSBRA, branch directly to EXECUTEBRN (saves 3 bytes).

Make INSNUL any other RTS (saves a byte).

Might be worth reordering the branch opcodes so that BEQ is tested first as that is the most likely branch to occur in code.

In ExecuteBRN I'm not sure it is exactly what you intended. The Carry from the Low byte to high byte for the PC becomes corrupted by the CPX

INSBCS and INSBCC: instead of PHA PLP, which is slow, just use AND #Carry and BEQ/BNE.

A few places have JSR zzz then RTS; this can be changed to JMP zzz (saves 3 bytes).

24 bytes saved


Posted: Wed Mar 20, 2019 11:23 am

Joined: Wed Aug 03, 2016 9:22 am
Posts: 5
Sorry for the late reply.

White Flame wrote:
As a purely linguistic point of feedback, when you jsr into 6502 mode, I'd suggest calling it "native" instead of "real".


I like it. I've changed RJSR to NJSR. Thanks for the advice!

dp11 wrote:
A few optimisation options if you are interested.


I've implemented all of your optimisations and also found some others. Now the whole of VIRTUAL16 takes 980 bytes plus ~48 in zero page.

dp11 wrote:
In ExecuteBRN I'm not sure it is exactly what you intended. The Carry from the Low byte to high byte for the PC becomes corrupted by the CPX


It's for testing whether the branch offset is positive or negative. After your message I noticed it's completely wrong. I've changed it, so it should work now. Thanks for your advice!
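
For reference, this is the arithmetic the branch code has to get right, written as a Python model. It assumes the offset is a signed 8-bit value added to a 16-bit PC; what the offset is relative to in VIRTUAL16 is not covered here, and the function name is made up. The point is that the carry out of the low-byte add must survive until the high-byte add, and any compare in between (CPX, CPY, CMP) overwrites it.

```python
def branch_target(pc, offset_byte):
    """Add a signed 8-bit offset to a 16-bit PC the way a 6502 routine would:
    add the low bytes, then add the carry plus 0x00 or 0xFF to the high byte."""
    lo = (pc & 0xFF) + offset_byte
    carry = lo >> 8                               # carry out of the low-byte ADC
    ext = 0xFF if offset_byte & 0x80 else 0x00    # sign extension of the offset
    hi = ((pc >> 8) + ext + carry) & 0xFF
    return (hi << 8) | (lo & 0xFF)


if __name__ == "__main__":
    print(hex(branch_target(0x10F0, 0x20)))   # forward branch across a page
    print(hex(branch_target(0x1002, 0xFB)))   # backward branch (-5) across a page
```

One way to keep the carry safe is to test the sign of the offset before the low-byte add (or with an instruction that leaves C alone, such as BIT or a plain load).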


Posted: Wed Mar 20, 2019 9:01 pm

Joined: Sat Nov 11, 2017 1:08 pm
Posts: 33
A little bit more

Change from this

Code:
   STA STATUS
    CLD
    RTS
INITSTACK:
    LDA #0x00       ;LOAD LSB OF 0x0100
    STA SPPAGE      ;STORE IT. WE NEED THIS BECAUSE OF STACK
    LDA #0x01       ;LOAD MSB OF 0x0100
    STA SPPAGE+1    ;STORE IT
    TSX             ;TRANSFER STACK POINTER TO X
    TXA             ;TRANSFER STACK POINTER X TO A


to

Code:

   STA STATUS
    CLD
  ;  RTS ;RTS left over from inlining, so should be removed
INITSTACK:
    LDX #0x00       ;LOAD LSB OF 0x0100
    STX SPPAGE      ;STORE IT. WE NEED THIS BECAUSE OF STACK
    INX            ;LOAD MSB OF 0x0100 ; *** saves a byte
    STX SPPAGE+1    ;STORE IT
    TSX             ;TRANSFER STACK POINTER TO X
    TXA             ;TRANSFER STACK POINTER X TO A


saves two bytes

and

Code:
INCPC2:
    INC PC          ;OUR VIRTUAL16 USES TWO-BYTE INSTRUCTIONS
    BNE NEWINSTR
    INC PC+1        ;IF PAGE CROSSED
NEWINSTR:
    JMP FETCH       ;IF EXECUTION ENDED FETCH NEW INSTRUCTION


to

Code:
INCPC2:
    INC PC          ;OUR VIRTUAL16 USES TWO-BYTE INSTRUCTIONS
    BNE FETCH
    INC PC+1        ;IF PAGE CROSSED
    BNE FETCH       ;IF EXECUTION ENDED FETCH NEW INSTRUCTION


Saves a byte

Line 677 JMP NEWINSTR change to JMP FETCH : performance improvement

Place INSPOP: code after INSMULRESTORE: and remove the JMP INSPOP, saves 3 bytes

move INSINC code to after INSMOVRMI. Then the JMP INSINC can be removed. The INSMOVMRI LDY#0 can then be deleted and the JMP can be to the LDY in INSMOVRMI; saves 5 bytes

Inline CLRMULRST as it is only used once : saves 4 bytes

Move INSSUB: code to after INSCMP and remove the JMP : saves 3 bytes

change :
Code:
INSBEQ:
    ORA HB,X        ;(BOTH BYTES)
    BEQ EXECUTEBRN  ;BRANCH IF SO
    RTS


to
Code:
INSBEQ:
    ORA HB,X        ;(BOTH BYTES)
    BNE ANYRTS      ;NOT ZERO: BRANCH TO ANY EXISTING RTS IN RANGE (ANYRTS IS A PLACEHOLDER LABEL)
    ;RTS REMOVED: FALL THROUGH INTO THE CODE BELOW

one byte saved

in
Code:
INSMULSRCTST:       ;TEST IF SOURCE NEGATIVE
    LDA HB,X
    BPL INSMULDSTTST;IF NEGATIVE CONVERT IT TO POSITIVE ELSE JUMP TO DESTINATION TEST
    EOR #0xFF       ;FLIP ALL BITS
   ; CLC  *** remove CLC that does nothing.
    STA HB,X
    LDA LB,X
    EOR #0xFF
    CLC
    ADC #0x01

1 byte saved

Code:
INSMULDSTTST:       ;TEST IF DESTINATION NEGATIVE
    LDA HB,Y
    BPL INSMULCOPY  ;IF NEGATIVE CONVERT IT TO POSITIVE ELSE JUMP TO MULTIPLICATION ROUTINE
    EOR #0xFF
   ; CLC *** removed CLC that does nothing.
    STA HB,Y
    LDA LB,Y
    EOR #0xFF
    CLC
    ADC #0x01


one byte saved

Change
Code:
INSMULSIGN:
    PLP             ;PULL SR WE PUSHED BEFORE
    BPL INSMULRESTORE;IF SIGN OF RESULT IS POSITIVE (+ . + OR - . -)
    LDA #0x00       ;ELSE TURN POSITIVE RESULT TO NEGATIVE
    SEC
    SBC ML
    STA ML
    LDA #0x00
    SBC ML+1
    STA ML+1
    LDA #0x00
    SBC MH
    STA MH
    LDA #0x00
    SBC MH+1



to

Code:
INSMULSIGN:
    PLP             ;PULL SR WE PUSHED BEFORE
    BPL INSMULRESTORE;IF SIGN OF RESULT IS POSITIVE (+ . + OR - . -)
    LDA #0x00       ;ELSE TURN POSITIVE RESULT TO NEGATIVE
    TAX
    SEC
    SBC ML
    STA ML
    TXA
    SBC ML+1
    STA ML+1
    TXA
    SBC MH
    STA MH
    TXA
    SBC MH+1


saves 2 bytes

23 bytes saved.
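
For anyone checking the INSMULSIGN change, here is a small Python model of what the SEC / SBC chain computes: a multi-byte `0 - value` with the borrow carried from byte to byte. Loading the zero with TXA instead of LDA #0x00 changes the size but not the arithmetic. The helper names `sbc` and `negate_bytes` are made up, and decimal mode is assumed to be off.

```python
def sbc(a, m, carry):
    """6502 SBC in binary mode: A - M - (1 - C). Returns (result, carry_out)."""
    diff = a - m - (1 - carry)
    return diff & 0xFF, 1 if diff >= 0 else 0


def negate_bytes(value_bytes):
    """Negate a little-endian multi-byte value as 0 - value, low byte first."""
    carry = 1                                  # SEC before the first SBC
    out = []
    for m in value_bytes:
        result, carry = sbc(0x00, m, carry)    # A holds 0 (LDA #0x00 or TXA)
        out.append(result)
    return out


if __name__ == "__main__":
    product = (6).to_bytes(4, "little")        # ML, ML+1, MH, MH+1 in that order
    print([hex(b) for b in negate_bytes(product)])
```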


Posted: Sat Mar 23, 2019 11:50 pm

Joined: Sat Nov 11, 2017 1:08 pm
Posts: 33
A few more bytes :

In DECODEREG remove the last LSR and ASL and replace with AND #0x1f as it is faster

In DECODEIMM change

Code:
IMMABSBRN:
    CMP #0xE0       ;IF INSTRUCTION IS ABSOLUTE
    BCS DECODEABS
    CMP #0x50       ;IF INSTRUCTION IS BRANCH
    BCS DECODEBRN   ;ELSE INSTRUCTION IS IMMEDIATE
DECODEIMM:          ;DECODE IMMEDIATE MODE ADDRESSING
    CMP #0x40       ;IS WORD?
    BCS DECODEIMMWORD;BRANCH TO WORD ROUTINE
    CMP #0x30       ;IF BYTE
    PHP             ;PUSH COMPARE RESULT
    AND #0x0F       ;GET REGISTER
    ASL             ;DOUBLE IT FOR PROPER REGISTER ADDRESS
    TAY             ;TRANSFER REGISTER ADDRESS TO Y
    TXA             ;TRANSFER IMMEDIATE VALUE TO A
    PLP             ;PULL MSB COMPARE RESULT FROM STACK
    BCC DECODEIMMLSB;BRANCH TO LSB ROUTINE
    INY             ;IF MSB INCREASE REGISTER ADDRESS
DECODEIMMLSB:
    STA LB,Y        ;STORE IMMEDIATE VALUE TO REGISTER
    RTS


to

Code:
IMMABSBRN:
    CMP #0xE0       ;IF INSTRUCTION IS ABSOLUTE
    BCS DECODEABS 
    ASL             ;DOUBLE IT FOR PROPER REGISTER ADDRESS
    CMP #0x50<<1       ;IF INSTRUCTION IS BRANCH
    BCS DECODEBRN   ;ELSE INSTRUCTION IS IMMEDIATE
DECODEIMM:          ;DECODE IMMEDIATE MODE ADDRESSING 
    CMP #0x40<<1       ;IS WORD?
    BCS DECODEIMMWORD;BRANCH TO WORD ROUTINE
    CMP #0x30<<1       ;IF BYTE
    ;;;PHP             ;PUSH COMPARE RESULT
    AND #0x1E       ;GET REGISTER
    ADC #0       ; add in carry
    TAY             ;TRANSFER REGISTER ADDRESS TO Y
  ;;;  TXA             ;TRANSFER IMMEDIATE VALUE TO A
  ;;  PLP             ;PULL MSB COMPARE RESULT FROM STACK
  ;;  BCC DECODEIMMLSB;BRANCH TO LSB ROUTINE
   ;; INY             ;IF MSB INCREASE REGISTER ADDRESS
DECODEIMMLSB:
    STX LB,Y        ;STORE IMMEDIATE VALUE TO REGISTER
    RTS

saves 4 bytes
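
A quick Python check that the shifted version computes the same register index as the original for the byte-immediate path. It assumes that path only sees opcodes below 0x40 and that each register's high byte sits directly after its low byte (which the original INY implies). The function names are made up.

```python
def index_original(opcode):
    """CMP #0x30 / AND #0x0F / ASL / TAY / INY if the compare set carry."""
    carry = 1 if opcode >= 0x30 else 0
    y = (opcode & 0x0F) << 1
    return y + carry


def index_shift_first(opcode):
    """ASL / CMP #0x30<<1 / AND #0x1E / ADC #0 / TAY."""
    a = (opcode << 1) & 0xFF
    carry = 1 if a >= (0x30 << 1) else 0   # carry from the shifted compare
    a &= 0x1E                              # register number, already doubled
    return (a + carry) & 0xFF              # ADC #0 folds the carry into bit 0


if __name__ == "__main__":
    mismatches = [op for op in range(0x40)
                  if index_original(op) != index_shift_first(op)]
    print("mismatches:", mismatches)
```

Note that the ASL throws away bit 7, so the shifted comparisons only match the originals for opcodes below 0x80. Whether any opcodes in 0x80-0xDF can reach IMMABSBRN depends on the rest of the decoder, which isn't shown here.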

and In DECODEIMMWORD change

Code:
DECODEIMMWORD:
    PHA             ;PUSH INSTRUCTION BYTE TO STACK
    LDY #0x02       ;LOAD #0x02 TO Y
    LDA (PC),Y      ;GET MSB OF IMMEDIATE VALUE
    STA TEMP        ;STORE IT TO TEMP
    PLA             ;PULL INSTRUCTION BYTE FROM STACK
    AND #0x0F       ;GET REGISTER
    ASL             ;DOUBLE IT FOR PROPER REGISTER ADDRESS
    TAY             ;TRANSFER REGISTER ADDRESS TO Y
    TXA             ;TRANSFER IMMEDIATE VALUE LSB TO A
    STA LB,Y        ;STORE TO REGISTER LSB
    LDA TEMP        ;LOAD IMMEDIATE MSB
    STA HB,Y        ;STORE REGISTER MSB
    BCC ABSIMMEND

to
Code:
DECODEIMMWORD:
    AND #0x1E       ;GET REGISTER
    PHA             ;PUSH INSTRUCTION BYTE TO STACK
    LDY #0x02       ;LOAD #0x02 TO Y
    LDA (PC),Y      ;GET MSB OF IMMEDIATE VALUE
    STA TEMP        ;STORE IT TO TEMP
    PLA             ;PULL INSTRUCTION BYTE FROM STACK
    TAY             ;TRANSFER REGISTER ADDRESS TO Y
    STX LB,Y        ;STORE TO REGISTER LSB
    LDA TEMP        ;LOAD IMMEDIATE MSB
    STA HB,Y        ;STORE REGISTER MSB
    BCS ABSIMMEND   ;ALWAYS TAKEN: CARRY IS STILL SET FROM THE BCS THAT ENTERED HERE (THE ASL THAT CLEARED IT IS GONE)

Saves 2 bytes

and

Code:
DECODEBRN:          ;DECODE BRANCH MODE ADDRESSING
    STX TEMP        ;STORE BRANCH VALUE TO TEMPORARY LOCATION
    TAY             ;BACKUP INSTRUCTION BYTE
    AND #0x0F       ;EXTRACT REGISTER ADDRESS
    ASL             ;DOUBLE IT FOR PROPER REGISTER ADDRESS
    TAX             ;TRANSFER IT TO X REGISTER

To
Code:

DECODEBRN:          ;DECODE BRANCH MODE ADDRESSING
    STX TEMP        ;STORE BRANCH VALUE TO TEMPORARY LOCATION
    TAY             ;BACKUP INSTRUCTION BYTE
    AND #0x1E       ;EXTRACT REGISTER ADDRESS
    TAX             ;TRANSFER IT TO X REGISTER

saves a byte

INSSWAPBYTE probably has a bug and can be improved to

Code:
INSSWAPBYTE:
    LDY LB,X
    LDA HB,X
    STA LB,X
    STY HB,X
    RTS

saves 2 bytes

