While i was pondering what new project to work on for my 65816 SBC i randomly remembered that SWEET16 is a thing and wondered if anyone ported it to the 65816 yet...
around 2 seconds later my brain actually turned on and i realized that a 16-bit VM like SWEET16 makes very little sense on an already 16-bit CPU.
so then i thought, why not take the base idea of SWEET16 (a RISC-like VM) and make it 32-bit?
and because i'm not good with names i just started calling it "SWEET32" even though it's only inspired by SWEET16, and not actually compatible with it...
also, i actually lost ALL the work i did on this last year and only because Notepad++ keeps 2 backups of every file you ever opened was i able to recover it completely. so thanks Notepad++!
anyways, back to the VM.
SWEET32 (or SW32 for short) is a very RISC-like architecture:
- 15x 32-bit Registers named R1-R15, with R0 being a constant 0
- a 32-bit Program Counter (functionally only 24-bit)
- an 8-bit Status Register called SR, holding the ALU flags "Zero", "Negative", and "Carry", in addition to 4 user flags F4-F7
- full 24-bit Addressing (though Load/Store Instructions can also be limited to the current bank)
Most Instructions are 2 bytes large, except for the "Load Immediate" Instructions, which are either 4 or 6 bytes in size.
Speaking of Instructions, here's all of them, with the formatting explained:
Ra = Source Register A
Rb = Source Register B
Re = Destination Register
Rx = Source/Destination Register combo
Code:
<------------------------------------------------->
Branch on Clear - BNv k
Branch on Set - Bv k
Branches if the specified bit "v" in the SR is cleared (BNv) or set (Bv).
"k" is the address to branch to, encoded as an 8-bit signed offset multiplied by 2, giving it a -128 to +127 Word range.
<------------------------------------------------->
Jump and Link - JAL Re, Ra
Jumps to the Address in the Source Register "Ra" and stores the address of the following instruction in the Destination Register "Re".
using R0 as the Destination Register makes JAL function like a regular Jump,
using R0 as the Source Register won't modify the PC, basically just loading the Address of the next Instruction into a Register
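To make the two R0 special cases a bit more concrete, here is a small Python model of JAL. It's only a sketch of the behaviour described above, not the actual 65816 implementation: the function name, the register list, and the choice to read Ra before writing Re (which matters when both are the same register) are assumptions for this example.

```python
def jal(regs, pc, re, ra):
    """Model of 'JAL Re, Ra'. Returns the new PC.

    regs is a list of 16 integers where regs[0] is always 0,
    pc is the address of the JAL instruction itself (2 bytes long).
    """
    next_pc = (pc + 2) & 0xFFFFFF      # address of the following instruction
    target = regs[ra] & 0xFFFFFF       # assumed: source is read before the link is written
    if re != 0:                        # writes to R0 are discarded, R0 stays 0
        regs[re] = next_pc
    # R0 as the source leaves the PC alone, so execution just continues
    return next_pc if ra == 0 else target
```

With R5 holding a subroutine address, `jal(regs, pc, 15, 5)` acts like a call that links through R15, `jal(regs, pc, 0, 5)` is a plain jump, and `jal(regs, pc, 4, 0)` only copies the next instruction's address into R4.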
<------------------------------------------------->
Add - ADR Re, Ra, Rb ; Flags: Z C N
Subtract - SBR Re, Ra, Rb ; Flags: Z C N
Logic AND - ANR Re, Ra, Rb ; Flags: Z - N
Logic OR - ORR Re, Ra, Rb ; Flags: Z - N
Logic XOR - XOR Re, Ra, Rb ; Flags: Z - N
Logic Shift Left - SFL Re, Ra ; Flags: Z C N
Logic Shift Right - SFR Re, Ra ; Flags: Z C 0 (N Flag is always cleared)
Rotate Left - RLR Re, Ra ; Flags: Z C N
Rotate Right - RRR Re, Ra ; Flags: Z C N
Arithmetic/Logic Instructions all function pretty much the same: Re = Ra <operation> Rb
the Shifts and Rotates are almost the same as the 65xx versions, except that the Source and Destination Registers can be different.
but i'll go into some detail about the flags!
the Zero Flag (Z) is functionally identical to the 65xx Z Flag, if the result of an operation is 0, it's set, otherwise cleared
the Negative Flag (N) is also like the 65xx N Flag, it just copies the MSB of the result into itself
The Carry Flag (C) in the Shift and Rotate Instructions works the same as in the 65xx Instructions, but for Add and Subtract it's slightly different.
Specifically, Add and Subtract don't use the Carry Flag as an input, only as an output. Also, Subtract sets the Carry when a Borrow occurs, which is the opposite of how SBC works on the 65xx.
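The flag rules for Add and Subtract can be written down as a short Python model. This is a sketch based purely on the description above (function names and the tuple layout are made up for the example); it treats the Add carry as the usual unsigned carry-out of bit 31.

```python
MASK32 = 0xFFFFFFFF

def adr(a, b):
    """Re = Ra + Rb. Returns (result, Z, C, N); Carry is an output only."""
    total = (a & MASK32) + (b & MASK32)
    result = total & MASK32
    return result, result == 0, total > MASK32, bool(result >> 31)

def sbr(a, b):
    """Re = Ra - Rb. Returns (result, Z, C, N); C is set on a borrow."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    # Borrow happens when the unsigned subtrahend is larger (inverse of 65xx SBC)
    return result, result == 0, a < b, bool(result >> 31)
```

So `sbr(1, 2)` gives `$FFFFFFFF` with C and N set, which means a following "Branch on Carry Set" acts like an unsigned "less than" test.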
<------------------------------------------------->
Add Immediate - ADI Rx, k ; Flags: Z C N
Takes the 8-bit constant "k", sign extends it to 32-bits, and adds it to the combined Source/Destination Register.
Flags are updated exactly like the ADR Instruction would.
<------------------------------------------------->
Load Byte (Signed) - LB Re, Ra
Load Byte (Unsigned) - LBU Re, Ra
Load Word (Signed) - LW Re, Ra
Load Word (Unsigned) - LWU Re, Ra
Load Long - LL Re, Ra
The Source Register contains the Address, and the value read from that Address gets loaded into the Destination Register
<------------------------------------------------->
Store Byte - SB Ra, Rb
Store Word - SW Ra, Rb
Store Long - SL Ra, Rb
Similar to the Loads, Source register A contains the Address, and Source Register B the value to write to Memory
<------------------------------------------------->
Set Bit - SET Rx, k
Clear Bit - CLR Rx, k
These have 2 different ways they can work, if "Rx" is R0, the Instructions function like REP/SEP from the 65816,
taking the 8-bit constant "k" and using it as a mask to select which bits in the SR should be set/cleared.
If "Rx" is any register besides R0, the now 5-bit constant "k" selects which bit (0-31) to set/clear in the specified Register.
btw these Instructions are the only ones that can modify the User Flags F4, F5, F6, and F7.
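Since SET and CLR behave differently depending on the register, here is a rough Python model of both modes. It only covers what is described above (the names are invented for the example, and any effect on the ALU flags in register mode is not modelled because the text doesn't mention one).

```python
MASK32 = 0xFFFFFFFF

def set_bits(rx, value, k):
    """SET Rx, k. 'value' is the SR (rx == 0) or the register contents."""
    if rx == 0:
        return (value | k) & 0xFF               # k is an 8-bit mask, like SEP
    return (value | (1 << (k & 31))) & MASK32   # k is a 5-bit bit number

def clr_bits(rx, value, k):
    """CLR Rx, k. 'value' is the SR (rx == 0) or the register contents."""
    if rx == 0:
        return value & ~k & 0xFF                # k is an 8-bit mask, like REP
    return value & ~(1 << (k & 31)) & MASK32    # k is a 5-bit bit number
```

In SR mode a single instruction can change several flags at once, while in register mode exactly one bit is touched.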
<------------------------------------------------->
Load Word Imm. (Signed) - LWI Re, k
Load Word Imm. (Unsigned) - LWIU Re, k
Load Long Immediate - LLI Re, k
Also simple, LWI takes a 16-bit immediate value, sign extends it to 32-bits, and loads it into the Destination Register
LWIU does the same, except it zero extends the 16-bit value instead. LLI just loads a full 32-bit immediate value into the Destination Register
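The difference between LWI and LWIU only shows up when bit 15 of the immediate is set. A minimal Python sketch of the two extensions (function names chosen for this example):

```python
def lwi(imm16):
    """Sign extend a 16-bit immediate to 32 bits, as LWI does."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16

def lwiu(imm16):
    """Zero extend a 16-bit immediate to 32 bits, as LWIU does."""
    return imm16 & 0xFFFF
```

For example `$8000` becomes `$FFFF8000` with LWI but stays `$00008000` with LWIU, so LWI is the one to use for small negative constants and LWIU for things like 16-bit addresses or masks.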
<------------------------------------------------->
Return to 65816 Mode - EXIT
This Instruction Exits the VM and resumes regular 65816 program execution after the sw32_execute function
next up, the functions that the VM requires to work:
there are 2 main functions, (both are called with JSL, and expect 8-bit A, and 16-bit X/Y):
"sw32_init" - clears all Registers, and sets the Control Byte to the value in A
Currently the Control Byte's only used bit is bit 7, which determines the address width for Load/Store Instructions.
if bit 7 is cleared Load/Store Instructions are limited to 16-bit addressing, if set they are 24-bit instead.
"sw32_execute" - executes SW32 code, it starts at the address given by the X and Y Registers. X = Low Word, Y = High Word (High Byte is ignored)
The function only returns when an EXIT instruction is executed.
"sw32_print" - prints out the contents of all Registers in a nice and tidy format:
Code:
R1: $00000000
R2: $00000000
R3: $00000000
R4: $00000000
R5: $00000000
R6: $00000000
R7: $00000000
R8: $00000000
R9: $00000000
R10: $00000000
R11: $00000000
R12: $00000000    PC: $00000000
R13: $00000000
R14: $00000000        7654NCZ
R15: $00000000    SR: 0000000
in order for sw32_print to work it needs a user-provided function to print a single ASCII Character to whatever output the user might have.
said function has to be called "sw32_print_char", must return with RTL, and has to assume A is 8-bits wide, and X/Y are 16-bits wide.
in addition to sw32_print, 2 extra functions for printing hexadecimal values are also given (JSL/RTL, 8-bit A, 16-bit X/Y):
- sw32_print_h8 - Prints the 8-bit value in A
- sw32_print_h32 - Prints the 32-bit value that was pushed to the stack before calling the function (push the high word first)
There is no "sw32_print_h16" function, so if needed the user has to implement their own.
As a test i wrote a small bubble sort implementation:
Code:
sw32_sort:
    LLI R15, array_ptr ; Get the Address of the Array to be sorted
    LWI R14, element_count-1 ; Get the amount of elements in the Array (minus 1)
@outer_loop:
    CLR SR, F7 ; Clear the "swapped" Flag
    ADR R13, R15, R0 ; Save a Copy of the Array Address into R13
    ADR R12, R14, R0 ; Save a Copy of the Element count into R12
@inner_loop:
    LL R1, R13 ; Load a Value from the Array
    ADI R13, 4 ; Increment the Pointer to the next element
    LL R2, R13 ; And Get a second Value from the Array+1
    SBR R0, R2, R1 ; Compare them (R2 - R1, result is discarded)
    BNC @no_swap ; No Borrow means R1 <= R2, so skip the swap
    SL R13, R1 ; R1 > R2: Store R1 to Array+1
    ADI R13, -4 ; Step back to the first element
    SL R13, R2 ; Store R2 to Array
    ADI R13, 4 ; Move the Array Pointer back to where it was before
    SET SR, F7 ; And Set the "swapped" Flag
@no_swap:
    ADI R12, -1 ; Decrement the Element counter
    BNZ @inner_loop
    BF7 @outer_loop ; Repeat while anything got swapped
    EXIT
the entire function takes up 44 Bytes (plus the 1.3kB for the VM to run) and on my 20MHz SBC took just below 2 minutes to sort an array of 1000 random 32-bit values.
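For anyone who finds the control flow easier to read in a high-level language, the routine above does roughly the same thing as the Python below. This is just a reference sketch for comparison, not part of the project: it assumes the values are compared as unsigned numbers (the SW32 code tests the Carry Flag) and that the array has at least two elements.

```python
def sw32_sort_reference(values):
    """Bubble sort mirroring the loop structure of the SW32 routine."""
    count = len(values) - 1               # R14: element count minus 1
    swapped = True
    while swapped:                        # BF7 @outer_loop
        swapped = False                   # CLR SR, F7
        for i in range(count):            # R12 counts down, R13 walks the array
            if values[i] > values[i + 1]: # SBR R0, R2, R1 sets Carry on borrow
                values[i], values[i + 1] = values[i + 1], values[i]
                swapped = True            # SET SR, F7
    return values
```

Each pass always walks the whole array and the routine only stops after a pass without swaps, which is the simplest form of bubble sort and part of why 1000 elements take a while.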
I uploaded everything to GitHub, so experimenting with it should be pretty simple:
https://github.com/ProxyPlayerHD/SWEET32-65816
Overall i really doubt this will ever be used for anything serious, but it would be interesting to see if a compiler made for the VM would result in more compact code compared to native 65816 code (though it would very likely run much much slower due to the emulation overhead)
anyways, this was quite fun to write and i'm pretty proud of it! so tell me your thoughts below.