
All times are UTC




 Post subject: Re: Announce: Acheron VM
PostPosted: Sat Jul 28, 2012 9:07 pm 
Joined: Tue Jun 26, 2012 6:18 pm
Posts: 26
@White Flame
Just one thing:
I had one problem with your solution: your routine 'convertString' modifies the string pointer and expects the caller to use the returned, changed string address. TBH that was not what I intended. The idea was to put a string at an address given by the caller (so you can concatenate strings, e.g. for generating disassembly output: MOVE #<string here>, R0). However, rewriting the convertString routine might be a bit boring. So I thought about the following:
In 6502 assembly, a routine for writing a value as hex nibbles would usually look like this:
Code:
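      PHA                    ; save the original byte
      LSR
      LSR
      LSR
      LSR                    ; high nibble is now in bits 0-3
      ORA   #'0'             ; '0'..'9', or ':'..'?' for 10..15
      CMP   #'9' + 1
      BCC   ?0               ; already a decimal digit
      ADC   #'a' - '9' - 2   ; carry is set here, so this adds 'a' - '9' - 1
?0:   STA   ...
      PLA                    ; restore the byte for the low nibble
      AND   #$f
      ORA   #'0'
      ...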
Theoretically, this should be a good example for using the rP. I've already written a routine for the RISC processor that prints a 16-bit value as '$01ab'. But to make it more complex, may I suggest the following: instead of passing a file handle as a parameter, we now pass a pointer to a device object. This device object has a pointer at offset $12 that contains the address of a method 'outstring', which must be called to write the string to the device. (This is something beyond the normal capabilities of the 6502.) Another method pointer at offset $14 contains the address of the routine for writing a character (like CHROUT on the C64). The API looks like this:
1) convertstring
   in:  register 1: value
        register 2: pointer to string
   out: register 2: new pointer to string (pointing to the end of the string: '\0')
2) outhex16
   in:  register 1: value
        register 2: pointer to device object
   out: register 3: errorcode

If you like you can implement method 'outstring':
3) outstring
   in:  register 1: pointer to string
        register 2: pointer to device object
   out: register 3: errorcode
4) outchar
   in:  register 1: character
        register 2: pointer to device object
   out: register 3: errorcode
How you implement outchar is up to you and not part of the exercise.
Please note: registers 1 and 2 may not be destroyed (except for routine 1). In addition to the errorcode a flag will indicate whether the operation was successful or not. For example, a 68000 will use the Z-flag for this, x86 and 6502 the C-flag.
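To make the device-object idea concrete, here is a rough Python model of the calling convention described above. Only the offsets $12 and $14 and the routine names come from the post; the flat `memory` array, the `routines` table, the addresses $1000/$C000/$C100, the scratch buffer at $0200 and "errorcode 0 means success" are assumptions made for illustration. The success flag is not modelled.

```python
OUTSTRING_OFFSET = 0x12  # method pointer offsets taken from the post
OUTCHAR_OFFSET = 0x14

memory = bytearray(0x10000)  # flat 64 KiB address space (modelling assumption)
routines = {}                # address -> Python callable standing in for native code
output = []                  # characters "written to the device"


def read16(addr):
    """Read a little-endian 16-bit word, the way the 6502 stores pointers."""
    return memory[addr] | (memory[addr + 1] << 8)


def write16(addr, value):
    memory[addr] = value & 0xFF
    memory[addr + 1] = (value >> 8) & 0xFF


def call_method(dev, offset, arg):
    """Fetch the method pointer stored in the device object and call through it."""
    return routines[read16(dev + offset)](arg, dev)


def outchar(char, dev):
    output.append(chr(char))
    return 0  # errorcode 0 = success (assumed convention)


def outstring(str_ptr, dev):
    end = memory.index(0, str_ptr)  # strings are NUL-terminated
    for char in memory[str_ptr:end]:
        err = call_method(dev, OUTCHAR_OFFSET, char)
        if err:
            return err
    return 0


def outhex16(value, dev, scratch=0x0200):
    """Format value as '$01ab' in a scratch buffer, then call the device's outstring."""
    text = b"$%04x" % (value & 0xFFFF) + b"\0"
    memory[scratch:scratch + len(text)] = text
    return call_method(dev, OUTSTRING_OFFSET, scratch)


# Build one hypothetical device object and register its two methods.
DEVICE = 0x1000
routines[0xC000] = outstring
routines[0xC100] = outchar
write16(DEVICE + OUTSTRING_OFFSET, 0xC000)
write16(DEVICE + OUTCHAR_OFFSET, 0xC100)

err = outhex16(0x01AB, DEVICE)
print("".join(output), err)
```

The caller never names `outstring` or `outchar` directly; it only knows the offsets, which is the part that is awkward to express in plain 6502 code.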
I'll give you the example in RISC code soon. All I can say so far is that convertstring + outhex16 + outstring take $5a (90) bytes. Should be easy for you to beat this.
Cheers
Miles


 Post subject: Re: Announce: Acheron VM
PostPosted: Sat Jul 28, 2012 11:22 pm 
Joined: Tue Jul 24, 2012 2:27 am
Posts: 672
Integer numeric rendering into decimal digits is almost always done in backwards order to a fixed-size temp buffer of size appropriate for the numeric type, then that rendered buffer pointer is sent to the final output. There might be algorithms to do it forward, but I'm not familiar with any of them, and I've dug into a fair number of standard libraries. If you're just doing byte-aligned output in hex, though, that's another story, and is better suited for native 8-bit code than bothering with VM overhead.

Barring the output object dispatch, though, is the rest that indicative of language power? How about a malloc next? Linked list manipulations? A 16-bit sort algorithm? Fixed-point math? All of these environments have means to call or inline native routines, so doing what native code already does well doesn't seem like it shows much. However, you should write what I did in your platform for comparison: 16-bit unsigned decimal rendering with no leading zeros (unless the number is zero itself), returning a pointer to the string.
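For readers who have not seen the backwards technique, a minimal Python sketch follows. It fills a fixed 6-byte buffer (5 digits for 65535 plus a terminating zero) from the end and returns the index of the first digit, which plays the role of the returned string pointer. The names `render_u16` and `as_text` are made up for this illustration.

```python
def render_u16(value):
    """Render a 16-bit unsigned value backwards into a fixed buffer.

    Returns (buffer, start), where start is the index of the first digit.
    """
    buf = bytearray(6)  # 5 digits max + NUL terminator at buf[5]
    pos = 5
    value &= 0xFFFF
    while True:
        value, digit = divmod(value, 10)
        pos -= 1
        buf[pos] = ord("0") + digit
        if value == 0:  # stop as soon as nothing is left: no leading zeros
            break
    return buf, pos


def as_text(buf, pos):
    """Read the NUL-terminated string that starts at buf[pos]."""
    return buf[pos:buf.index(0, pos)].decode("ascii")


for n in (0, 7, 1020, 65535):
    print(n, as_text(*render_u16(n)))
```

Because the loop body runs at least once, zero renders as a single "0" without a special case.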

I do have good support for function lookup tables, specifically because I also recognized it as something hard in native code, and the implementation takes advantage of lda (zp),y instead of relying on regular VM instructions to add indexes slowly:
Code:
  ldmi r0, r1, $14  ; r0 = memory(r1 + $14)
  callp             ; call subroutine at rP


Regarding rendering hex bytes, I've got nybble swap, but my CMP equivalents aren't quite fleshed out yet. A table-based approach is easier for now, but I can jimmy up an add-based version based on just current instructions:

Code:
  byte = r0
  tmp = r1
 
  copy tmp, byte       ; tmp = byte
  nswap                ; nybble swap, to work with high nybble first
  andp #$0f
  decloop tmp, 10, :+  ; decrement by 10, branch if it didn't underflow, which would normally continue the loop
   subp 'a' - '0' - 10 ; the number was less than 10: base it off '0' rather than 'a', and cancel the 10 that decloop took out
: addp 'a'             ; 10..15 arrive here as 0..5, so adding 'a' gives 'a'..'f'
  ...call output...

  with byte            ; low nybble, same thing
  andp #$0f
  ...
This could be collapsed into a subroutine call per nybble, of course. Or I could hop into native code (or create an instruction) to convert the low byte of a reg into a 2-byte ASCII word filling the reg. Then it'd be appropriate for 16-bit writes from VM code which seems like a much better idea than working with 8-bit value operations at this layer.
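The subtract-10 trick can be checked with a few lines of Python. This models only the arithmetic (subtract 10, adjust on underflow, add 'a'), not Acheron's instructions, and the constants are chosen here so that both branches land on the right ASCII character; the function names are hypothetical.

```python
def nibble_to_hex(n):
    t = (n & 0x0F) - 10  # the "decrement by 10" step
    if t < 0:            # underflow: the nibble was 0..9
        t -= ord("a") - ord("0") - 10  # rebase to '0' and cancel the 10
    return chr(t + ord("a"))  # 10..15 arrive here as 0..5


def byte_to_hex(b):
    """High nibble first, then low nibble."""
    return nibble_to_hex(b >> 4) + nibble_to_hex(b)


print("".join(nibble_to_hex(n) for n in range(16)))
print(byte_to_hex(0xAB))
```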

I'll work on finishing this, but two things:
- If convertString returns a pointer to the terminating zero, that's not a pointer to the string anymore.
- "Please note: registers 1 and 2 may not be destroyed (except for routine 1)." is kind of funny given the discussion about stack based systems, which always destroy their arguments. :) In all the languages' low-level calling semantics that I know, parameter values are generally free to be overwritten unless specifically exempted (if that's even possible), and return values often share the same location as input parameters. It really doesn't matter here as there isn't much register pressure, though.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


 Post subject: Re: Announce: Acheron VM
PostPosted: Sun Jul 29, 2012 5:03 am 
Joined: Tue Jul 24, 2012 2:27 am
Posts: 672
I just added something I was very dumb for not including in the first place: Register stack markers.

Old way:
Code:
grow 3
...
rets 3 ; return + shrink 3

New way:
Code:
mgrow 3  ; mark + grow 3
...
retm     ; return subroutine + return register stack to marker
The markers go onto the CPU stack, so if you use them in a subroutine it will take 3 bytes (return address + rstack marker) instead of 2, but it seems the right place for it. This is faster and shorter than doing math during return/shrink, and saves me the 16 opcodes of the 1-byte parameter-embedded rets instruction. "Well, Duh!" says I. ;)

Growing & shrinking the rstack arbitrarily without marking is still in there, but I replaced the 1-byte embedded grow4 with mgrow4 instead of leaving both in. The non-marker grow is now always a 2-byte instruction with a signed 8-bit parameter. shrinkm is also included to just pop back to a marker.
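As a rough model of the difference, the Python sketch below keeps a register stack pointer and a list standing in for the 6502 hardware stack. The class and field names, and the direction in which the register stack grows, are assumptions for illustration; only the instruction names and the "marker lives on the CPU stack" idea come from the post.

```python
class Machine:
    def __init__(self):
        self.rsp = 0         # register stack pointer (growth direction is an assumption)
        self.cpu_stack = []  # stands in for the 6502 hardware stack
        self.pc = 0

    def call(self, target):
        self.cpu_stack.append(("ret", self.pc))  # 2 bytes on a real 6502
        self.pc = target

    def grow(self, n):
        self.rsp += n

    def rets(self, n):
        """Old way: shrink by an explicit count, then return."""
        self.rsp -= n
        kind, self.pc = self.cpu_stack.pop()
        assert kind == "ret"

    def mgrow(self, n):
        """New way: remember the current rsp on the CPU stack, then grow."""
        self.cpu_stack.append(("mark", self.rsp))  # 1 extra byte
        self.rsp += n

    def shrinkm(self):
        """Pop back to the most recent marker; no arithmetic needed."""
        kind, self.rsp = self.cpu_stack.pop()
        assert kind == "mark"

    def retm(self):
        self.shrinkm()
        kind, self.pc = self.cpu_stack.pop()
        assert kind == "ret"


m = Machine()
m.pc = 0x0100
m.call(0x0200)
m.mgrow(3)
m.grow(2)   # further growth inside the subroutine is undone by the marker too
m.retm()
print(m.rsp, hex(m.pc))
```

Note that `retm` does not need to know how much the subroutine grew the register stack, which is why the count no longer has to be embedded in the return instruction.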

 Post subject: Re: Announce: Acheron VM
PostPosted: Mon Jul 30, 2012 5:26 pm 
Joined: Mon Apr 16, 2012 8:45 pm
Posts: 60
This looks very interesting.
White Flame wrote:
Goals

  • Significantly increase code density over native code for complex data-oriented operations
  • Achieve better speed than other VMs/interpreters (threaded Forth, Sweet16, various BASICs, etc)
  • Good compiler target for high level languages
  • Collect a contributed stable of custom instructions and modifications to the VM
Have you tested code densities yet? I would be interested in how it compares with SWEET16.
Quote:

I'd really appreciate feedback on any aspect you'd care to comment about, and it could use others' testing.

I'm going to move it to a publicly hosted VCS at some point, but for preview it's currently hidden away on my personal site.

Some of the documentation refers to rD, rA and others. Are these real registers that point into the 16-entry register file, or just a shorthand notation? Also, if these are real registers, how are they updated? Using with?

The RCA 1802 used registers that pointed into its 16-entry register file; sadly, the accumulator was treated separately.

A sliding register file was used earlier, in SPARC I believe, but in the end it was not as good an idea as first imagined. How are you overcoming the problems they faced?


 Post subject: Re: Announce: Acheron VM
PostPosted: Mon Jul 30, 2012 5:34 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Alienthe wrote:
This looks very interesting.
(Agreed! I'm watching with interest but haven't had a chance to digest.)


 Post subject: Re: Announce: Acheron VM
PostPosted: Tue Jul 31, 2012 3:20 pm 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Alienthe wrote:
A sliding register file was used earlier, in SPARC I believe, but in the end it was not as good an idea as first imagined. How are you overcoming the problems they faced?

In SPARC, the register file is limited by hardware, so only a limited number of register window shifts were possible. In arbitrary code, it would be tricky to keep track of the number of free register slots. In a VM, this would be easy to trap, and then some extra memory could be allocated to extend the register file.


 Post subject: Re: Announce: Acheron VM
PostPosted: Fri Aug 03, 2012 7:49 am 
Joined: Tue Jul 24, 2012 2:27 am
Posts: 672
I haven't been online for a bit, but I did get the code up on github, with a few more changes in the code & docs:
project: https://github.com/AcheronVM/acheronvm
docs: http://acheronvm.github.com/acheronvm/

Alienthe wrote:
This looks very interesting.
Have you tested code densities yet? I would be interested in how it compares with SWEET16.
Only the basics. Not having to copy values in and out of an accumulator for multi-parameter calculations seems to be a benefit, and since the "prior" register acts as an accumulator when needed, SW16's density advantages there are retained as well. A lot more instructions and composite equivalents can and should still be written to increase density (and speed) further. I do not have post/pre-inc/dec on memory ops yet, only standalone inc/dec and inc2/dec2 instructions on the prior address.

Both single byte ((4-bit-opcode << 4) + 4-bit-param) and double byte (8-bit-opcode, 4-bit-param) encodings are supported (among many other multi-param encodings), to balance the tradeoff between opcode space and instruction size & frequency.
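To illustrate the two encodings just mentioned, here is a small Python sketch. The single-byte packing follows the formula in the post; placing the 4-bit parameter in the low nibble of the second byte of the two-byte form is an assumption, and the function names and opcode values are invented for the example.

```python
def encode_short(opcode4, param4):
    """One byte: (4-bit-opcode << 4) + 4-bit-param."""
    if not (0 <= opcode4 <= 0xF and 0 <= param4 <= 0xF):
        raise ValueError("both fields must fit in 4 bits")
    return bytes([(opcode4 << 4) + param4])


def decode_short(byte):
    return byte >> 4, byte & 0x0F


def encode_long(opcode8, param4):
    """Two bytes: a full 8-bit opcode, then the parameter (layout assumed)."""
    if not (0 <= opcode8 <= 0xFF and 0 <= param4 <= 0xF):
        raise ValueError("opcode must fit in 8 bits, param in 4 bits")
    return bytes([opcode8, param4])


print(encode_short(0x3, 0x9).hex())    # one byte for a frequent instruction
print(encode_long(0x80, 0x9).hex())    # two bytes for a rarer one
```

The tradeoff is visible in the numbers: each short-form instruction consumes 16 of the 256 possible first-byte values, while a long-form instruction consumes only one but costs an extra byte every time it is used.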

Quote:
Some of the documentation refers to rD, rA and others. Are these real registers that point into the 16-entry register file, or just a shorthand notation? Also, if these are real registers, how are they updated? Using with?
The generated instruction set doc has the register legend up top. They are shorthand. "copy rD, rA" just means you can do "copy r3, r9" or whatever with all 16 regs. I don't think the docs say this explicitly, though, so I'll update them.

Quote:
The RCA 1802 used registers that pointed into its 16-entry register file; sadly, the accumulator was treated separately.
Yes, I tried working with something vaguely similar to its register file, having the "current data register" and "current address register", without the separate acc. Running in software, reducing both the number of instructions dispatched and the number of parameters decoded is important for performance, so using implied registers is a good thing, but that prior attempt got real clunky real fast. Acheron's notion of a single "prior" register I think balances the tradeoffs better, though it can lead to more implied addressing modes for the same basic instruction.

Quote:
A sliding register file was used earlier, in SPARC I believe, but in the end it was not as good an idea as first imagined. How are you overcoming the problems they faced?
Like Arlet said, it was very fixed. It only slid at 8-register increments, and had the first 8 registers as normal non-windowed static registers. I do better on the former, but completely discarded the latter to avoid special cases in register dereferencing.

The "global variables" page seeks to replace static registers, but hasn't been fleshed out yet. Spillover for large numbers of parameters will depend on how it's used regarding memory allocation, though I recently realized that the 2-byte instruction form with the lone 4-bit parameter can technically address r0-r127, which I will take advantage of in some way.

BigEd wrote:
(Agreed! I'm watching with interest but haven't had a chance to digest.)

The new github link has more introductory information in the docs. Hopefully that'll help digestion. Portions might still be a little underripe. :)

 Post subject: Re: Announce: Acheron VM
PostPosted: Fri Oct 04, 2019 8:18 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Here's a talk from VCF Midwest - I think this is our very own White Flame presenting:
AcheronVM: 16-bit code on the 6502, taken too far (48 mins, youtube)

Here's the video description:
Quote:
AcheronVM is a small, customizable 16-bit software CPU for the 6502. It has thrown out traditional models to pursue all 3 competing aspects of density, speed, and power solely from within the context of the 6502's tradeoffs. Notable features include a unique hybrid register model, try/catch/finally support, pointer-offset addressing modes, easy instruction set modifications, and a purely ca65 macro-based implementation. This talk spans its design, implementation, and use.


Here's the repository, announcing an imminent update to match the talk:
https://github.com/AcheronVM/acheronvm


 Post subject: Re: Announce: Acheron VM
PostPosted: Fri Oct 04, 2019 1:08 pm 

Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
This VM is a brilliant piece of - hmm - artwork, I think, is the most fitting word.

Thank you for sharing this.

(And TY for the link BigEd.)


 Post subject: Re: Announce: Acheron VM
PostPosted: Fri Oct 04, 2019 4:07 pm 
Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Certainly there is an elegance to the design. Just a shame the github repo is so out of date.

I watched the talk and immediately noticed that the registers don't actually need to be 16 bits each, they're just named in 16-bit increments and the "mgrow" parameter is similarly scaled. At the bytecode level, registers appear to be addressed with byte granularity. So there's nothing fundamentally preventing an extension to support 24-bit addresses, 32-bit integers, fixed-point values of arbitrary size and shape, and/or 48- or 128-bit floating point. Which could be useful…


 Post subject: Re: Announce: Acheron VM
PostPosted: Fri Oct 04, 2019 6:52 pm 
Joined: Tue Jul 24, 2012 2:27 am
Posts: 672
I still haven't pushed my repo, as it is still very inconsistent between the docs and the code, and a lot of debugging needs to be done. Of course, I should simply do so and let people poke into it themselves, but I don't like uploading wrong stuff publicly.

But yeah, there's no limits to the possibilities of tweaking this model, and it's explicitly intended to be so. I've been exploring different ISAs, and having a build system with an easily editable ISA is something that I left in. Regarding register width, of course the primary constraint would be the size of zeropage allocated to the registers.

 Post subject: Re: Announce: Acheron VM
PostPosted: Sat Oct 05, 2019 1:57 am 

Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
White Flame wrote:
I still haven't pushed my repo, as it is still very inconsistent between the docs and the code, and a lot of debugging needs to be done. Of course, I should simply do so and let people poke into it themselves, but I don't like uploading wrong stuff publicly.

Perhaps you could freshen up the page a little and add a big note "work in progress". :)

Roughly 20 years ago I did some investigations into Sweet-16. In order to get the execution times, I triggered a timer (6522), then called Sw16, executed a single instruction, and returned. The empty run (JSR Sw16 / RTN) took 101 cycles, I noted. Others like ADD n took 108 cycles (the addition only, not counting the call and return). These times are much bigger than those you mentioned in your presentation? I'm sure that my timer was clocked with the system clock, so the values are clock cycles.

Do you have a cycle count for leaving Acheron and re-entering it - in other words: what would be the overhead for "inline assembly"?


Regards,
Arne


 Post subject: Re: Announce: Acheron VM
PostPosted: Sat Oct 05, 2019 3:26 am 
Joined: Tue Jul 24, 2012 2:27 am
Posts: 672
The cycle times that I posted for Forth and Sweet16 are purely for dispatching the next instruction, not including the instruction execution itself, and not including exiting/entering the environment. Basically, as you're running code in the language, that'd be the time spent between instruction implementations. And I guess it doesn't include the JMP to get back to the main dispatch loop either. With Forth, it also doesn't count DOCOL overhead for secondary words, so yeah actual round-trip cycle times would be much more.

Counting cycles from the source code:
  • 28 cycles to switch from 6502->Acheron, including the 'jsr acheron'
  • 19 cycles to switch from Acheron->6502 (plus the instruction dispatch, which is around 20 cycles depending on which dispatcher is used)
So I'd guess about 70 cycles round-trip. I play it safe in these transitions in terms of storing state, so I didn't think about it being ultra-fast. If you're mode switching, it should be because you're doing enough work in the other mode to make it worth it.
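The arithmetic behind that estimate, plus a simple break-even check, can be written out as below. The three cycle counts are the ones quoted above; the `native_detour_pays_off` helper and the sample workloads are hypothetical and only illustrate how the overhead enters the comparison.

```python
ENTER_CYCLES = 28     # 6502 -> Acheron, including the 'jsr acheron'
EXIT_CYCLES = 19      # Acheron -> 6502
DISPATCH_CYCLES = 20  # approximate; depends on which dispatcher is used

ROUND_TRIP = ENTER_CYCLES + EXIT_CYCLES + DISPATCH_CYCLES  # 67, i.e. "about 70"


def native_detour_pays_off(vm_cycles, native_cycles):
    """True if dropping to native code and coming back beats staying in the VM."""
    return native_cycles + ROUND_TRIP < vm_cycles


print(ROUND_TRIP)
print(native_detour_pays_off(vm_cycles=200, native_cycles=100))  # 167 < 200
print(native_detour_pays_off(vm_cycles=150, native_cycles=100))  # 167 > 150
```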

Certainly a faster mode switch is possible, but it'd have to be more "dangerous" in terms of the called code's handling of registers.

 Post subject: Re: Announce: Acheron VM
PostPosted: Sat Oct 05, 2019 4:33 am 

Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
Ah OK. That explains the somewhat huge difference. :) The housekeeping (virtual PC and flags) takes its time.

The transition from Acheron to 6502 and back that I was asking about is the penalty one has to pay (or consider) when falling back to native code for speed reasons and being too lazy to add the appropriate word.

I remember that I was missing Boolean operations with Sweet-16. But the round-trip time was so huge that I wasn't confident in the results at all.


 Post subject: Re: Announce: Acheron VM
PostPosted: Sat Oct 05, 2019 11:44 am 
Joined: Tue Jul 24, 2012 2:27 am
Posts: 672
Yep, so you can see why it's advantageous to adjust the instruction set itself. It takes ~20 cycles to dispatch to a custom instruction, each use only takes a byte (plus params), and you can remove existing instructions if you're not using them. Then you can consider which operations are used often and which are seldom used, and figure out your balance per project.

While the 7-bit 'with' dispatcher supports 128 instructions, you can dispatch on a full 256 as well, but you'd have to deal with the 'with' operation separately.
