Code for a Job Handler -- Locator

satpro · Post by **satpro** » Fri Jan 11, 2019 9:25 pm

Hey, everyone! As some may know, Garth and I are working on a real 65816 retro computer. Something fun and functional. Nothing yet in the way of concrete, other than a wish list, but we have kept it to ourselves in order to get our ideas right (or sort of AND we hope...).

For this post, I would like to submit some job-handler code. It is the umpteen-thousandth iteration in something that has EXACTLY that many iterations. This particular file takes a long jump to a Bank 0 landing point. Once there, the code figures out (from a table) which function should execute, sends the caller there, and then safely returns. A big plus: it also provides the function with 16 bytes of local data (a la Win64).

I am asking for a little eye from above. For me, this is the latest and has not yet been tested. Outside of stack alignment, can this transition be made faster, and why?

And, while we're at it -- who would like to help us build our retro computer? This one is 65816-based, but I'm a 32/64-bit guy and I've been working for several years on extensions for the '816 in x86/64. What I have right now is perfect for something in a programmable chip. I don't even need to say what Garth brings.

I think we have pissed around enough. Let's build something. Oh! Happy New Year!

We are here, too. https://www.facebook.com/groups/611371042385599/

Rob Finch · Post by **Rob Finch** » Sat Jan 12, 2019 1:02 am

Hi satpro.
Good to see another retro computer in the works. I look forward to following along with it.

This looks good to me. I have a couple of comments.
1) What if the locator code is located in ROM? Could the subroutine jump to a tbl_API be relocated to a ram area? I think this would require an extra JMP, but it would be nice to have the OS code in read-only memory. (or possible to use JSR (JumpAddr,x)? )
2) The function id is likely a static parameter, it does not really need to be passed in the accumulator and could save some code space over having to load the accumulator all the time if it were passed inline. Like:

Code:

JSL	$00FFF0
.db #fnid		; rather than load .acc

It would also free up the accumulator to be used to pass arguments. And make it unnecessary to save / restore the .acc for the function call.
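A minimal sketch of how the receiving side might pick up such an inline byte, assuming native mode and a hypothetical 3-byte direct-page scratch location called ptr (neither the label nor the routine comes from satpro's code). JSL leaves the address of its own last byte on the stack, so the function id sits one byte past that address:

```asm
;fetch an inline function id that follows the caller's JSL (sketch only)
;assumes: native mode; ptr = 3 free bytes in direct page (hypothetical)
;note:    the caller's register widths are not preserved here
;
locator  rep #$20              ;16-bit accumulator
         lda 1,s               ;return address-1, low 16 bits
         inc a                 ;now the address of the inline byte
         sta 1,s               ;RTL will resume just past the inline byte
         sta ptr               ;low 16 bits of a 24-bit pointer
         sep #$20              ;8-bit accumulator
         lda 3,s               ;caller's program bank
         sta ptr+2             ;complete the pointer
         lda [ptr]             ;.A = function id
```

This version still clobbers .A while fetching the id. Keeping the caller's .A intact would mean pushing it first and adding its size to the stack offsets above.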

Why not use the BRK instruction to implement OS Calls?

GARTHWILSON · Post by **GARTHWILSON** » Sat Jan 12, 2019 2:02 am

Rob Finch wrote:

1) What if the locator code is located in ROM?

So far we've mostly only talked about the philosophical goals. Delays in getting further have stemmed from several things that had to be completed first, like satpro's shop being built. We do expect to run this thing too fast for ROM though, and instead pre-load RAM before releasing the processor from reset, so even the reset routine is in RAM—or some variation on that theme. People who would enjoy this kind of computer will probably be the same ones who have very strong opinions about what it will have and do; so for this and other reasons, flexibility will be imperative.

satpro · Post by **satpro** » Sat Jan 12, 2019 2:26 am

The other important thing (IMO) is that the call can originate from anywhere, even from the OS in Bank 0. In addition, it is possible to add more calls, say a DLL, to a working system. Of course, there is no security and I would imagine, based on several explores, that security is well beyond the scope of any 65816 code, ever.

satpro · Post by **satpro** » Sat Jan 12, 2019 2:31 am

GARTHWILSON wrote:

Rob Finch wrote:

1) What if the locator code is located in ROM?

So far we've mostly only talked about the philosophical goals. Delays in getting further have stemmed from several things that had to be completed first, like satpro's shop being built. We do expect to run this thing too fast for ROM though, and instead pre-load RAM before releasing the processor from reset, so even the reset routine is in RAM—or some variation on that theme. People who would enjoy this kind of computer will probably be the same ones who have very strong opinions about what it will have and do; so for this and other reasons, flexibility will be imperative.

As usual, Garth says it better! I must say, this is a modest start. We're not looking to set the world on fire; rather, we think the world needs a 65816-based computer, in modern times, that will be capable of all the things (minus a few obvious examples) the real world is doing right now. For example, my SuperCPU stuff, which is based almost entirely on what I know from things like DirectX, AND... I'm firstly a 65xxx guy ...it's all based on a bigger system, customized for the SuperCPU. I grew up on the Commodore way.

I just love the 65816! I mean, it's that simple. Garth and I have been talking about this, all kinds of things, for years now. WE know it is possible. We even think that the other WDC chips, say the 6522, can work.

It's sort of weird talking in public like this, but why not? If the two of us talk alone, maybe get Mike involved, and then WE don't go forward, then what?

Nah. The hardware guys can come forward. The software people can help a great deal. If this post is what it takes, then why NOT?

We're doing this. I truly believe today is a 6502.org day and we, as members dreaming, should be there.

satpro · Post by **satpro** » Sat Jan 12, 2019 3:26 am

Rob Finch wrote:

Hi satpro.
Good to see another retro computer in the works. I look forward to following along with it.

This looks good to me. I have a couple of comments.
1) What if the locator code is located in ROM? Could the subroutine jump to a tbl_API be relocated to a ram area? I think this would require an extra JMP, but it would be nice to have the OS code in read-only memory. (or possible to use JSR (JumpAddr,x)? )
2) The function id is likely a static parameter, it does not really need to be passed in the accumulator and could save some code space over having to load the accumulator all the time if it were passed inline. Like:

Code:

JSL	$00FFF0
.db #fnid		; rather than load .acc

It would also free up the accumulator to be used to pass arguments. And make it unnecessary to save / restore the .acc for the function call.

Why not use the BRK instruction to implement OS Calls?

Hi Rob! As Garth said, ROM is slow. It is just a matter of moving ROM into RAM at OS start. When you have 16 MB to work with and 1/256th of that space is reserved in, let's say, Bank 0 (or is it?), copying ALL those bytes can be accomplished well before the screen dreams of firing up. The Locator originates in ROM and is moved down to the bank that has all the action.

I was the guy who cracked the SuperCPU. Probably not the first, but mine was the first independent tackle. In a nutshell, I figured it out (documented in Commodore Free) and realized in print what those first GENIUSES at CMD did. Unlike VICE and code based on chip behavior, I reprogrammed that beast by tracing one byte at a time, for millions of cycles. That took years. I eliminated literally all the code except for required startup, a reduction of 99+%. That was no small feat.

I am interested in the FunctionID as you mention. My belief is that you cannot get to a function if you don't know which one you want. So, in this system, the caller provides a function number, similar to Apple IIgs and Windows. The function number itself is just a constant, which can be passed through to any further OS down the road. For me during testing, I was able to create macros such as INVOKE, and the function (#) actually WAS a constant, so you could do the whole call painlessly.

The BRK (as well as COP) are wide open. Neither was ever implemented properly in the SuperCPU. COP was a zero, and BRK was converted to Emulation Mode. This was literally the only way they could get the SuperCPU to work with the 8-bit C64. We are talking 16-bit here, right? Personally, I think the IIgs was the absolute worst vehicle EVER for a 65816. Maybe worse than that.

We're thinking at least 16 MB. COP and BRK are on the table.

Rob Finch · Post by **Rob Finch** » Sat Jan 12, 2019 6:33 am

When I mentioned ROM I was also thinking RAM with a write-protect. But reset works ok for me if the OS gets overwritten. I'm used to FPGA ROM which is just as fast as RAM.
I found that self-modified code caused me some grief with my 65'816 machine and an I-cache so I like to try and avoid it. The cache line has to be invalidated on a code change and that negates some of the performance of self-modifying code. If the machine ever gets faster it'll likely have cache memory.

Dare I ask what kind of RAM would be in the machine? 16MB static is a lot. I've had a discussion with another poster on RAM and there doesn't seem to be too many options.

satpro · Post by **satpro** » Sat Jan 12, 2019 7:04 am

Rob Finch wrote:

Dare I ask what kind of RAM would be in the machine? 16MB static is a lot. I've had a discussion with another poster on RAM and there doesn't seem to be too many options.

Given the processor, I think our (maybe more me) idea is to conform to modern memory and work with it. With that said, I think we need your valued input. It's better looking from inside out, no? I like your ideas. This is going to get big. And I don't necessarily mean commercially. We need hands.

FPGA? Why not? Sounds wonderful, although I would not mind cranking the crap out of a real '816.

Rob Finch · Post by **Rob Finch** » Sat Jan 12, 2019 1:39 pm

Quote:

Given the processor, I think our (maybe more me) idea is to conform to modern memory

Deja-vu moment for me. Interfacing regular non-modern DRAM isn't too bad, but it's slow. Interfacing to static RAM is simpler and faster but expensive. Interfacing to DDR/DDR2/DDR3 type memory is complex. For DDRx I'd recommend using a part that already has a DDR controller available with it or for it; in other words, an FPGA device. One possibility is to use a pre-made FPGA board, which I suggest because the FPGA and DDR RAM are already laid out with power supplies etc. The board would need to have a fair number of I/Os available (40+) to piggy-back onto the '816 board.
The only other thing I can think of is to build a cache interface using static ram, and cache regular slow dram. There might be some chips from the 386 era that could be put to use for this.
This is the slippery slope I fell into a few years ago desiring to get higher performance out of a 6502. I decided to just put everything on the FPGA, it just isn't the same as a retro system though.

Quote:

FPGA? Why not? Sounds wonderful, although I would not mind cranking the crap out of a real '816.

I wonder how fast the '816 would go if it were cranked up with liquid cooling?

Chromatix · Post by **Chromatix** » Sat Jan 12, 2019 6:08 pm

It doesn't get all that warm as it is, without even a heatsink - so adding liquid cooling wouldn't change much. Ultimately, small .6µm chips like this tend not to be limited by thermals, but by fundamental wire and switching delays.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sat Jan 12, 2019 10:33 pm

I suggest making all operating system API calls through COP. The method I am entertaining to make firmware API calls in my POC V2 unit is:

Code:

;firmware API invocation template using registers for parameter passing...
;
         lda #parm1            ;parameter, if needed
         ldx #parm2            ;parameter, if needed
         ldy #parm3            ;parameter, if needed
         pea #api_index        ;desired service's index (1, 2, 3, etc.)
         cop #0                ;invoke API (signature byte can be anything)
         bcs error             ;where the meaning of "error" depends on API call

An alternate method to be used would be:

Code:

;firmware API invocation template using the stack for parameter passing...
;
         pea #parm1            ;1st parameter
         pea #parm2            ;parameter
         pea #parm3            ;parameter
         ...
         pea #parmN            ;last parameter
         pea #api_index        ;desired service's index (1, 2, 3, etc.)
         pea #n_parm           ;number of parameters passed
         cop #0                ;invoke API (signature byte can be anything)
         bcs error             ;where the meaning of "error" depends on API call

In both cases, the API backend will take care of stack housekeeping. I also developed a macro to generate the parameter stack frame (Kowalski assembler):

Code:

;STACK FRAME GENERATOR
;
;	syntax: ppsetup parm1,parm2,parm3,...
;
;	————————————————————————————————————————————————————————————————————————
;	This macro pushes any parameters that are passed to it in a way that  is
;	consistent with how a kernel call would be handled.   It is  permissible
;	to invoke it without any arguments,  in which case  only  the  parameter
;	count (zero) would be pushed.   Parameters are not validated, only
;	processed.  In other words, garbage in, garbage out!
;	————————————————————————————————————————————————————————————————————————
;
ppsetup  .macro ...            ;variable number of args allowed
.ct      .= 0                  ;arg counter
         .rept %0              ;loop for each arg, %0 = number of args
.ct          .= .ct+1          ;bump counter
             .byte $f4         ;PEA # opcode
             .word %.ct        ;16-bit parameter
         .endr                 ;end of loop
         .byte $f4             ;PEA # opcode
         .word .ct             ;parameter count
         .endm

The main value in using COP for API calls is nothing special has to be done to call an API from another bank. All of that is handled for you due to COP being just another interrupt.

One thing I did do was explore using COP's signature byte as the API index. I concluded less code would be required to pick the index off the stack than would be required to go back to the bank from which the API call was made to retrieve the index—the latter would require fiddling around with DB. Using that idea, I also developed a macro for making an API call:

Code:

;FIRMWARE or OS API INVOCATION
;
;	callapi [parm1[,parm2...[parmN,]]] api_index
;
callapi  .macro ...            ;1 or more args allowed
         .if %0                ;if at least 1 arg passed...
.ct          .= 0              ;arg counter
             .rept %0          ;loop for each arg
.ct              .= .ct+1      ;bump counter
                 .byte $f4     ;PEA # opcode
                 .word %.ct    ;16-bit parameter
             .endr             ;end of loop
             .byte $f4         ;PEA # opcode
             .word .ct-1       ;parameter count, doesn't include API index
             cop #0            ;invoke API (signature byte can be anything)
             bcs error         ;where the meaning of "error" depends on API call
         .else
             .error "callapi: missing arguments"
         .endif
         .endm

In the above, the parameter count will be on the bottom of the stack, and right above it will be the API index. Such an arrangement makes it easy to cherry-pick these two stack frame elements, since they are common to all internal code that is accessible through an API.

In the Kowalski assembler, the .error pseudo-op will halt assembly and emit an error message if callapi is invoked without arguments.
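As a rough illustration of the cherry-picking described above, here is a sketch of how a COP handler's front end might fetch the API index from the frame that callapi builds. The labels icop, badapi, maxapi and apitab are hypothetical, and the register saving and stack housekeeping that a real backend would do are left out:

```asm
;COP handler front end: fetch the API index from the caller's frame (sketch)
;
;stack on entry (native mode):  1,s     SR
;                               2,s     PC (2 bytes)
;                               4,s     PB
;                               5,s     parameter count (2 bytes)
;                               7,s     API index (2 bytes)
;
icop     rep #$30              ;16-bit accumulator & index registers
         lda 7,s               ;API index (1, 2, 3, etc.)
         beq badapi            ;index 0 is not valid (badapi is hypothetical)
         cmp #maxapi+1         ;maxapi = highest valid index (hypothetical)
         bcs badapi            ;index out of range
         dec a                 ;zero-align the index
         asl a                 ;2 bytes per table entry
         tax
         jmp (apitab,x)        ;apitab = table of .word routine addresses
```

With the register-passing template, where only the API index is pushed, the index would be at 5,s instead, and the handler would need to preserve .A, .X and .Y before using them, which shifts every offset accordingly.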

BitWise · Post by **BitWise** » Sun Jan 13, 2019 11:44 am

Personally I think your locator API interface is a 'heavy weight'. It uses a lot of cycles and stack RAM when calling simple functions. Functions that need local stack space can change SP themselves.

Instead of loading target API addresses from tables, why not use an indexed jump into a table of long jumps, e.g.

Code:

 jmp (apiTable,x)
apiTable:
 jml Api0
 jml Api1
 ...

If you make your API function numbers multiples of 4 then you can cut out the shifts.
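One detail to watch when trying this: on the 65816, JMP (addr,X) is an indirect jump that loads the program counter from a 16-bit pointer stored at addr+X, so it expects a table of addresses, not a table of instructions. A sketch of one way to land on the JML entries themselves is to push the entry's address minus one and "return" to it. This assumes native mode with a 16-bit accumulator and index, decimal mode off, and the table in the same bank as the dispatch code:

```asm
; indexed jump into a table of JMLs (sketch)
; on entry: .X = function number (a multiple of 4)
;
dispatch:
 txa
 clc
 adc #apiTable-1   ; RTS adds 1 to the address it pulls
 pha               ; push 16-bit (entry address - 1)
 rts               ; lands on the JML for this function
apiTable:
 jml Api0          ; function number 0
 jml Api1          ; function number 4
```

The PHA/RTS pair leaves the stack as it found it, so a routine reached this way from a JSL can still finish with a plain RTL.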

Using COP decouples the application from the OS, as it doesn't need to know where it's located, and as it's an interrupt it makes task switching much easier (if you are running multiple tasks), but again it is a little cycle-hungry for simple functions.

In UNIX only the low level API functions (e.g. open, read, write, fork, kill, getpid, etc.) are accessed by an interrupting 'syscall' mechanism. The higher level API functions (e.g. fopen, fread, fwrite, printf, etc.) are all just library functions accessed by a subroutine call which makes them much more efficient.

BigEd · Post by **BigEd** » Sun Jan 13, 2019 1:52 pm

I like the idea, or reminder, of the difference between c library and kernel. In a world where programs are linked, or where there's load time linking with dynamic libraries, or even runtime loading of libraries, that could be a good fit.

I like the idea of using COP though, to get around the inter-bank memory reference problem, and for the same reason, using the stack for bulky parameters makes good sense. My inclination though would always be to use the accumulator to select the api call, and use X and Y for the first two parameters. A pity that the COP's operand isn't much use.

drogon · Post by **drogon** » Sun Jan 13, 2019 7:21 pm

Chromatix wrote:

It doesn't get all that warm as it is, without even a heatsink - so adding liquid cooling wouldn't change much. Ultimately, small .6µm chips like this tend not to be limited by thermals, but by fundamental wire and switching delays.

And to add to the "lack of heat" thing, I've been running a WDC 65C02 at 16 MHz for a month or two now. It's been on more or less all the time and is there and "just works" when I use it. It's basically running at ambient. Faster? Who knows - my system is just on stripboard and a double-sided PCB and I'm continually amazed it runs at that speed at all. It does have 10 ns static RAM though.

-Gordon

drogon · Post by **drogon** » Sun Jan 13, 2019 7:23 pm

satpro wrote:

We are here, too. https://www.facebook.com/groups/611371042385599/

Seems I can't access it without a facebook account. Ah well. Will it be described on another medium?

-Gordon
