6502.org Forum

All times are UTC




Posted: Sat Aug 21, 2021 11:52 pm
Joined: Thu May 28, 2009 9:46 pm
Posts: 8173
Location: Midwestern USA
GARTHWILSON wrote:
BigDumbDinosaur wrote:
That won't work unless you load DB with what was in PB (program bank) at the time of the interrupt. Otherwise, the LDA (1,S),Y instruction will fetch from the data bank that was in context at the time COP was executed. That might not be the bank in which the COP instruction and its signature are located.

Ah, yes, I was forgetting operation in different banks. It can still be done of course, but takes more instructions.

In the new firmware, I will use the COP signature to select the API so I can avoid using the stack to pass parameters. There are some functions, such as reading from or writing to NVRAM, in which all three registers are needed to pass parameters. For example, reading from NVRAM will require that the caller pass a 32-bit pointer to the buffer into which the NVRAM contents will be copied, along with a byte count (0-255, which becomes 1-256). So the code to call the read-from-NVRAM API would look something like the following:

Code:
         sep #%00100000        ;8-bit accumulator
         rep #%00010000        ;16-bit index
         lda #<count>          ;bytes to fetch
         ldx !#<buf> & $FFFF   ;buffer address LSW
         ldy !#<buf> >> 16     ;buffer address MSW
         cop #knvrget          ;call NVRAM fetch API

That sort of thing can be buried in a macro:

Code:
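getnvram .macro .count,.buf    ;.count = bytes to fetch minus 1
         sep #%00100000        ;8-bit accumulator
         rep #%00010000        ;16-bit index
         lda #.count           ;bytes to fetch
         ldx !#.buf & $FFFF    ;buffer address LSW
         ldy !#.buf >> 16      ;buffer address MSW
         cop #knvrget          ;call NVRAM fetch API
         .endm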

...which reduces the call to:

Code:
         getnvram 127,buffer

...to get 128 bytes and store them at buffer. Obviously, I can't do that if one of the registers is expected to pass the API index.
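For anyone who wants to check the register arithmetic, here's a host-side Python model of what getnvram loads before the COP. It's a sketch under the assumptions stated in the post (8-bit .A holds the count minus one, 16-bit .X/.Y hold the LSW and MSW of the buffer address); the function names are hypothetical and nothing here is the actual firmware.

```python
# Host-side model of the register setup performed by the getnvram macro.
# Assumptions: 8-bit .A carries (byte count - 1); 16-bit .X/.Y carry the
# LSW and MSW of the buffer address. Function names are hypothetical.

def nvram_read_regs(buf_addr: int, nbytes: int) -> tuple[int, int, int]:
    """Return the (A, X, Y) values a caller would load before COP."""
    if not 1 <= nbytes <= 256:
        raise ValueError("byte count must be 1-256")
    if not 0 <= buf_addr <= 0xFFFFFF:
        raise ValueError("65C816 addresses are 24 bits wide")
    a = (nbytes - 1) & 0xFF         # 0-255 encodes 1-256 bytes
    x = buf_addr & 0xFFFF           # ldx !#buf & $FFFF
    y = (buf_addr >> 16) & 0xFFFF   # ldy !#buf >> 16
    return a, x, y


def decode_nvram_regs(a: int, x: int, y: int) -> tuple[int, int]:
    """What the API side would reconstruct: (buffer address, byte count)."""
    buf_addr = ((y & 0xFF) << 16) | (x & 0xFFFF)  # only 24 bits are used
    return buf_addr, (a & 0xFF) + 1


# getnvram 127,buffer with buffer at $01:2345 fetches 128 bytes:
print([hex(r) for r in nvram_read_regs(0x012345, 128)])  # ['0x7f', '0x2345', '0x1']
```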

When I write a loadable (from disk) kernel I will use the UNIX-style convention of calling APIs. I don't want to do that in the firmware because of the space that would be taken up with the stack manipulation code. A loadable kernel will have more elbow room for code and since it would be running from RAM, the execution time penalty of dealing with a stack frame will be partially offset by the avoidance of any wait-states.

My plan is to write a set of macros for calling kernel APIs, instead of trying to keep all that assembly language mumbo-jumbo straight. A necessary capability of such macros is that of being able to generate a stack frame from macro parameters, which the following macro does:

Code:
;   generate stack frame from 1 or more words...
;
pushparm .macro ...            ;arbitrary number of parameters allowed
;
         .if %0                ;if non-zero count...
.i           .set 0            ;initialize parameter index
             .rept %0          ;repeat following code for number of parameters passed
.i               .= .i+1
                 pea #%.i      ;push a parameter
             .endr             ;end of repeated code
             pea #.i           ;push number of words in stack frame
         .else
             .error "error: macro syntax: "+%0$+" parm1[,parm2[,parm3]] ..."
         .endif
         .endm

The leftmost parameter is pushed first, so it ends up at the top (highest address) of the stack frame, and the last word pushed, at the bottom of the frame, indicates how many parameters are in the frame. An API that accepts a fixed number of parameters can use that count to detect a mismatch and abort with an error, rather than do something silly. In the case of an API that accepts a variable number of parameters, the count word tells the API what to expect.
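To make the frame layout concrete, here's a small Python model of a bank-0 stack and of what pushparm leaves on it, viewed immediately after the macro runs (before COP pushes anything else). It assumes only standard 65C816 PEA behavior (high byte stored first, stack grows downward, words little-endian); the class and function names are illustrative, not part of any real firmware.

```python
class Stack65816:
    """Bank-0 stack model: grows downward, SP addresses the next free
    byte, and each word is stored low byte first (little-endian)."""

    def __init__(self, sp: int = 0x01FF) -> None:
        self.mem = bytearray(0x10000)
        self.sp = sp

    def pea(self, word: int) -> None:
        # PEA stores the high byte at SP, the low byte at SP-1, then SP -= 2.
        self.mem[self.sp] = (word >> 8) & 0xFF
        self.mem[self.sp - 1] = word & 0xFF
        self.sp -= 2

    def word_at(self, offset: int) -> int:
        # Equivalent of LDA offset,S with a 16-bit accumulator.
        addr = self.sp + offset
        return self.mem[addr] | (self.mem[addr + 1] << 8)


def pushparm(stack: Stack65816, *params: int) -> None:
    """Mimic pushparm: push parameters left to right, then the count."""
    if not params:
        raise ValueError("pushparm needs at least one parameter")
    for p in params:
        stack.pea(p & 0xFFFF)
    stack.pea(len(params))


def read_frame(stack: Stack65816) -> list[int]:
    """Recover the parameters in source order from the API's viewpoint."""
    count = stack.word_at(1)  # count word sits nearest SP
    # the leftmost parameter is deepest, at offset 1 + 2*count
    return [stack.word_at(1 + 2 * (count - i)) for i in range(count)]


def arity_ok(stack: Stack65816, expected: int) -> bool:
    """Fixed-arity API check: abort with an error if this is False."""
    return stack.word_at(1) == expected


s = Stack65816()
pushparm(s, 0x1111, 0x2222, 0x3333)
print(read_frame(s) == [0x1111, 0x2222, 0x3333], arity_ok(s, 2))  # True False
```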

The above is meant for use by API macros to build the appropriate stack frame prior to calling the API function. pushparm could also be used in a user application to generate a stack frame.

As pushparm pushes words, the higher-level macro has to handle the case in which an address, which will be a 32-bit value, is being processed. For example, the following macro would read N bytes from an open file descriptor FD and deposit said bytes into buffer BUF:

Code:
         read FD,BUF,N

The macro's code would be:

Code:
;   read data from open file...
;
read     .macro .fd,.buf,.n    ;file descriptor, buffer pointer, bytes to read
         pushparm .fd,.buf >> 16,.buf & $FFFF,.n
;
;   ———————————————————————————————————————————————————————————————
;   above generates a DWORD pointer from the address passed in .buf
;   ———————————————————————————————————————————————————————————————
;
         cop #kread            ;call kernel "read" API
         .endm

This basic pattern would apply to all macros that work with addresses.
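Here's a quick Python check of why the read macro pushes the MSW before the LSW: with a downward-growing stack, that order leaves the four pointer bytes in ascending memory in little-endian order, i.e., a proper DWORD. The model is self-contained, assumes standard PEA semantics, and shows the frame as seen before COP; the names are illustrative.

```python
def push_words(words: list[int], sp: int = 0x01FF) -> tuple[bytearray, int]:
    """Push 16-bit words the way PEA does (high byte first, stack grows
    down). Returns the bank-0 memory image and the final SP."""
    mem = bytearray(0x10000)
    for w in words:
        mem[sp] = (w >> 8) & 0xFF
        mem[sp - 1] = w & 0xFF
        sp -= 2
    return mem, sp


def read_macro_frame(fd: int, buf: int, n: int) -> tuple[bytearray, int]:
    # Same order as the read macro: .fd, .buf >> 16, .buf & $FFFF, .n,
    # plus the word count that pushparm appends last.
    words = [fd, (buf >> 16) & 0xFFFF, buf & 0xFFFF, n]
    return push_words(words + [len(words)])


mem, sp = read_macro_frame(fd=3, buf=0x021234, n=512)
# Offsets from SP: count at 1, .n at 3, pointer LSW at 5, MSW at 7, .fd at 9.
ptr = int.from_bytes(mem[sp + 5:sp + 9], "little")
print(hex(ptr))  # 0x21234
```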

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Last edited by BigDumbDinosaur on Sun Aug 22, 2021 12:16 am, edited 2 times in total.

Posted: Sun Aug 22, 2021 12:06 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8173
Location: Midwestern USA
jmthompson wrote:
BigDumbDinosaur wrote:
That is a programming style I prefer to avoid. I don't like in-line data like that, as it interrupts code disassembly. Also, it necessarily forces static assembly of the data block. If the data will not be known until run-time, self-modifying code becomes necessary.

That is a good point, and one I had not considered. The Apple IIGS monitor automatically formats these things properly during disassembly, so back in the day it was never much of an issue. I also thought about it more and decided against this technique for reasons similar to what you mention below.

A monitor could cope with that sort of thing if it can recognize the addresses of kernel calls. Even there, it could be fooled if the disassembly address were to start at an operand instead of an opcode.

Quote:
I started thinking about all my console/serial calls that read or write a single byte using just the accumulator, and realized to make things work like I was thinking I'd need to add an unnecessary four-byte pointer after those calls. That's just a big waste of ROM space.

Plus, if you are wait-stating ROM accesses, using a stack frame or an inline data block adds to the processing time. Ideally, a firmware API call should be able to accept parameters in the registers and not have to rely on a stack frame. That won't happen if your APIs are too complicated.

Quote:
Loading from storage is definitely a good way to go as it saves on EEPROM/flash write cycles. :) I can't really do that on my current build because I only have 32K of RAM, so for now the BIOS and eventually the OS will sit in ROM, which is also 32K. On my next build though I might do something like that, as it'll have a meg of RAM.

That's why early on I got busy with interfacing SCSI hardware to my POC unit. From a 65C816's perspective, it's nearly unlimited storage.

Posted: Fri Sep 03, 2021 6:52 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8173
Location: Midwestern USA
BigDumbDinosaur wrote:
Well, POC V1.3 seems to be stable...Meanwhile, I decided to resume the postmortem on POC V1.2. So far, nothing is obvious.

Turns out POC V1.2's problem was due to hinkiness in the MPU socket. Pressing on the SRAM seemed to trigger the problem, leading me to cast a suspicious eye on that part. However, I discovered that pressure on the east side of the MPU would cause things to happen. It appears some of the spring contacts in that side of the socket are not making good contact with the MPU's J-leads. What pressing on the SRAM was doing was flexing things around the MPU socket.

Since V1.3 is running nice and stable, I'm not going to mess with V1.2.

Last edited by BigDumbDinosaur on Sun Oct 17, 2021 5:20 am, edited 1 time in total.

Posted: Fri Sep 03, 2021 8:14 am
Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1928
Location: Sacramento, CA, USA
Have you or anyone else here experimented with contact enhancing products, like Stabilant 22 or similar?

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Posted: Fri Sep 03, 2021 4:19 pm
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3351
Location: Ontario, Canada
I've used Stabilant 22 once or twice, but frankly don't know if it helped. That's not meant as a criticism. It's just that I wasn't able to do an A-B comparison of the results with and without use of the product.

My mindset was, "I hope this stuff helps... and I'm willing to gamble some money." But I'll never know whether I truly got value, or was simply bolstering my own optimism. The placebo effect truly is a thing... especially after you've already convinced yourself to spend the money! People is funny critters! :roll:

ETA: I should clarify that in a funky-contact scenario I'll always begin with the obvious remedy of cleaning the contacts with contact cleaner spray. But I never did an A-B comparison of contact-cleaner alone versus contact cleaner then Stabilant 22.

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted: Fri Sep 03, 2021 6:41 pm
Joined: Thu May 28, 2009 9:46 pm
Posts: 8173
Location: Midwestern USA
Dr Jefyll wrote:
I've used Stabilant 22 once or twice, but frankly don't know if it helped.

Stabilant 22 sounds like that stuff you dump into your lawnmower's gas tank so the carburetor doesn't gum up over the winter.

Quote:
People is funny critters! :roll:

Speak for yourself, mister! :lol:

Quote:
ETA: I should clarify that in a funky-contact scenario I'll always begin with the obvious remedy of cleaning the contacts with contact cleaner spray.

If I suddenly find myself with nothing to do and feel sufficiently motivated, I'll pry the 816 out of the socket and try cleaning the socket's contacts. However, as I earlier said, the incentive to further monkey with V1.2 is not there, not with V1.3 running so well.

At least we know PLCC sockets can "misbehave."

Posted: Sat Sep 04, 2021 2:46 am
Joined: Tue Mar 05, 2013 4:31 am
Posts: 1373
Yes, I did skim over an application note for this stuff. Unlike some other products I've seen over the decades, it doesn't claim to be a contact enhancer; rather, it forms a "film" that prevents other contaminants and/or oxidation from forming in the applied area. There's always the possibility of galvanic action between dissimilar metals, but picking the right PLCC socket should help minimize that. Then again, in most cases, I would think that simply removing and reinstalling the chip would resolve any flaky issues for a long while. I have old disk drives where the drive firmware was on a PLCC-socketed chip, and the drives never exhibited problems. So I guess I need a reason to use it, versus a reason to doubt the need for it.

_________________
Regards, KM
https://github.com/floobydust


Posted: Tue Sep 28, 2021 4:50 pm
Joined: Tue Aug 11, 2020 3:45 am
Posts: 311
Location: A magnetic field
BigDumbDinosaur on Sat 21 Aug 2021 wrote:
I am leaning toward the UNIX/Linux convention of loading the API index into a register and pushing parameters to the stack before making the call.


I had similar leanings. People smarter than us spent longer than us devising these conventions. However, Unix security has been an after-thought, at best. Furthermore, we've repeatedly seen that a Unix programmer's idea of minimalism is on a different scale to a 6502/65816 programmer's idea of minimalism. Despite this, I believe that where BSD calling conventions differ from Linux, BSD is faster. In the faster arrangement, parameters are on stack and register pressure is reduced. I believe it even goes as far as callee-saves which I do not like but would tolerate at module boundaries. (Within one module, callee-saves increases undesirable action-at-a-distance. However, it is preferable to have one calling convention - within modules and between modules - which places correctness above speed. Therefore, it is preferable to have caller-saves everywhere.) I otherwise like the BSD calling convention because it is portable and fairly agnostic to available registers - which may differ according to processor mode. However, we've seen that the wilful mixing of program and data has been an unending source of security flaws. Therefore, it is worthwhile to increase register pressure if it increases security. Specifically, I recommend separate program and data stacks whenever possible.

If you depart from Unix conventions entirely then do not follow Acorn's example. I am utterly repulsed by the Acorn Y:X control block pointer calling convention and I implore not repeating it with 16 bit registers and 32 bit pointers.

BigDumbDinosaur on Wed 18 Aug 2021 wrote:
The index merely has to be doubled and copied to the X-register. The front end would then execute JMP (COPTAB,X) or JSR (COPTAB,X) to select the appropriate function, where COPTAB is a 16-bit vector table.


May I suggest skipping ASL // TAX and merely placing an even value in RegX? Indeed, I have a counter-intuitive suggestion for an operating system calling convention. I suggest having a data stack pointer in RegA and the even function number in RegX. (Not the other way around.) Likewise, return codes may also be even values held in RegX. This has a certain pleasing symmetry which allows operating system and application to rapidly respond to each other's circumstances.
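A minimal Python sketch of the difference being proposed: with a 1-based (or 0-based) index the dispatcher must double it (ASL, TAX) before JMP (table,X), whereas if the caller already supplies an even value the doubling disappears. The dict stands in for a table of 16-bit vectors, and the handler names are made up for illustration.

```python
# Hypothetical jump table keyed by byte offset, standing in for a table
# of 16-bit vectors used with JMP (table,X): entries are 2 bytes apart.
JUMP_TABLE = {
    0: lambda: "open",
    2: lambda: "read",
    4: lambda: "write",
}


def dispatch_by_index(index: int) -> str:
    """Index convention: the dispatcher doubles the index (ASL, TAX)."""
    return JUMP_TABLE[index << 1]()


def dispatch_by_even_value(offset: int) -> str:
    """Even-value convention: the caller already supplies the byte
    offset, so the ASL/TAX step disappears; odd values are rejected."""
    if offset & 1:
        raise ValueError("function number must be even")
    return JUMP_TABLE[offset]()


print(dispatch_by_index(1), dispatch_by_even_value(2))  # read read
```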

A stack based language on 6502/65816, such as Forth, typically maintains a data stack pointer with RegX. Whether or not Forth is the application language or operating system language, use of such a data stack would allow unrestricted parameters to be passed and returned independently of the call stack. This is ideal for I/O operations. In particular, a separate data stack is sufficient to preserve one control block across multiple system calls. Unfortunately, we also want to use RegX for JMP (abs,X) or similar. This is where the reversal is useful. I suggest a system call invocation sequence of TXA // LDX # // COP // JMP (abs,X) // TAX. At the end of each system call TXA // LDX # // STX DROPPRIV // RTI // JSR (abs,X) // TAX where DROPPRIV is an address strobe to a 74x74 or similar functionality within a CPLD. This temporarily moves the data stack out of the way during context switches and jump table indirection but is otherwise fairly agnostic to the implementation language of the application and operating system.

It may be desirable to set a return error flag. However, it may be more convenient (and symmetrical) for an application to JSR to a dummy routine rather than have an operating system which edits the program stack at the end of every system call. Indeed, this is an example of correctness over speed.

BigDumbDinosaur on Wed 18 Aug 2021 wrote:
One last thing. As COP is an interrupt, it has a useful hardware effect. During the seventh and eighth clock cycles of processing COP, the MPU's VPB (vector pull) output will go low. In a system that has privilege levels, the glue logic can react to VPB by relaxing restrictions, mimicking the user/supervisor modes of the Motorola 68K.


Genius. An unused vector becomes an operating system entry point with privilege escalation. I considered a doorbell address for privilege escalation. I didn't consider COP. COP may not be downwardly compatible but I don't think this is much of a consideration for BigDumbDinosaur. Why aren't you using 65816 in native mode already?

_________________
Modules | Processors | Boards | Boxes | Beep, Beep! I'm a sheep!


Posted: Wed Oct 13, 2021 9:07 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8173
Location: Midwestern USA
BigDumbDinosaur wrote:
In the new firmware, I will use the COP signature to select the API so I can avoid using the stack to pass parameters...

Having engaged in a series of code-pounding sessions, I've converted POC V1.3's firmware so BIOS API services are called with software interrupts instead of JSRs. With this change, code running anywhere in address space can access the BIOS API without having to be concerned about cross-bank subroutine calls or knowing anything about a jump table. All the calling program needs to know is the index of the desired API service—defined in an INCLUDE file—and the required parameters.

The basic API service call syntax is:

Code:
         cop <api_num>
         bcs failed

where <api_num> is a 1-based index byte.¹ Any parameters required by the call are passed in the registers and depending on the service requested, data may be returned in the registers. As a general rule, unless a register is used to return data to the caller, entry values will be preserved. A return with carry cleared means the called service successfully completed its task. If carry is set a failure of some kind occurred, the meaning of "failure" depending on the service that was called. In case of failure, a non-zero error code may be returned in the accumulator.

Parameters that are passed to API service calls may be values and/or a pointer. An example of where values would be passed as parameters is in calling the API service that outputs a datum on a serial I/O (SIO) channel:

Code:
         lda #'A'              ;write 'A' to...
         ldx #siochanc         ;SIO channel C
         cop ksioput           ;SIO output service
         bcs error

In the above, the error branch would be taken if the SIO channel index passed in .X were out-of-range for the system, e.g., trying to write to channel E in a system with only four serial ports.

An example of where a pointer would be passed as a parameter is in calling the API service that returns the system date and time (sequential time) to the caller:

Code:
         rep #%00010000        ;16-bit index registers
         ldx !#timbuf & $ffff  ;time buffer pointer LSW
         ldy !#timbuf >> 16    ;time buffer pointer MSW
         cop kstmget           ;get sequential time
         sta centisecs         ;save centiseconds

The call to the time service will copy the current time (a 48-bit integer count of seconds elapsed since the "epoch") to the TIMBUF buffer, which can be anywhere in address space. In order for the STMGET service to "reach" the buffer it needs a 24-bit pointer. However, working with 24-bit quantities is awkward, so a 32-bit pointer is passed as two 16-bit quantities (LSW and MSW) and the MSB of the MSW is ignored. The !# notation in the Kowalski assembler coerces the expression into a 16-bit quantity.
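To illustrate the pointer handling, here's a Python model that combines the .X (LSW) and .Y (MSW) values into a 24-bit address while discarding the MSB of the MSW, then writes a 48-bit seconds count at that address. The little-endian byte order of the stored time is an assumption (it's the 65C816's native order, but the post doesn't state it), and the helper names are hypothetical.

```python
def long_pointer(x: int, y: int) -> int:
    """Combine .X (LSW) and .Y (MSW) into a 24-bit address. Bits 24-31,
    the MSB of the MSW, are ignored as described for STMGET."""
    return ((y & 0x00FF) << 16) | (x & 0xFFFF)


def store_seconds(mem: dict[int, int], ptr: int, seconds: int) -> None:
    """Write a 48-bit seconds count at ptr, low byte first (assumed)."""
    if not 0 <= seconds < 1 << 48:
        raise ValueError("time value must fit in 48 bits")
    for i, b in enumerate(seconds.to_bytes(6, "little")):
        mem[(ptr + i) & 0xFFFFFF] = b  # the model wraps at 24 bits


mem: dict[int, int] = {}
ptr = long_pointer(0x5678, 0xAB12)   # MSB $AB is discarded
store_seconds(mem, ptr, 1_630_000_000)
print(hex(ptr))  # 0x125678
```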

In addition to populating TIMBUF with the current time, STMGET returns centiseconds in the accumulator. So STMGET is an example of a service that returns data through a register as well as into user space.

As COP is a software interrupt, an interrupt handler is a necessary part of the program. There are two parts to the COP handler: a preamble and the API service dispatcher. Here's the preamble copied from the assembler listing file:

Code:
03500  ;icop: COPROCESSOR INTERRUPT SERVICE ROUTINE
03501  ;
03502  00D23F  C2 30         icop     rep #m_setr
03503  00D241  5A                     phy
03504  00D242  DA                     phx
03505  00D243  48                     pha
03506  00D244  0B                     phd
03507  00D245  8B                     phb
03508  00D246  6C 04 01               jmp (ivcop)           ;take COP indirect vector
03509  ;
03510  ;   ———————————————————————————————————————————
03511  ;   IVCOP points to the APISVC service dispatch
03512  ;   function if the vector hasn't been altered.
03513  ;   ———————————————————————————————————————————

Note how the index registers are at the top of the stack frame. That arrangement makes it possible to point direct page at the COP stack frame and thus treat the .X and .Y stack copies as a 24-bit, "long", direct-page pointer. The STMGET service referred to above is one of the API services that uses this mechanism to copy data maintained by the kernel into user space.

Before getting too far along, I should point out that what keeps this scheme from turning into an unmanageable mess is defining stack frames, so routines that dig into the stack know where to go. The stack frame associated with the above COP handler preamble looks like this:

Code:
02197    0001              cop_dbx  =1                    ;DB
02198    0002              cop_dpx  =cop_dbx+s_mpudbx     ;DP
02199    0004              cop_arx  =cop_dpx+s_mpudpx     ;.A
02200    0006              cop_xrx  =cop_arx+s_word       ;.X
02201    0008              cop_yrx  =cop_xrx+s_word       ;.Y
02202    000A              cop_srx  =cop_yrx+s_word       ;SR
02203    000B              cop_pcx  =cop_srx+s_mpusrx     ;PC
02204    000D              cop_pbx  =cop_pcx+s_mpupcx     ;PB

The above stack frame definition is actually "upside down," since the last item (PB) was the first pushed. That arrangement is convenient for the assembler (and avoids a possible phase error during assembly) but is less convenient for the bloke typing in code (that would be me). However, because I have standardized label names for stack frame elements, the inverted nature of the frame definition is not a problem for me. In the above, definitions such as S_MPUSRX and S_WORD define register sizes in bytes. I never hard-code such numbers, as it's too easy to make a typo that will cause no end of grief, even if the code assembles without error. It's all in INCLUDE files.
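The "each label = previous label + size of previous item" pattern is easy to reproduce. The Python sketch below rebuilds the COP frame from register sizes and lands on the same offsets as the listing ($01 through $0D). The size values are inferred from those listed offsets, and PB's size (one byte) is assumed since it's the last item and its size never shows up in the listing.

```python
# Register sizes in bytes for native mode with 16-bit registers; the
# values are inferred from the offsets in the COP frame listing.
S_MPUDBX = 1  # DB
S_MPUDPX = 2  # DP
S_WORD = 2    # .A, .X, .Y
S_MPUSRX = 1  # SR
S_MPUPCX = 2  # PC
S_MPUPBX = 1  # PB (assumed; not shown in the listing)


def build_frame(first: int, fields: list[tuple[str, int]]) -> dict[str, int]:
    """Assign offsets like the listing: each label = previous + size."""
    frame, offset = {}, first
    for name, size in fields:
        frame[name] = offset
        offset += size
    return frame


COP_FRAME = build_frame(1, [
    ("cop_dbx", S_MPUDBX),  # last pushed by the preamble, nearest SP
    ("cop_dpx", S_MPUDPX),
    ("cop_arx", S_WORD),
    ("cop_xrx", S_WORD),
    ("cop_yrx", S_WORD),
    ("cop_srx", S_MPUSRX),  # SR, PC and PB were pushed by COP itself
    ("cop_pcx", S_MPUPCX),
    ("cop_pbx", S_MPUPBX),
])

print(COP_FRAME["cop_pbx"])  # 13
```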

As an example of how such a stack frame definition would be useful, if I point direct page at the stack with TSC followed by TCD, I can grab the value that was in the accumulator at the time of the interrupt with LDA COP_ARX. Even if I don't relocate direct page, I can still get that value with LDA COP_ARX,S, with only a slight speed penalty.

The API dispatcher fetches the API index (COP's signature), verifies its range and then uses it as an index into a dispatch table. The 65C816's unique JSR (<table>,X) instruction runs the appropriate "processor" for the requested service. Originally, I planned on using JMP (<table>,X) to dispatch execution but changed my mind after considering the fact that every service processor would have to end with a JMP <somewhere> instruction to return to the dispatcher for error processing and the return to the caller. Some processors have multiple exit points, which was the push I needed to scratch that idea. Treating the service processors as subroutines simplified things, with only a slight penalty associated with the subroutine call and return.

Below is the API dispatcher copied from the assembler listing file:

Code:
02500  ;apisvc: DISPATCH API SERVICE REQUESTS CALLED WITH COP #<api_num>
02501  ;
02502  00D000  C2 1C         apisvc   rep #m_setx|sr_bdm|sr_irq
02503  00D002  E2 20                  sep #m_seta
02504  00D004  3B                     tsc                   ;use DP as...
02505  00D005  5B                     tcd                   ;stack frame pointer
02506  00D006  A5 0D                  lda cop_pbx           ;get caller's bank &...
02507  00D008  A4 0B                  ldy cop_pcx           ;return address
02508  00D00A  48                     pha                   ;set caller's bank as...
02509  00D00B  AB                     plb                   ;temporary data bank
02510  00D00C  88                     dey                   ;point at signature &...
02511  00D00D  B9 00 00               lda mm_ram,y          ;fetch it
02512  00D010  AA                     tax                   ;protect signature
02513  00D011  4B                     phk                   ;select kernel's...
02514  00D012  AB                     plb                   ;data bank
02515  00D013  C2 20                  rep #m_seta
02516  00D015  A9 00 00               lda !#kerneldp        ;re-select kernel's...
02517  00D018  5B                     tcd                   ;direct page
02518  00D019  8A                     txa                   ;recover signature
02519  00D01A  29 FF 00               and !#%11111111       ;index zero?
02520  00D01D  F0 1F                  beq .err              ;yes, error...
02521  ;         
02522  00D01F  3A                     dec                   ;no, zero-align
02523  00D020  C9 0F 00               cmp !#maxapi          ;index a legal value?
02524  00D023  B0 19                  bcs .err              ;no, error
02525  ;
02526  00D025  0A                     asl                   ;yes, convert to service...
02527  00D026  AA                     tax                   ;lookup table offset
02528  00D027  E2 20                  sep #m_seta
02529  00D029  FC E7 E5               jsr (apifntab,x)      ;call API function
02530  00D02C  E2 30                  sep #m_setr           ;8-bit registers
02531  00D02E  B0 12                  bcs .err010           ;processing "failure"
02532  ;
02533  00D030  A3 0A                  lda cop_srx,s         ;entry SR value
02534  00D032  29 FE                  and #sr_car_i         ;clear carry means "OK"
02535  ;
02536  00D034  83 0A         .done    sta cop_srx,s         ;condition carry for exit
02537  00D036  C2 30                  rep #m_setr
02538  00D038  AB                     plb                   ;restore MPU state
02539  00D039  2B                     pld
02540  00D03A  68                     pla
02541  00D03B  FA                     plx
02542  00D03C  7A                     ply
02543  00D03D  40                     rti                   ;return to caller
02544  ;
02545  00D03E  E2 30         .err     sep #m_setr
02546  00D040  A9 FF                  lda #e_apiidx         ;flag API index error
02547  ;
02548  00D042  AA            .err010  tax                   ;any error code?
02549  00D043  F0 02                  beq .err020           ;no
02550  ;
02551  00D045  83 04                  sta cop_arx,s         ;yes, return it
02552  ;
02553  00D047  A3 0A         .err020  lda cop_srx,s         ;entry SR
02554  00D049  09 01                  ora #sr_car           ;set carry means "failure"
02555  00D04B  80 E7                  bra .done             ;return to caller

The instruction on line 02511 takes advantage of the fact that a memory or I/O access that uses a 16-bit (absolute) address will prepend that address with whatever is in DB. Since a previous sequence of instructions loaded DB with the caller's bank, which was automatically pushed when the 816 processed the COP instruction, the effective address will be $cb0000 indexed by .Y (cb represents the caller's bank). .Y is 16 bits wide and holds the return address that was pushed when COP was executed, minus one. As that return address points just past COP's signature byte, the load fetches the signature, i.e., the byte immediately following the COP opcode.
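The dispatcher's decision logic can be modeled in a few lines of Python. This is a simplified sketch, not the firmware: memory is a dict, the service table is a dict keyed by byte offset (standing in for JSR (apifntab,X)), and a handler returns a (carry, error code) pair. MAXAPI and E_APIIDX are taken from the object bytes in the listing; the handlers are hypothetical, and the model ignores services that write directly into the stack copy of .A.

```python
MAXAPI = 0x0F    # from the listing: cmp !#maxapi assembles to C9 0F 00
E_APIIDX = 0xFF  # from the listing: lda #e_apiidx assembles to A9 FF


def fetch_signature(mem: dict[int, int], pb: int, pc: int) -> int:
    """COP pushed PB and a return PC that points past the signature,
    so the signature byte sits at $pb:(pc - 1)."""
    return mem[(pb << 16) | ((pc - 1) & 0xFFFF)]


def dispatch(mem, pb, pc, entry_a, handlers):
    """Return (carry, A) as APISVC leaves them for the caller.
    handlers maps a table byte offset to a function returning
    (carry, error_code), standing in for JSR (apifntab,X)."""
    sig = fetch_signature(mem, pb, pc)
    if sig == 0 or sig - 1 >= MAXAPI:             # zero or out of range
        carry, code = True, E_APIIDX
    else:
        carry, code = handlers[(sig - 1) << 1]()  # zero-align, then ASL
    if carry and code:
        return True, code     # non-zero error code replaces caller's .A
    return carry, entry_a     # otherwise .A comes back from the stack copy


mem = {0x021000: 0x02, 0x021001: 0x02}  # COP opcode, signature 2
handlers = {2: lambda: (False, 0)}      # hypothetical service #2
print(dispatch(mem, 0x02, 0x1002, 0x41, handlers))  # (False, 65)
```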

That the API processors are subroutines gives rise to an interesting programming matter. The COP stack frame definition doesn't reflect the stack as seen within an API processor. So there is an API stack frame definition that processors use:

Code:
02234    0001              api_ret  =1                    ;PC (pushed by API dispatch)
02235    0003              api_dbx  =api_ret+s_mpupcx     ;DB
02236    0004              api_dpx  =api_dbx+s_mpudbx     ;DP
02237    0006              api_arx  =api_dpx+s_mpudpx     ;.A
02238    0008              api_xrx  =api_arx+s_word       ;.X
02239    000A              api_yrx  =api_xrx+s_word       ;.Y
02240    000C              api_srx  =api_yrx+s_word       ;SR
02241    000D              api_pcx  =api_srx+s_mpusrx     ;PC (pushed by COP)
02242    000F              api_pbx  =api_pcx+s_mpupcx     ;PB

What follows is an example of how an API processor would make use of the above to "cherry-pick" the stack:

Code:
03437  00D20F  3B            scprint  tsc                   ;use DP as the...
03438  00D210  5B                     tcd                   ;stack frame pointer
03439  00D211  A0 00 00               ldy !#0
03440  ;
03441  00D214  B7 08         .main    lda [api_xrx],y       ;get from string
03442  00D216  F0 07                  beq .done             ;end
03443  ;
03444  00D218  02 0D                  cop #kscput           ;write to console
03445  00D21A  C8                     iny
03446  00D21B  10 F7                  bpl .main             ;get next
03447  ;
03448  00D21D  A9 11                  lda #e_stlen          ;string too long
03449  ;
03450  00D21F  F4 00 00      .done    pea #kerneldp         ;reselect kernel's
03451  00D222  2B                     pld                   ;direct page
03452  00D223  09 00                  ora #0                ;error?
03453  00D225  D0 02                  bne .err              ;yes
03454  ;
03455  00D227  18                     clc                   ;no
03456  00D228  60                     rts
03457  ;
03458  00D229  38            .err     sec
03459  00D22A  60                     rts

The above writes a character string to the console screen. It is called as follows:

Code:
         rep #%00010000        ;16-bit index registers
         ldx !#string & $ffff  ;string pointer LSW
         ldy !#string >> 16    ;string pointer MSW
         cop kscprint          ;print string on console
         bcs error             ;string too long

The SCPRINT processor starts by copying the stack pointer (SP) into the direct page register (DP). Doing so aligns direct page with the API stack frame, so any reference to the stack frame can be made as a direct-page reference. That is why the instruction at line 03441 can treat the stack copies of .X and .Y as the LSW and MSW of a long pointer. The above is also an example of one API processor calling another; this happens at line 03444, which calls the API service (SCPUT) that writes to the console. The handy thing about this mechanism is that the call to SCPUT preserves SCPRINT's environment on the stack, just as the call to SCPRINT preserved the caller's environment.

The use of the stack copies of .X and .Y as a long pointer is facilitated by the order in which those registers were pushed. That's why the stack frame is the way it is. I don't have to do anything to use them as a pointer except load DP with SP.
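A Python sketch of the two facts in play here: inside a service processor every COP-frame offset moves up by the 2-byte return address pushed by JSR (apifntab,X), and with DP = SP the three bytes at api_xrx (.X low, .X high, .Y low) form a 24-bit pointer, exactly what LDA [api_xrx],Y dereferences. The offsets come from the listings; the stack contents and SP value are made up for the example.

```python
# COP frame offsets from the listing (offset 1 = first byte above SP).
COP_FRAME = {"dbx": 0x01, "dpx": 0x02, "arx": 0x04, "xrx": 0x06,
             "yrx": 0x08, "srx": 0x0A, "pcx": 0x0B, "pbx": 0x0D}
S_MPUPCX = 2  # size of the return address pushed by JSR (apifntab,X)

# Inside a service processor everything moves up by the JSR's 2 bytes.
API_FRAME = {"ret": 0x01}
API_FRAME.update({k: v + S_MPUPCX for k, v in COP_FRAME.items()})


def long_ptr_at(mem: bytes, sp: int, dp_offset: int) -> int:
    """With DP = SP (TSC, TCD), [dp_offset] reads three bytes:
    .X low, .X high, .Y low, forming a 24-bit address."""
    base = sp + dp_offset
    return mem[base] | (mem[base + 1] << 8) | (mem[base + 2] << 16)


mem = bytearray(0x10000)
sp = 0x01E0  # arbitrary example SP
xo, yo = API_FRAME["xrx"], API_FRAME["yrx"]
mem[sp + xo:sp + xo + 2] = (0x5678).to_bytes(2, "little")  # caller's .X
mem[sp + yo:sp + yo + 2] = (0x0012).to_bytes(2, "little")  # caller's .Y
print(xo, hex(long_ptr_at(mem, sp, xo)))  # 8 0x125678
```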

Once SCPRINT has completed printing (or has aborted due to a too-long string) it points DP back to the kernel's direct page, conditions carry according to how things went and then returns to the API dispatcher. This basic mechanism is used in all API service processors and takes little code.
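One consequence of the INY/BPL loop worth spelling out: with a 16-bit .Y, BPL stops being taken once .Y reaches $8000, so strings of up to 32767 characters succeed, and a longer one fails with E_STLEN after 32768 characters have been output. The Python model below mimics that behavior; E_STLEN's value is read from the listing's object bytes, the console is a list, memory is a dict, and the model treats unmapped bytes as NUL.

```python
E_STLEN = 0x11  # from the listing: lda #e_stlen assembles to A9 11


def scprint(mem: dict[int, int], ptr: int, out: list[str]) -> tuple[bool, int]:
    """Model of SCPRINT returning (carry, A). Bytes are read through a
    24-bit pointer plus a 16-bit Y until a NUL is found."""
    y = 0
    while True:
        ch = mem.get((ptr + y) & 0xFFFFFF, 0)  # LDA [api_xrx],Y
        if ch == 0:
            return False, 0           # carry clear: OK
        out.append(chr(ch))           # COP #kscput
        y = (y + 1) & 0xFFFF          # INY with 16-bit index
        if y & 0x8000:                # BPL not taken once bit 15 is set
            return True, E_STLEN      # carry set: string too long


mem = {0x021000 + i: b for i, b in enumerate(b"HI\x00")}
out: list[str] = []
print(scprint(mem, 0x021000, out), "".join(out))  # (False, 0) HI
```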

Use of COP to call API services is slower than calling API services as subroutines. There is interrupt processing overhead, of course, plus the dispatch code that runs things. However, API calls are not tied to a particular address and can be made from any bank with no special code, something that will become more important as the amount of RAM in the system increases and there is more room to execute processes. Since calls are not made to fixed addresses, a relocation of the firmware will not break applications. Furthermore, if new services are added, programs that were previously written will continue to work, since the indices for the new services would be numerically higher than existing indices.

At some point, I will write a kernel that can be loaded from disk and used to provide filesystem services, task management, etc. The means by which kernel API services will be called will be similar to the above, except I figure on using the UNIX method of pushing parameters to the stack, loading the accumulator with the API index and then executing COP to switch to kernel mode—COP's signature byte will be ignored.

My reasoning is there will be a number of instances in which more parameters will be required than can be passed in the registers alone. While there will be cases in which the required parameters could be passed in the registers alone, I'd be faced with writing API service front-end code that would have to be able to handle both methods. Better to pick a method that will work for all cases and stick with it.

——————————
¹The Kowalski assembler treats the COP signature as an immediate mode value. So the correct syntax in that assembler would be COP #<api_num>.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Sun Oct 24, 2021 11:20 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8173
Location: Midwestern USA
Pursuant to my switch of POC V1.3's firmware API services from being called with JSR to being called with COP #<api_num>, I did some fooling around with the API service dispatch code and managed to shrink it. The new code treats the COP stack frame as direct page so it can back the caller's return address up to point at the API index (COP's signature), fetch the index, and then restore the return address. Here's the new and improved code; numbers in parentheses are clock cycles consumed.

Code:
03808  00D238  C2 0C         icopa    rep #sr_bdm|sr_irq    ;decimal mode off, IRQs on  (3)
03809  00D23A  E2 30                  sep #m_setr           ;8-bit registers            (3)
03810  00D23C  3B                     tsc                   ;COP stack frame is...      (2)
03811  00D23D  5B                     tcd                   ;now direct page            (2)
03812  00D23E  A9 03                  lda #sr_zer|sr_car    ;C = 0 & Z = 0 is...        (2)
03813  00D240  14 0A                  trb cop_srx           ;default return             (6)
03814  00D242  C2 20                  rep #m_seta           ;16-bit accumulator         (3)
03815  00D244  C6 0B                  dec cop_pcx           ;point to API index, ...    (8)
03816  00D246  A7 0B                  lda [cop_pcx]         ;get it &...                (8)
03817  00D248  AA                     tax                   ;protect it                 (2)
03818  00D249  E6 0B                  inc cop_pcx           ;point to return address    (8)
03819  00D24B  A9 00 00               lda !#kerneldp        ;set kernel's...            (3)
03820  00D24E  5B                     tcd                   ;direct page                (2)
03821  00D24F  4B                     phk                   ;set kernel's...            (3)
03822  00D250  AB                     plb                   ;working bank               (4)
03823  00D251  8A                     txa                   ;restore API index          (2)
03824  00D252  E2 20                  sep #m_seta           ;8-bit accumulator          (3)
03825  00D254  F0 22                  beq .apierr           ;zero API index not allowed (2 or 3)
03826  ;         
03827  00D256  3A                     dec                   ;no, zero-align             (2)
03828  00D257  C9 15                  cmp #maxapi           ;API index in range?        (2)
03829  00D259  B0 1D                  bcs .apierr           ;no, error                  (2 or 3)
03830  ;
03831  00D25B  0A                     asl A                 ;yes, convert to service... (2)
03832  00D25C  AA                     tax                   ;lookup table offset        (2)
03833  00D25D  C2 10                  rep #m_setx           ;16-bit index               (3)
03834  00D25F  FC 35 E7               jsr (apifntab,x)      ;run API processor          (8)

Ninety-five clock cycles are required to execute the above plus the COP instruction itself. At POC V1.3's 16 MHz Ø2 clock rate, that works out to 5.94 microseconds. Forty-three percent of that processing time, 2.56 microseconds, is expended in fetching the API index. Not included in that timing calculation are the clock cycles used to save the MPU context on the stack at the beginning of the COP interrupt handler.
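The arithmetic can be checked against the listing. This Python sketch sums the parenthesized cycle counts, taking the 2-cycle (not taken) figure for BEQ and BCS, and adds 8 cycles for COP in native mode per the WDC datasheet; that the post's 95 is exactly that sum is an inference, not something the post states. Wait states are ignored.

```python
# Cycle annotations from the listing, top to bottom, branches not taken.
DISPATCH_CYCLES = [3, 3, 2, 2, 2, 6, 3, 8, 8, 2, 8, 3, 2, 3, 4, 2, 3,
                   2, 2, 2, 2, 2, 2, 3, 8]
COP_NATIVE_CYCLES = 8  # COP in native mode


def cycles_to_us(cycles, clock_mhz):
    """Convert a cycle count to microseconds at a given clock rate in MHz."""
    return cycles / clock_mhz


total_cycles = sum(DISPATCH_CYCLES) + COP_NATIVE_CYCLES
elapsed_us = cycles_to_us(total_cycles, 16.0)
```

The listed instructions account for 87 cycles, and adding COP's 8 gives the 95 cycles and roughly 5.94 microseconds quoted above.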

Unavoidably, stack manipulation is a relatively slow process, so the above isn't the speediest code on the race track. :D However, it works and it allows API services to be called from any bank without having to know anything other than an index number. Also, services are essentially transparent to the caller; unless a service needs to return data to the caller in a register, registers are returned unchanged, with the exception of the status register (SR).

Speaking of SR, the carry bit is used to indicate whether the call was "successful" or "failed," where the meaning of "success" and "failure" depends on the API service that was invoked. In addition to carry, the zero bit is conditioned according to what is in .A when the service returns. That is a convenience feature, mostly of value with two calls. After I've played around with this mess for a while, I may remove it; it depends on how useful it turns out to be.
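Here is a minimal Python model of how the stacked SR could end up after a service runs, so that RTI hands the caller the result flags. The listing only shows the first step (TRB #sr_zer|sr_car clears C and Z in the stacked copy); the rest is an assumption: carry set means failure, and Z reflects an 8-bit .A.

```python
SR_CAR = 0x01  # carry: assumed set on failure
SR_ZER = 0x02  # zero: assumed set when .A is zero on return


def default_return(stacked_sr):
    """Effect of TRB #SR_ZER|SR_CAR on the caller's stacked SR: C = 0, Z = 0."""
    return stacked_sr & ~(SR_ZER | SR_CAR) & 0xFF


def condition_return(stacked_sr, failed, a_on_return):
    """Hypothetical post-processing before RTI restores the stacked SR."""
    sr = default_return(stacked_sr)
    if failed:
        sr |= SR_CAR
    if (a_on_return & 0xFF) == 0:
        sr |= SR_ZER
    return sr


ok_nonzero = condition_return(0xFF, failed=False, a_on_return=5)
failed_zero = condition_return(0x30, failed=True, a_on_return=0)
```

Every other bit in the caller's SR passes through untouched, which matches the point that registers other than SR come back unchanged.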

The obvious alternative to code such as the above is to use a register to pass the API index. This is something I will do when I write a loadable kernel. There is, of course, a tradeoff, since if a register is encumbered with the API index then it can't be used to pass parameters to the service being called. In the case of the BIOS API, it was possible to pass needed parameters using the registers alone. Desiring to take advantage of that, I elected to pass the API index through COP's signature and endure somewhat slower execution.

The loadable kernel that I have on the "drawing board" will have some services that need multiple 32-bit pointers passed. That isn't possible with three registers, necessitating parameter-passing through the stack. Since it is more economical of code if all services use the same call-and-return methods, all will get their parameters from the stack. With the registers unencumbered, I can pass the API index in .A and return a "failure" code in .A if needed—COP's signature can be anything, since it won't be used.

It's all about tradeoffs. :D That reminds me of an old, wry joke about 19th century railroads and tradeoffs. The three goals of passenger service are speed, comfort and safety—and they are always getting in each other's way.

Posted: Mon Oct 25, 2021 3:33 am

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1928
Location: Sacramento, CA, USA
Code:
03823  00D251  8A                     txa                   ;restore API index          (2)
03824  00D252  E2 20                  sep #m_seta           ;8-bit accumulator          (3)
03825  00D254  F0 22                  beq .apierr           ;zero API index not allowed (2 or 3)
03826  ;         
03827  00D256  3A                     dec                   ;no, zero-align             (2)
03828  00D257  C9 15                  cmp #maxapi           ;API index in range?        (2)
03829  00D259  B0 1D                  bcs .apierr           ;no, error                  (2 or 3)

It looks like you're copying an 8-bit X into a 16-bit A, then reducing A to 8 bits. Does the TXA automatically clear B, or should you do it after the SEP to prevent the possibility of a non-zero B messing up your BEQ? Better yet, can you eliminate the BEQ completely and catch the resultant $FF with your BCS?

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Posted: Mon Oct 25, 2021 4:00 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8173
Location: Midwestern USA
barrym95838 wrote:
It looks like you're copying an 8-bit X into a 16-bit A, then reducing A to 8-bits.

Correct...it's intentional.

Quote:
Does the TXA automatically clear B...

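Yes. When an 8-bit index register is copied into the 16-bit accumulator, the value being copied is extended to 16 bits with %00000000 in bits 8-15 (while the index registers are set to 8 bits, their high bytes are always zero). Hence if the accumulator is set to 16 bits, index registers are set to 8 bits, .A = $AA, .B = $BB, and .X = $12, executing TXA will result in .A = $12 and .B = $00. That being the case, the code I'm using will always produce an 8-bit value in .A and the BEQ test will work as expected. Note that the rule isn't symmetrical: with an 8-bit accumulator and 16-bit index registers, TAX and TAY copy .B along with .A, since for these transfers the size is set by the destination register.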
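The transfer rules can be modeled in Python to make the asymmetry concrete. This is a sketch based on WDC's documentation of TXA/TAX, where the transfer width follows the destination register's size; C is the full 16-bit accumulator (.B:.A).

```python
def txa(c, x, m_16bit):
    """TXA: width follows the accumulator size (m).

    x is the full index register; with 8-bit index registers its high byte is
    always zero, which is what produces the zero extension.
    """
    if m_16bit:
        return x & 0xFFFF                # all 16 bits of X land in C
    return (c & 0xFF00) | (x & 0x00FF)   # 8-bit transfer, .B left alone


def tax(c, x_16bit):
    """TAX: width follows the index size (x), not the accumulator size."""
    if x_16bit:
        return c & 0xFFFF  # .B comes along even with an 8-bit accumulator
    return c & 0x00FF      # 8-bit index: high byte of X forced to zero


# The case from the post: m = 0, x = 1, .B = $BB, .A = $AA, .X = $12.
new_c = txa(0xBBAA, 0x0012, m_16bit=True)
```

The TAX case is why code that uses TAX with 16-bit index registers and an 8-bit accumulator needs .B cleared beforehand.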

Quote:
Better yet, can you eliminate the BEQ completely and catch the resultant $FF with your BCS?

I could. The current version is an exercise of sorts in trying different combinations of logic. There is, of course, some efficiency to be gained by eliminating any unnecessary tests, as well as stack operations. So it's still open to study and possible improvement.
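A quick way to convince yourself that the BCS alone can catch a zero index is to model the 8-bit arithmetic. This is a Python sketch, not 65C816 code; MAXAPI comes from the C9 15 bytes in the listing, which makes signatures 1 through 21 valid.

```python
MAXAPI = 0x15  # operand of CMP #maxapi in the listing


def dispatch_offset(signature):
    """Model DEC / CMP #maxapi / BCS / ASL on an 8-bit accumulator.

    Returns the byte offset into apifntab, or None where BCS .apierr would branch.
    """
    a = (signature - 1) & 0xFF  # DEC: zero wraps around to $FF
    if a >= MAXAPI:             # CMP sets carry when A >= operand (unsigned)
        return None
    return (a << 1) & 0xFF      # ASL: two bytes per table entry


first = dispatch_offset(1)
last = dispatch_offset(MAXAPI)
```

Because the comparison is unsigned, the $FF produced by decrementing zero is simply another out-of-range value, so the separate BEQ test becomes redundant.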

Posted: Mon Oct 25, 2021 8:49 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8173
Location: Midwestern USA
barrym95838 wrote:
Better yet, can you eliminate the BEQ completely and catch the resultant $FF with your BCS?

It being a slow evening and me not being sleepy, I figured I'd play around with this some more. Here's my "improved" code.

Code:
03796  00D22E  C2 30         icop     rep #m_setr           ;16-bit registers           (3) ——
03797  00D230  5A                     phy                   ;save MPU state             (4)   |
03798  00D231  DA                     phx                   ;                           (4)   |
03799  00D232  48                     pha                   ;                           (4)   | —> 27
03800  00D233  0B                     phd                   ;                           (4)   |   1.69
03801  00D234  8B                     phb                   ;                           (3)   |
03802  00D235  6C 04 01               jmp (ivcop)           ;take COP indirect vector   (5) ——
03803  ;
03804  ;   ———————————————————————————————————————————————————————————
03805  ;   COP processing re-enters here if IVCOP hasn't been altered.
03806  ;   ———————————————————————————————————————————————————————————
03807  ;
03808  00D238  C2 1C         icopa    rep #m_setx|sr_bdm|sr_irq ;                       (3) ——
03809  00D23A  E2 20                  sep #m_seta           ;                           (3)   |
03810  00D23C  3B                     tsc                   ;COP stack frame is...      (2)   |
03811  00D23D  5B                     tcd                   ;direct page                (2)   |
03812  00D23E  A9 03                  lda #sr_zer|sr_car    ;C = 0 & Z = 0 is...        (2)   |
03813  00D240  14 0A                  trb cop_srx           ;default return             (6)   |
03814  00D242  A5 0D                  lda cop_pbx           ;get caller's bank &...     (4)   |
03815  00D244  A4 0B                  ldy cop_pcx           ;caller's return address    (5)   |
03816  00D246  48                     pha                   ;set caller's bank as...    (3)   |
03817  00D247  AB                     plb                   ;a temporary data bank      (4)   |
03818  00D248  C2 20                  rep #m_seta           ;16-bit accumulator         (3)   |
03819  00D24A  A9 00 00               lda !#kerneldp        ;return to kernel's...      (3)   |
03820  00D24D  5B                     tcd                   ;direct page                (2)   | —> 80
03821  00D24E  A9 00 00               lda !#0               ;clear .B                   (3)   |   5.00
03822  00D251  E2 20                  sep #m_seta           ;8-bit accumulator          (3)   |
03823  00D253  88                     dey                   ;point at API index &...    (2)   |
03824  00D254  B9 00 00               lda mm_ram,y          ;fetch it                   (5)   |
03825  00D257  4B                     phk                   ;return to...               (3)   |
03826  00D258  AB                     plb                   ;kernel's working bank      (4)   |
03827  00D259  3A                     dec                   ;zero-align API index       (2)   |
03828  00D25A  C9 15                  cmp #maxapi           ;API index in range?        (2)   |
03829  00D25C  B0 11                  bcs .apierr           ;no, error                  (2+)  |
03830  ;
03831  00D25E  0A                     asl                   ;yes, convert to service... (2)   |
03832  00D25F  AA                     tax                   ;lookup table offset        (2)   |
03833  00D260  FC 2C E7               jsr (apifntab,x)      ;run API processor          (8) ——
03834  00D263  E2 30                  sep #m_setr           ;assure 8-bit registers     (3) ——
03835  00D265  B0 0A                  bcs .pxerr            ;processing exception       (2+)  |
03836  ;
03837  00D267  C2 30         .done    rep #m_setr           ;16-bit registers           (3)   |
03838  00D269  AB                     plb                   ;restore MPU state          (4)   |
03839  00D26A  2B                     pld                   ;                           (5)   | —> 34
03840  00D26B  68                     pla                   ;                           (5)   |   2.13
03841  00D26C  FA                     plx                   ;                           (5)   |
03842  00D26D  7A                     ply                   ;                           (5)   |
03843  00D26E  40                     rti                   ;return to caller           (7) ——

As before, the numbers in parentheses to the right are execution times in clock cycles. I also summarized the major code sections and added the equivalent execution time in microseconds for each section, assuming a 16 MHz clock rate. However, take those execution times with a grain of salt. This is all running in ROM, so each step of the way incurs one or more wait-states when the MPU fetches an opcode and operand. So it's not as fast as it might seem.
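To get a feel for how much ROM wait states could stretch these figures, here is a rough Python estimate for the .done path (REP through RTI). It uses the listing's cycle annotations (34 cycles, which matches the 2.13 microsecond figure) and the byte counts of each instruction. The one-wait-state-per-ROM-byte figure is purely hypothetical, since the post doesn't say how many wait states POC V1.3's ROM needs. Only opcode and operand fetches are charged, as the post describes; dummy bus cycles that also touch ROM are ignored.

```python
# .done path: (instruction, cycles as annotated, instruction bytes fetched from ROM)
DONE_PATH = [
    ("rep #m_setr", 3, 2),
    ("plb", 4, 1),
    ("pld", 5, 1),
    ("pla", 5, 1),
    ("plx", 5, 1),
    ("ply", 5, 1),
    ("rti", 7, 1),
]


def path_time_us(path, clock_mhz=16.0, ws_per_rom_byte=0):
    """Estimate time in microseconds, charging wait states per ROM byte fetched."""
    cycles = sum(c for _, c, _ in path)
    rom_bytes = sum(n for _, _, n in path)
    return (cycles + ws_per_rom_byte * rom_bytes) / clock_mhz


no_wait = path_time_us(DONE_PATH)
one_wait = path_time_us(DONE_PATH, ws_per_rom_byte=1)  # hypothetical
print(no_wait, one_wait)
```

Even one wait state per ROM byte adds about a quarter to this short path, which is why the on-paper numbers are optimistic.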

This new and improved code is slightly longer than the old code, but does execute a little faster. As usual, you can have small code or fast code, but in most cases not both.

One more thing, as Columbo used to say... Once I get through playing around with this I can take out the indirect jump at the front end of the routine. I can also get rid of the REP instruction at the ICOPA label, since it would be redundant. It's there right now so if something intercepts the IVCOP vector and then continues on with the ROM code, the MPU will be returned to a rational state. Once the indirect jump is gone so will be the need for that REP. I still have to re-enable IRQs, though. That could be done at the top of the routine in the REP #M_SETR instruction by changing it to REP #M_SETR | SR_IRQ, which becomes a two-fer.
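The "two-fer" can be checked with a small Python model of the status register. The bit values follow from the opcodes in the listing (E2 30 for #m_setr, C2 1C for #m_setx|sr_bdm|sr_irq). On entry to a COP, the 65C816 sets I and clears D after pushing the caller's SR, which is why only IRQs need re-enabling once the REP at ICOPA is gone.

```python
SR_IRQ = 0x04
SR_BDM = 0x08
M_SETX = 0x10
M_SETA = 0x20
M_SETR = M_SETA | M_SETX


def cop_entry(p):
    """Live SR after the MPU takes COP: I set, D cleared (caller's SR already stacked)."""
    return (p | SR_IRQ) & ~SR_BDM & 0xFF


def rep(p, mask):
    """REP clears every SR bit that is set in its operand."""
    return p & ~mask & 0xFF


combined = M_SETR | SR_IRQ
# Caller had 8-bit registers, decimal mode, Z and C set: SR = $3B.
after_rep = rep(cop_entry(0x3B), combined)
print(hex(combined))
```

One REP #$34 leaves 16-bit registers with IRQs enabled, while the caller's carry and zero bits pass through untouched.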

Posted: Fri Oct 29, 2021 5:15 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8173
Location: Midwestern USA
BigDumbDinosaur wrote:
One more thing, as Columbo used to say... Once I get through playing around with this I can take out the indirect jump at the front end of the routine. I can also get rid of the REP instruction at the ICOPA label, since it would be redundant. It's there right now so if something intercepts the IVCOP vector and then continues on with the ROM code, the MPU will be returned to a rational state. Once the indirect jump is gone so will be the need for that REP. I still have to re-enable IRQs, though. That could be done at the top of the routine in the REP #M_SETR instruction by changing it to REP #M_SETR | SR_IRQ, which becomes a two-fer.

I made the above changes to the COP handler and everything seems to be copacetic. The code got smaller and, on paper, slightly faster. I'll likely revisit it at some time and perhaps will see something that could be tweaked.

Meanwhile, I ran into an interesting problem involving the transfer of S-records from the Kowalski assembler to POC V1.3. The problem was the result of me trying to be a little too clever on the Linux side of things.

On my file and print server, which runs Linux and Samba, I have a script that, in a nutshell, takes the S-record file produced by the Kowalski assembler and hands it to rsync, which in turn transfers a copy of the file to my software development server, also running Linux. That server also runs rsync and, as anyone familiar with rsync knows, it can be configured to run a script or program once a file has been received.

Accordingly, I have a script that rsync runs after the development server receives the file; that script's job is to cat the S-record copy to the serial port connected to the POC unit and then delete the file. With the S-record loader enabled on the POC unit, the program that I assembled on the Windows box running the Kowalski assembler gets loaded into POC memory. So far, so good.

On the file and print server, the file being watched by the script that runs rsync is actually a named pipe. Using a named pipe for this purpose seemed handy because opening one for reading blocks until another process opens it for writing. So my thing was to sit in an endless while loop catting the pipe. cat, of course, will stall until data appears in the pipe. When the assembler writes out the S-record file, which is actually a symbolic link to the pipe, cat wakes up and dumps the data into a temporary file, and then rsync is run to send the file on its way.

This scheme seemed to work fine with smaller S-record files, but then I assembled something quite large and sent it to the POC unit. The POC unit reported it had received a bad record and aborted the load. After determining I could repeat this problem, I decided to dig a little. Lo and behold, I discovered my clever arrangement with the pipe would work fine until I fed it an S-record file greater than 10KB.

I had forgotten that a named pipe can buffer, at most, 10KB of data. So much for that idea!



Last edited by BigDumbDinosaur on Sun Feb 06, 2022 3:34 am, edited 1 time in total.

Posted: Fri Oct 29, 2021 9:02 am

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1402
Location: Scotland
BigDumbDinosaur wrote:
I had forgotten that a named pipe can buffer, at most, 10KB of data. So much for that idea!


POSIX named pipes (aka FIFOs, created using the mkfifo(1) command) are blocking by default, so their buffer size is usually not relevant.
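A short Python demonstration of that point: with blocking I/O, a FIFO moves far more data than its buffer holds, because the writer simply waits whenever the buffer fills. The FIFO name and the 100,000-byte payload are arbitrary stand-ins (not real S-records), and the reader and writer run as two threads of one process purely for testing.

```python
import os
import tempfile
import threading


def transfer_through_fifo(payload):
    """Send payload through a named pipe with blocking I/O; return what the reader got."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "srec.fifo")  # hypothetical name
    os.mkfifo(path)

    def writer():
        # open() for writing blocks until a reader has opened the FIFO;
        # write() then blocks whenever the pipe buffer is full.
        with open(path, "wb") as f:
            f.write(payload)

    t = threading.Thread(target=writer)
    t.start()
    try:
        # open() for reading blocks until a writer appears; read() runs to
        # end-of-file, i.e. until the writer closes its end.
        with open(path, "rb") as f:
            received = f.read()
    finally:
        t.join()
        os.remove(path)
        os.rmdir(tmpdir)
    return received


payload = b"S1" * 50_000  # 100,000 bytes of stand-in data
received = transfer_through_fifo(payload)
print(len(received))
```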

They can be made non-blocking by the program that opens them, in which case it's up to the sending program to check for a full buffer (the return value from write(2)) and up to the receiving program to handle a read that fails with EAGAIN (no data yet) or returns zero bytes (no writers left).
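The non-blocking read cases can be seen with a few lines of Python. This sketch uses an anonymous pipe from os.pipe() as a stand-in for a named FIFO; pipe(7) describes the same read semantics for both.

```python
import os


def nonblocking_read_outcomes():
    """Show the three things a non-blocking reader can get from a pipe."""
    r, w = os.pipe()
    os.set_blocking(r, False)
    outcomes = []
    try:
        os.read(r, 16)
    except BlockingIOError:
        # EAGAIN: a writer still has the pipe open, but nothing has arrived yet.
        outcomes.append("EAGAIN")
    os.write(w, b"S9")
    outcomes.append(os.read(r, 16))  # data that is available
    os.close(w)
    outcomes.append(os.read(r, 16))  # b"": end-of-file, no writers remain
    os.close(r)
    return outcomes


outcomes = nonblocking_read_outcomes()
```

A receiver that treats every short or empty read as "finished" would stop early, which is why the EAGAIN and end-of-file cases have to be told apart.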

The default capacity of a pipe is 4KB on older Linux kernels (before 2.6.11) and 64KB on later ones. Since kernels that old haven't been around for a decade or so, I'd assume that your system has a 64KB buffer, so I really think the issue is elsewhere. The size would normally be a multiple of the page size, which is 4KB anyway, so 10KB sounds a bit odd.
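On Linux, the actual capacity can be queried rather than assumed. This Python sketch uses fcntl with F_GETPIPE_SZ, a Linux-specific command (kernel 2.6.35 and later); Python 3.10+ exposes the name, and 1032 is its numeric value for older Python versions. Elsewhere the function returns None.

```python
import os
import sys


def default_pipe_capacity():
    """Return the capacity in bytes of a freshly created pipe on Linux, or None."""
    if not sys.platform.startswith("linux"):
        return None
    import fcntl
    f_getpipe_sz = getattr(fcntl, "F_GETPIPE_SZ", 1032)
    r, w = os.pipe()
    try:
        return fcntl.fcntl(w, f_getpipe_sz)
    finally:
        os.close(r)
        os.close(w)


capacity = default_pipe_capacity()
print(capacity)
```

On a typical modern x86 Linux system this reports 65536 bytes (16 pages of 4KB), consistent with the figure above.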

Possibly the S-record file (pipe) not being fully written to the server before the receiving program wakes up?

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/

