All times are UTC




 Post subject: Bxx and JSR/RTS
PostPosted: Sun Apr 01, 2018 4:26 am 

Joined: Fri Jun 03, 2016 3:42 am
Posts: 158
Is the Bxx offset from the operand or from the address after the operand?
What offset would be used to get to the instruction after the Bxx instruction? Would that be 0 or 1?

And, what is the deal with JSR? The address pushed onto the return-stack is the address of the last byte of the JSR operand, rather than the address after the JSR instruction where we want to go?

I'm writing an assembler, but I don't have another assembler handy so I can't just experiment to figure this out. The documentation on the 65c02 is pretty thin on details such as this.

thanks in advance --- Hugh


 Post subject: Re: Bxx and JSR/RTS
PostPosted: Sun Apr 01, 2018 5:16 am 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1928
Location: Sacramento, CA, USA
Hugh Aguilar wrote:
What offset would be used to get to the instruction after the Bxx instruction? Would that be 0 or 1?

Zero. $FE would cause it to branch to itself.
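For anyone writing the branch encoder in an assembler, the rule above can be sketched as follows. This is only a minimal illustration, not taken from any particular assembler; the function names `branch_offset` and `branch_target` are made up for the example. The offset is measured from the address of the instruction following the two-byte branch.

```python
def branch_offset(branch_addr: int, target_addr: int) -> int:
    """Return the operand byte for a Bxx located at branch_addr."""
    # The displacement is relative to the instruction after the branch,
    # i.e. branch_addr + 2 (one opcode byte plus one operand byte).
    displacement = target_addr - (branch_addr + 2)
    if not -128 <= displacement <= 127:
        raise ValueError(f"branch out of range: {displacement}")
    return displacement & 0xFF  # two's-complement encoding in one byte


def branch_target(branch_addr: int, operand: int) -> int:
    """Inverse operation: decode an operand byte to an absolute address."""
    displacement = operand - 0x100 if operand & 0x80 else operand
    return (branch_addr + 2 + displacement) & 0xFFFF


# A branch at $0600 to the next instruction ($0602) encodes as $00,
# and a branch to itself encodes as $FE.
print(hex(branch_offset(0x0600, 0x0602)))
print(hex(branch_offset(0x0600, 0x0600)))
```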

Quote:
And, what is the deal with JSR? The address pushed onto the return-stack is the address of the last byte of the JSR operand, rather than the address after the JSR instruction where we want to go?

Yeah, that's a bit on the quirky side, right? I can only assume that doing it that way shaved a few transistors, but I could be wrong. I totally understand and support the "carry False = borrow True" idiosyncrasy, but I'm not a huge fan of the post-decrement and pre-increment design decision for hardware stack operations.
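A toy model may make the JSR/RTS convention and the stack behaviour easier to see. The class `Stack6502` and the helpers `jsr` and `rts` below are hypothetical names for this sketch only; they model the architectural result, not the chip's internal timing.

```python
class Stack6502:
    """Toy model of the hardware stack in page $01."""

    def __init__(self):
        self.mem = bytearray(0x10000)
        self.s = 0xFF  # stack pointer, an offset into page $01

    def push(self, value):
        self.mem[0x0100 + self.s] = value & 0xFF  # store first...
        self.s = (self.s - 1) & 0xFF              # ...then decrement

    def pull(self):
        self.s = (self.s + 1) & 0xFF              # increment first...
        return self.mem[0x0100 + self.s]          # ...then load


def jsr(stack, jsr_addr, target):
    """Push the address of the JSR's last byte; return the new PC."""
    ret = (jsr_addr + 2) & 0xFFFF  # last byte of the 3-byte instruction
    stack.push(ret >> 8)           # high byte is pushed first
    stack.push(ret & 0xFF)         # then the low byte
    return target


def rts(stack):
    """Pull the saved address and add 1 to reach the next instruction."""
    lo = stack.pull()
    hi = stack.pull()
    return (((hi << 8) | lo) + 1) & 0xFFFF
```

With a JSR at $0600, the stack holds $0602 and RTS resumes at $0603. Code that manipulates return addresses by hand (for example to fetch inline data after a JSR) has to allow for that off-by-one.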

Quote:
I'm writing an assembler, but I don't have another assembler handy so I can't just experiment to figure this out. The documentation on the 65c02 is pretty thin on details such as this.

thanks in advance --- Hugh

Have you tried using the Hexdump and Disassemble facilities in easy6502?

Mike B.


 Post subject: Re: Bxx and JSR/RTS
PostPosted: Sun Apr 01, 2018 8:41 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10799
Location: England
You'll also find links to online assemblers and disassemblers in the banner of the visual6502 simulation, and more here: http://6502.org/tools/asm/#web

barrym95838 wrote:
Hugh Aguilar wrote:
And, what is the deal with JSR? The address pushed onto the return-stack is the address of the last byte of the JSR operand, rather than the address after the JSR instruction where we want to go?

Yeah, that's a bit on the quirky side, right? I can only assume that doing it that way shaved a few transistors, but I could be wrong. I totally understand and support the "carry False = borrow True" idiosyncrasy, but I'm not a huge fan of the post-decrement and pre-increment design decision for hardware stack operations.


Probably the JSR decision saved a lot of transistors - therefore chip area - therefore cost of manufacture - therefore price to the customer. We're told there was a crunch to get the die size down to a strict target to meet cost. If you consider how hard it is to tile a circular wafer with rectangular chip designs, you can see how a small increment in size might drop a lot of chips off the wafer.

How did it save transistors? Because the PC register holds the PC, and the PC has to be incremented to fetch each byte of the instruction, and yet the PC also needs to be stored to the stack, in two bytes. To store the final, fully incremented value of the PC to the stack, you'd need to have read the third byte of the JSR. And that means you'd need to be able to store both bytes of the target address somewhere, in order to load that pair of bytes into the PC to fetch the first instruction of the subroutine. Even one extra byte of storage could be costly. So, not only does the 6502 save one byte of storage by writing a slightly earlier value of the PC to the stack, it saves a second byte of storage by saving the first operand byte *in the stack pointer* for a cycle, while the SP value is passing through the ALU to be decremented.
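The ordering argument can be sketched as a simplified step-by-step model. It assumes the commonly documented sequence for JSR, in which the second operand byte is fetched only after both PC bytes have been pushed; the function name `jsr_cycles` is invented for this illustration, and the real chip's internal signals are not modelled.

```python
def jsr_cycles(mem, pc, s):
    """Walk through JSR in fetch order. Returns (new_pc, new_s).

    mem is a 64 KB bytearray, pc points at the JSR opcode,
    s is the 8-bit stack pointer.
    """
    pc = (pc + 1) & 0xFFFF       # opcode fetched; PC at operand low byte
    target_lo = mem[pc]          # fetch the first operand byte
    pc = (pc + 1) & 0xFFFF       # PC now at the last byte of the JSR
    # Only one operand byte needs holding at this point. Per the
    # description above, the chip parks it in S while the real stack
    # pointer value is in the ALU; here it is just a local variable.
    mem[0x0100 + s] = pc >> 8    # push PCH
    s = (s - 1) & 0xFF
    mem[0x0100 + s] = pc & 0xFF  # push PCL
    s = (s - 1) & 0xFF
    target_hi = mem[pc]          # fetch the second operand byte last
    return (target_hi << 8) | target_lo, s
```

Because the PC is pushed before it steps past the final operand byte, the value on the stack is the address of that byte, and no extra latch is needed for the target's high byte.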

Quote:
Previously:
"Q. When is the stack pointer not the stack pointer?"
A. During JSR, the SP goes off to the ALU to be decremented and the S register is used as a temporary register to hold the low byte of the destination.
(Link to a visual6502 simulation)


 Post subject: Re: Bxx and JSR/RTS
PostPosted: Sun Apr 01, 2018 9:11 pm 

Joined: Fri Jun 03, 2016 3:42 am
Posts: 158
barrym95838 wrote:
Hugh Aguilar wrote:
What offset would be used to get to the instruction after the Bxx instruction? Would that be 0 or 1?

Zero. $FE would cause it to branch to itself.

Thanks --- that answers my question!

barrym95838 wrote:
Quote:
I'm writing an assembler, but I don't have another assembler handy so I can't just experiment to figure this out. The documentation on the 65c02 is pretty thin on details such as this.

Have you tried using the Hexdump and Disassemble facilities in easy6502?

Thanks for telling me about that --- if I had known about that, I could have answered my own question.


 Post subject: Re: Bxx and JSR/RTS
PostPosted: Sun Apr 01, 2018 9:21 pm 

Joined: Fri Jun 03, 2016 3:42 am
Posts: 158
BigEd wrote:
barrym95838 wrote:
Hugh Aguilar wrote:
And, what is the deal with JSR? The address pushed onto the return-stack is the address of the last byte of the JSR operand, rather than the address after the JSR instruction where we want to go?

Yeah, that's a bit on the quirky side, right? I can only assume that doing it that way shaved a few transistors, but I could be wrong. I totally understand and support the "carry False = borrow True" idiosyncrasy, but I'm not a huge fan of the post-decrement and pre-increment design decision for hardware stack operations.

Probably the JSR decision saved a lot of transistors - therefore chip area - therefore cost of manufacture - therefore price to the customer. We're told there was a crunch to get the die size down to a strict target to meet cost. If you consider how hard it is to tile a circular wafer with rectangular chip designs, you can see how a small increment in size might drop a lot of chips off the wafer.

How did it save transistors? Because the PC register holds the PC, and the PC has to be incremented to fetch each byte of the instruction, and yet the PC also needs to be stored to the stack, in two bytes. To store the final, fully incremented value of the PC to the stack, you'd need to have read the third byte of the JSR. And that means you'd need to be able to store both bytes of the target address somewhere, in order to load that pair of bytes into the PC to fetch the first instruction of the subroutine. Even one extra byte of storage could be costly. So, not only does the 6502 save one byte of storage by writing a slightly earlier value of the PC to the stack, it saves a second byte of storage by saving the first operand byte *in the stack pointer* for a cycle, while the SP value is passing through the ALU to be decremented.

I had always wondered about that weird JSR/RTS quirk in the 6502 --- your explanation makes sense.

I doubt that they would have saved the PC somewhere, because storage inside of a chip is hugely expensive --- effectively another register.
More likely, they would have added an extra clock cycle to JSR to decrement PC again after obtaining the next-instruction address (the incremented PC value) that will be pushed onto the return-stack.

As for SBC using ~C as the borrow, rather than C, that is also pretty weird --- I don't know what the purpose of this was.
Is there any other processor that does this?


 Post subject: Re: Bxx and JSR/RTS
PostPosted: Mon Apr 02, 2018 12:04 am 

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
The sbc instruction is implemented very simply: r = a + ~m + c. Essentially the implementation saves an inverter on the carry in and an inverter on the carry out. The absence of a carry out causes the next operation in an sbc chain to be r = a + ~m + 0. This is the same as r = a - m - 1.
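As a rough check of that formula, here is a small sketch of binary-mode SBC (decimal mode and the N, V, Z flags are ignored). The function names `sbc` and `sub16` are just for this example.

```python
def sbc(a, m, carry):
    """8-bit SBC as a + ~m + c. Returns (result, carry_out)."""
    total = a + (m ^ 0xFF) + carry  # ~m restricted to 8 bits
    return total & 0xFF, total >> 8


def sub16(x, y):
    """16-bit subtract built from two chained SBCs."""
    lo, c = sbc(x & 0xFF, y & 0xFF, 1)  # SEC first: carry set = no borrow
    hi, c = sbc(x >> 8, y >> 8, c)      # carry clear here means borrow
    return (hi << 8) | lo, c


print(sbc(5, 3, 1))           # (2, 1)
print(sbc(5, 3, 0))           # (1, 1)
print(sub16(0x0100, 0x0001))  # (255, 1)
```

The low-byte subtraction in the last line leaves carry clear, so the high-byte SBC computes a - m - 1 and the borrow propagates without any extra inversion.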

_________________
Michael A.


 Post subject: Re: Bxx and JSR/RTS
PostPosted: Mon Apr 02, 2018 11:55 pm 

Joined: Fri Jun 03, 2016 3:42 am
Posts: 158
MichaelM wrote:
The sbc instruction is implemented very simply: r = a + ~m + c. Essentially the implementation saves an inverter on the carry in and an inverter on the carry out. The absence of a carry out causes the next operation in an sbc chain to be r = a + ~m + 0. This is the same as r = a - m - 1.

In the TOYF I went with a 6800-style subtract in which CF is the borrow, rather than a 6502-style subtract in which the ~CF is the borrow.
I did this so I could have MOV 0,CF used to prep both ADC and SBC --- I don't have a MOV 1,CF at all --- this is one fewer instruction that I need.

I'm willing to consider any kind of trick for reducing the complexity of the TOYF though --- I want it to fit in a small inexpensive FPGA --- as I said before, cost is often the only criterion that people have for choosing a processor (true when the 6502 was invented in the 1970s, and still true today).

On the PDP-11, JSR put the return-address in a register. This register could then access data compiled after the JSR in the calling function. This worked especially well with Forth DTC in which case that data was Forth threaded-code.
This didn't work on the 6502 because the return-address was on the return-stack rather than in a register, and it was one less than the address after the JSR in the calling function.
The PDP-11 was a pretty good design for supporting DTC Forth --- it was limited to 64KB total though --- this memory limitation was the primary reason why the PDP-11 died out.

Your M65c02A has support for both DTC and ITC with your IP and W registers. This is not a bad design. I would recommend however, that you have a 17-bit address-bus. Memory-access through IP would set the high-bit to 1 --- all other memory-access would set the high-bit to 0 --- this way all Forth threaded-code is in a 64KB bank above the main 64KB bank where the machine-code and application data are. You would likely only need 8KB for machine-code, so that leaves 56KB for application data, which is quite a lot. In your document you say that you are planning on some kind of memory-management scheme in the future --- I don't think this is necessary --- using the simple scheme I described above, you could support very large programs. Forth ITC code is pretty compact, and you can have as much as 64KB available for Forth threaded-code, so you can really have whopping big Forth programs. You would have 56KB of main-bank memory available for application data, which is adequate for pretty big programs.
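The banking idea is simple enough to state as a one-line mapping. This is only a sketch of the scheme as I described it, not anything the M65C02A actually implements; `physical_address` and its `via_ip` parameter are hypothetical names.

```python
def physical_address(addr16, via_ip):
    """Map a 16-bit address to a 17-bit one.

    Accesses made through the Forth IP register get bit 16 set and land
    in the upper 64KB bank; every other access stays in the lower bank.
    """
    bank = 1 if via_ip else 0
    return (bank << 16) | (addr16 & 0xFFFF)


print(hex(physical_address(0x1234, via_ip=True)))   # 0x11234
print(hex(physical_address(0x1234, via_ip=False)))  # 0x1234
```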

In some cases, the program does not need 32-bit registers and does not need blazing speed --- the ARM Cortex gets used simply because the program and data are too big to fit in the 64KB that 8-bit processors such as the 65c02 etc. provide. Given the scheme I described above though, the M65c02A could possibly be used instead. :)

An interesting application would be a stenotype machine --- I would expect that your M65c02A would be capable of doing this, using the memory-management scheme I described above. An ordinary 64KB 65c02 might even be capable, as the size of the document being generated isn't all that great, but the 65c02 might be too slow (especially if programmed in Forth) --- the 65c816 would be yet another option.

This might be a moot argument though --- the ARM Cortex is pretty inexpensive --- unless the M65c02A or 65c816 provide a pretty good cost benefit, they would not be considered.
It is not clear to me what your goal with the M65c02A is --- what kind of applications are you expecting to use it for? --- considering that the 65c816 is largely ignored these days, why do you expect your M65c02A to gain traction?

I think 8-bit processors still have a future --- some kind of memory-management scheme is needed though --- the 64KB limitation of the 65c02 is a big problem.
If the 6502 designers had been somewhat more forward-thinking and had provided a 128KB system (possibly code in one 64KB bank and data in another 64KB bank), the 6502 era could have continued much longer than it did --- people switched to MS-DOS primarily because they got to break through the 64KB limitation --- the 4.77 MHz 8088 was actually slower in many cases than the 1 MHz 6510 used in the Commodore-64. It is not like the 6502 had a shortage of available pins and couldn't support a 17-bit address-bus --- the 6502 used one pin to set the V flag --- that was pretty useless, so this pin could have been used instead as the high-bit of a 17-bit address-bus.


 Post subject: Re: Bxx and JSR/RTS
PostPosted: Tue Apr 03, 2018 1:59 am 

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Hugh:

Let's keep this thread for discussions more closely related to the subject. I think a different thread would be better for answering your questions about my M65C02A soft-core processor. There's a thread for the M65C02A Core to which I'll take your questions to keep this thread more on topic.

_________________
Michael A.

