All times are UTC
PostPosted: Tue Feb 02, 2016 8:54 am 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
If Brad decides to implement a link register, then it would be the callee's responsibility to save it somewhere before calling something else. It has been about 30 years since I took a class on IBM 360 assembly language programming, but I think that's how it worked on that architecture (the BALR instruction, IIRC). The PDP-8 had no hardware stack pointer or index registers, and its JMS instruction stored the return address in the first word of the called subroutine. It wasn't a problem as long as recursion and/or re-entrancy were avoided or handled very carefully. In either architecture, you could maintain your own "software" return stack, if you thought you needed it. Many large programs "rolled their own" stacks, while others got by with the default hardware behavior, limited as it was.
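
A minimal sketch of the JMS-style linkage described above, assuming a toy word-addressed memory modelled as a Python list. The helper names `jms` and `jmp_indirect` and the addresses are made up for illustration; this is not a PDP-8 emulator. It shows why a nested call to the same routine loses the outer return address:

```python
# Toy model of PDP-8-style JMS linkage (hypothetical helper names).

def jms(memory, pc, target):
    """Call: store the return address in the subroutine's first word,
    then continue at the word after it."""
    memory[target] = pc + 1      # return address = word after the JMS
    return target + 1            # new program counter

def jmp_indirect(memory, target):
    """Return: jump through the address saved in the first word."""
    return memory[target]

memory = [0] * 64
SUB = 32                         # subroutine entry word (holds the return address)

pc = jms(memory, 10, SUB)        # first call, made from address 10
first_return = memory[SUB]       # return address saved for the outer call

# A second call made from inside the subroutine, before it has returned,
# overwrites the same word, so the outer return address is gone.
pc = jms(memory, pc, SUB)
inner_return = jmp_indirect(memory, SUB)
outer_return = jmp_indirect(memory, SUB)   # same value as inner_return
```

Copying `memory[SUB]` to a software-managed stack on entry is the "roll your own" fix mentioned above.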

Mike B.


PostPosted: Tue Feb 02, 2016 9:32 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8545
Location: Southern California
Quote:
The PDP-8 had no hardware stack pointer or index registers, and its JMS instruction stored the return address in the first word of the called subroutine. It wasn't a problem as long as recursion and/or re-entrancy were avoided or handled very carefully.

From the bottom of the second page, called "Subroutine return addresses and nesting," of my 6502 stacks treatise:

    So... Why the stack? Couldn't the return address be kept by the subroutine itself? Well, not really. I understand some of the 60's/70's minicomputers did that; but there are too many problems with it:
    • For one thing, the subroutine might be in ROM where it can't hold variable data.
    • For another, there could be several places to enter the subroutine, and there would be the problem of having the program counter jump over the places where return addresses are stored.
    • Then you have the possibility of multiple exit points depending on conditions, and every exit point would have to know where you jumped into the routine, which it can't.
    • It would also rule out recursion. [Chapter 15 is on recursion.]
    • If it were still possible after all of that, it would still take up more memory than the stack does— so fogetaboutit.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Tue Feb 02, 2016 9:59 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
I think there are two ideas here, other than the familiar 6502 use of the stack:
- the "Wheeler jump" where the return address is saved at the head of the called routine. This was an innovation, because it allowed subroutines at a time when stacks didn't exist.
- the link register, where the call/return actions don't use a stack, but the caller and callee can cooperate to maintain a stack.

The ARM too uses a link register. There are three advantages, I think:
- leaf subroutines don't call anything so don't need to push the link register contents
- the call instruction doesn't need to write to memory
- the return instruction can be combined with the restoration of saved registers, using a load-multiple instruction
The first is a performance win: the lowest level of subroutine calls can be fractionally cheaper
The second is a simplification - only stores need to write to memory
The third is a performance win: load-multiple does more work with a single instruction fetch
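
A rough sketch of the first two points, assuming a toy machine with a single link register and a software-maintained stack. The class and method names are hypothetical and are not ARM instructions. The leaf routine returns through the link register without touching memory, while the non-leaf routine has to save it once:

```python
# Toy link-register machine; all names are hypothetical.
class Machine:
    def __init__(self):
        self.lr = None          # link register
        self.stack = []         # software-maintained stack in memory
        self.mem_writes = 0     # memory writes caused by call linkage

    def call(self, return_addr):
        self.lr = return_addr   # the call itself only writes a register

    def push_lr(self):
        self.stack.append(self.lr)
        self.mem_writes += 1

    def pop_lr(self):
        self.lr = self.stack.pop()

def leaf(m):
    return m.lr                 # return straight through LR, no stack traffic

def non_leaf(m):
    m.push_lr()                 # must save LR before calling anything else
    m.call(200)
    inner = leaf(m)
    m.pop_lr()                  # restore our own return address
    return inner, m.lr

m = Machine()
m.call(100)
leaf_ret = leaf(m)
writes_after_leaf = m.mem_writes

m.call(100)
inner_ret, outer_ret = non_leaf(m)
```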

(Although ARM has some complex instructions, they never take more than 16 or so cycles, which puts an acceptable upper limit on interrupt latency.)


PostPosted: Tue Feb 02, 2016 10:53 am 

Joined: Tue Feb 02, 2016 10:36 am
Posts: 1
It would be nice to have instructions to retrieve/store the return address register, so it would be possible to implement a stack in software.

How about indirect addressing for data? Perhaps add a second working register that can hold a pointer into data memory? Then there could be instructions to load/save the main working register from/to the address pointed to by the pointer register.
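
As a rough illustration of that suggestion, here is a toy model with a main working register `a` and a pointer register `p`. The register names, memory size, and method names are assumptions made for the sketch, not part of Brad's design:

```python
# Toy accumulator machine with one pointer register; names are hypothetical.
class Cpu:
    def __init__(self, size=16):
        self.a = 0               # main working register
        self.p = 0               # proposed pointer register
        self.ram = [0] * size    # data memory

    def load_indirect(self):
        """A <- ram[P]"""
        self.a = self.ram[self.p]

    def store_indirect(self):
        """ram[P] <- A"""
        self.ram[self.p] = self.a

cpu = Cpu()
cpu.ram[0:3] = [7, 8, 9]

# Copy three words from address 0 to address 8 using only A and P.
for offset in range(3):
    cpu.p = offset
    cpu.load_indirect()
    cpu.p = 8 + offset
    cpu.store_indirect()

copied = cpu.ram[8:11]
```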


PostPosted: Tue Feb 02, 2016 11:21 am 

Joined: Fri Nov 27, 2015 10:09 am
Posts: 67
Here's a radical idea that came up because of all the talk about CALLs. Make the CPU the ultimate interrupt handler.

Have every instruction execute in a fixed time. I think some PIC micros are like that, everything is 4 cycles. Helps if you have a nice fast clock, and I'm sure 2 cycles max would be possible with a call register. Anyway, then make interrupt handling super fast.

With some extra hardware you could probably have zero overhead entering the interrupt. Have a complete set of interrupt only registers. Maybe a few sets, for different types of interrupt. Then there is no need to save/restore registers and state. You need to design your ALU to cope with that. Add an interrupt output from the GPU so you can trigger an interrupt on a particular pixel, in time for the CPU to write into the GPU's registers just as it is being output.


PostPosted: Tue Feb 02, 2016 12:06 pm 

Joined: Fri Nov 09, 2012 5:54 pm
Posts: 1431
Had a link register (LR) in my TREX CPU project.

LR worked nicely; the only problem was that an interrupt would corrupt
the contents of LR.

To prevent this, the hardware simply blocked all interrupts for the instruction that
follows a JSR\JMP...
because that instruction had to move the contents of LR to a "safe place".

The nice thing about using a link register is that a single
instruction can serve as both JMP and JSR,
because it is the responsibility of the code at the target address
to save LR, or to discard it.

Nevertheless, I'd suggest keeping both JSR and JMP in your assembly language
so you don't lose track of what your code is doing; the assembler would just generate
the same binary instruction word for JSR and JMP...
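
A small sketch of that scheme, assuming that interrupt entry also latches its return address into LR (which would explain the corruption mentioned above). The class, method names, and addresses are hypothetical and do not describe the actual TREX hardware:

```python
# Toy model: one jump instruction for JMP and JSR, plus a one-instruction
# interrupt hold-off after every jump. All names are hypothetical.
class Cpu:
    def __init__(self):
        self.pc = 0
        self.lr = 0
        self.irq_holdoff = False   # blocks interrupts for one instruction

    def jump(self, target):
        """Single instruction serving as both JMP and JSR."""
        self.lr = self.pc + 1      # always latch the return address
        self.pc = target
        self.irq_holdoff = True    # the next instruction may save LR safely

    def step_other(self):
        """Any non-jump instruction: advance and release the hold-off."""
        self.pc += 1
        self.irq_holdoff = False

    def interrupt(self, vector):
        """Return True if the interrupt was taken (taking it overwrites LR)."""
        if self.irq_holdoff:
            return False
        self.lr = self.pc
        self.pc = vector
        return True

cpu = Cpu()
cpu.pc = 5
cpu.jump(40)                        # behaves as JSR if the target saves LR
taken_early = cpu.interrupt(99)     # refused: hold-off is active
saved_lr = cpu.lr                   # the target's first instruction saves LR
cpu.step_other()
taken_late = cpu.interrupt(99)      # accepted now, and LR is overwritten
```

A target that never reads LR has simply been jumped to, which is why a single opcode is enough.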

BTW: another neat trick would be to implement both PC and LR as counters...
and then to swap the control signals for both registers with multiplexers
controlled by a toggle flipflop which triggers after a JMP\JSR instruction.

What I liked about FORTH hardware CPUs:
The idea of executing a JMP\JSR when the uppermost bit of an instruction is 0.
This way, the target address to be jumped actually _becomes_ the instruction word.
This, of course, would limit code to the lower half of memory...
but to me it somehow feels that your CPU applications would require
less program memory than data memory...
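
A sketch of that decode rule, assuming a 16-bit instruction word (the width is an assumption, as are the names below):

```python
WORD_BITS = 16                      # assumed instruction width
TOP_BIT = 1 << (WORD_BITS - 1)

def decode(word):
    """Top bit clear: the whole word is the call/jump target.
    Top bit set: the remaining bits encode some other operation."""
    if (word & TOP_BIT) == 0:
        return ("call", word)
    return ("op", word & (TOP_BIT - 1))

call_example = decode(0x1234)
op_example = decode(0x8005)
highest_target = TOP_BIT - 1        # code must live in the lower half of memory
```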


PostPosted: Tue Feb 02, 2016 12:30 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
(The ARM has some shadow registers to help interrupt routines go faster, not having to save much state, and that includes the link register.)


Last edited by BigEd on Tue Feb 02, 2016 1:24 pm, edited 1 time in total.

PostPosted: Tue Feb 02, 2016 1:08 pm 

Joined: Fri Nov 09, 2012 5:54 pm
Posts: 1431
That shadow register concept...
I just remembered the Fido 1100 microcontroller.

Having an interrupt response time of literally nothing would be possible
when using fast RAMs as CPU registers. Unfortunately, those RAMs would also
have to hold PC, LR and the status register.

You may want to use 74283 adders for incrementing PC, then.


The basic idea is to have an individual complete set of CPU registers
for a lot of tasks (since the RAMs tend to be a bit bigger than needed.)

Then to have some circuitry tied to the uppermost address lines of the RAMs
that selects the set of CPU registers which responds to a given interrupt...
maybe using 74670 to assign specific tasks to specific interrupts by software.

Of course, the CPU would need an instruction for switching the "context" back
to another task... or to the task that was interrupted...


The bad thing about this concept is that all the sets of CPU registers
have to show up in memory somewhere so that they can properly be initialized
by software after a hardware reset.
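
The concept can be sketched roughly as follows, assuming four registers per context and eight contexts. Both numbers and all names are made up for illustration, and the `irq_map` dictionary stands in for the small lookup RAM (the 74670 mentioned above):

```python
# Toy banked register file: the context number supplies the upper address
# bits of the register RAM. All names and sizes are hypothetical.
NUM_REGS = 4                        # registers per context (assumed)
NUM_CONTEXTS = 8                    # number of register sets (assumed)

class BankedCpu:
    def __init__(self):
        self.regfile = [0] * (NUM_REGS * NUM_CONTEXTS)
        self.context = 0
        self.irq_map = {}           # interrupt number -> context, set by software

    def _index(self, reg):
        return self.context * NUM_REGS + reg

    def read(self, reg):
        return self.regfile[self._index(reg)]

    def write(self, reg, value):
        self.regfile[self._index(reg)] = value

    def interrupt(self, irq):
        """Switch register sets; nothing is saved or restored."""
        previous = self.context
        self.context = self.irq_map[irq]
        return previous

    def switch_to(self, context):
        """The 'switch context back' instruction."""
        self.context = context

cpu = BankedCpu()
cpu.irq_map[3] = 5                  # interrupt 3 runs in context 5
cpu.write(0, 111)                   # main task uses register 0
previous = cpu.interrupt(3)
cpu.write(0, 222)                   # handler uses its own register 0
in_handler = cpu.read(0)
cpu.switch_to(previous)
after_return = cpu.read(0)
```

Every entry of `regfile` has to be reachable by software so that each context's registers can be set up after reset, which is the drawback noted above.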


PostPosted: Tue Feb 02, 2016 1:29 pm 

Joined: Fri Nov 27, 2015 10:09 am
Posts: 67
I just like the idea of a CPU/GPU hybrid I guess, because I wasted my youth spamming GPU registers to create interesting effects. Horizontal scaling on the Amiga was just a matter of carefully hammering the scroll register as fast as possible.


PostPosted: Wed Feb 03, 2016 12:14 am 

Joined: Mon May 25, 2015 2:25 pm
Posts: 690
Location: Gillies, Ontario, Canada
cbscpe wrote:
Oneironaut wrote:
- The "CALL" instruction automatically captures the address before the 574s (+1) in order to "RETURN".
- There is no "JUMP", just "CALL". If you don't need to RETURN, then just don't.

No nested CALLs I suppose then.


In the current design, you get one "free" RET, and the others have to be saved to regmem (3 cycles).
In a newer design idea, I have a 32K CALL/RET stack. Not sure which I am using yet.

Other design changes have been made as well, but still scratching them out on paper.

Brad


PostPosted: Wed Feb 03, 2016 12:14 pm 

Joined: Thu Jun 04, 2015 3:43 pm
Posts: 42
Oneironaut wrote:
In the current design, you get one "free" RET, and the others have to be saved to regmem (3 cycles).
In a newer design idea, I have a 32K CALL/RET stack. Not sure which I am using yet.

A link register is more rare, and therefore much cooler. :-)

Pre-decrement and post-increment addressing modes for memory operations (like 68K) perhaps? That would give you explicit stack handling when needed while avoiding the call/return overhead when not. You would need to load, add/sub and store to register memory in the same instruction cycle, but you've done crazier stuff before. You'd get an unlimited number of stacks too.
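
A minimal sketch of how those two addressing modes give you stacks, assuming a toy machine where any named register can act as a stack pointer. The register names `rsp` and `dsp` and the memory size are hypothetical:

```python
# Toy model of pre-decrement store and post-increment load (68K style).
class Machine:
    def __init__(self, size=32):
        self.mem = [0] * size
        self.regs = {}

    def store_predec(self, ptr, value):
        """-(ptr) <- value : decrement the pointer, then store (a push)."""
        self.regs[ptr] -= 1
        self.mem[self.regs[ptr]] = value

    def load_postinc(self, ptr):
        """value <- (ptr)+ : load, then increment the pointer (a pop)."""
        value = self.mem[self.regs[ptr]]
        self.regs[ptr] += 1
        return value

m = Machine()
m.regs["rsp"] = 32                  # return stack, grows down from the top
m.regs["dsp"] = 16                  # a second, independent data stack

m.store_predec("rsp", 0x100)        # save an outer return address
m.store_predec("dsp", 42)           # push data on the other stack
m.store_predec("rsp", 0x200)        # save an inner return address

inner_return = m.load_postinc("rsp")
data = m.load_postinc("dsp")
outer_return = m.load_postinc("rsp")
```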

Now that the unsolicited advice portion of this post is over, I instantly thought "it's a DSP!" when seeing those multiple data paths. Neat stuff.


PostPosted: Wed Feb 03, 2016 1:39 pm 

Joined: Fri Nov 27, 2015 10:09 am
Posts: 67
Well, some PICs get by with an 8 deep call stack... But the biggest downside is that they don't support indirect jumps. Either using a register as a pointer or being able to dump addresses onto the stack is rather nice :-)
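
To show what a fixed-depth hardware call stack costs you, here is a toy model. It assumes that pushing past the eighth level silently wraps around and overwrites the oldest entry; treat that as an assumption and check the datasheet for any particular part. The class name is made up:

```python
# Toy fixed-depth return stack that wraps on overflow (assumed behaviour).
class HardwareStack:
    def __init__(self, depth=8):
        self.slots = [0] * depth
        self.top = 0                # number of pushes outstanding

    def push(self, addr):
        self.slots[self.top % len(self.slots)] = addr
        self.top += 1

    def pop(self):
        self.top -= 1
        return self.slots[self.top % len(self.slots)]

stack = HardwareStack()
for return_addr in range(1, 10):    # nine nested calls on an 8-deep stack
    stack.push(return_addr)

popped = [stack.pop() for _ in range(9)]
# The ninth push overwrote the first entry, so the outermost return
# comes back as 9 instead of 1.
```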


PostPosted: Sat Feb 06, 2016 3:32 pm 

Joined: Mon May 25, 2015 2:25 pm
Posts: 690
Location: Gillies, Ontario, Canada
No internet for 2 days!... the joys of satellite.
Oh well, on the bright side, the only visible neighbor requires a climb up on the roof with binoculars.
Life is all about trade-offs!

Ok, I have almost completed my design on paper now.
If time permits this weekend, I will post the entire plan.
My intention is to wire it all up completely in one shot and see how (if) it works!

I have chosen to drive the design around speed, with chip count a distant second.
So if I can gain a 1% speed increase by adding a 512K SRAM as a call stack, consider it done.

When I look at the many amazing examples of DIY CPUs out there, my first reaction is...
"Wow, that thing actually works with all those hand wired chips!"

The reaction I want is...
"Wow, how can that thing actually outperform an Amiga 4000 with all those hand wired chips!!"

Let the fun begin!

Radical Brad


PostPosted: Sat Feb 06, 2016 4:25 pm 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
Oneironaut wrote:
... I have chosen to drive the design around speed, with chip count a distant second.
So if I can gain a 1% speed increase by adding a 512K SRAM as a call stack, consider it done.

Main-framers would be proud!

Mike B.


PostPosted: Sat Feb 06, 2016 5:37 pm 

Joined: Mon May 25, 2015 2:25 pm
Posts: 690
Location: Gillies, Ontario, Canada
barrym95838 wrote:
Oneironaut wrote:
... I have chosen to drive the design around speed, with chip count a distant second.
So if I can gain a 1% speed increase by adding a 512K SRAM as a call stack, consider it done.

Main-framers would be proud!
Mike B.


I am shooting to have the final unit use less power and take up less space than the ENIAC though!

Brad

