Vulcan-74 - A 6502 Powered Retro MegaProject

barrym95838 · Post by **barrym95838** » Tue Feb 02, 2016 8:54 am

If Brad decides to implement a link register, then it would be the callee's responsibility to save it somewhere before calling something else. It has been about 30 years since I took a class on IBM 360 assembly language programming, but I think that's how it worked on that architecture (the BALR instruction, IIRC). The PDP-8 had no hardware stack pointer or index registers, and its JMS instruction stored the return address in the first word of the called subroutine. It wasn't a problem as long as recursion and/or re-entrancy were avoided or handled very carefully. In either architecture, you could maintain your own "software" return stack, if you thought you needed it. Many large programs "rolled their own" stacks, while others got by with the default hardware behavior, limited as it was.
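To make the callee-saves convention concrete, here is a minimal Python sketch of a hypothetical link-register machine. It is not Brad's design and not exact IBM 360 semantics. A call writes the return address into a single register, and a routine that calls something else first saves that register on a software stack it maintains itself. The class and routine names are made up for illustration.

```python
class LinkRegisterCPU:
    """Toy model: a call puts the return address in LR, never in memory."""

    def __init__(self):
        self.lr = None          # link register: holds exactly one return address
        self.soft_stack = []    # a "rolled our own" return stack in ordinary memory
        self.returns = []       # where each routine would have returned to

    def call(self, routine, return_addr):
        self.lr = return_addr   # BALR-like: the link goes into a register
        routine(self)


def leaf(cpu):
    # Calls nothing else, so LR can simply stay in the register.
    cpu.returns.append(("leaf", cpu.lr))


def non_leaf(cpu):
    cpu.soft_stack.append(cpu.lr)       # save LR before calling anything else
    cpu.call(leaf, return_addr=0x2010)
    cpu.lr = cpu.soft_stack.pop()       # restore it before "returning"
    cpu.returns.append(("non_leaf", cpu.lr))


def careless(cpu):
    cpu.call(leaf, return_addr=0x3010)  # forgets to save LR first
    cpu.returns.append(("careless", cpu.lr))


cpu = LinkRegisterCPU()
cpu.call(non_leaf, return_addr=0x1000)
```

After the call, `non_leaf` comes back with LR restored to `0x1000`, while the same sequence through `careless` would leave it holding the leaf's return address `0x3010`.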

Mike B.

GARTHWILSON · Post by **GARTHWILSON** » Tue Feb 02, 2016 9:32 am

Quote:

The PDP-8 had no hardware stack pointer or index registers, and its JMS instruction stored the return address in the first word of the called subroutine. It wasn't a problem as long as recursion and/or re-entrancy were avoided or handled very carefully.

From the bottom of the second page, called "Subroutine return addresses and nesting," of my 6502 stacks treatise:

So... Why the stack? Couldn't the return address be kept by the subroutine itself? Well, not really. I understand some of the 60's/70's minicomputers did that; but there are too many problems with it:
- For one thing, the subroutine might be in ROM where it can't hold variable data.
- For another, there could be several places to enter the subroutine, and there would be the problem of having the program counter jump over the places where return addresses are stored.
- Then you have the possibility of multiple exit points depending on conditions, and every exit point would have to know where you jumped into the routine, which it can't.
- It would also rule out recursion. [Chapter 15 is on recursion.]
- If it were still possible after all of that, it would still take up more memory than the stack does, so forgetaboutit.
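The recursion problem is easy to see in a small Python model of a PDP-8-style JMS, where the return address is written into the first word of the subroutine and the return is an indirect jump through that word. The addresses below are made up for illustration.

```python
memory = {}


def jms(sub_addr, call_addr):
    """JMS: store the address after the call in the subroutine's first word."""
    memory[sub_addr] = call_addr + 1
    return sub_addr + 1               # execution continues at the following word


def return_from(sub_addr):
    """Return with an indirect jump through the subroutine's first word."""
    return memory[sub_addr]


SUB = 0o200
jms(SUB, call_addr=0o100)             # outer call: link word = 0o101
jms(SUB, call_addr=0o205)             # recursive call from inside SUB: link = 0o206
inner_return = return_from(SUB)       # inner invocation returns correctly to 0o206
outer_return = return_from(SUB)       # outer invocation ALSO gets 0o206; 0o101 is lost
```

The second call overwrote the only copy of the outer return address, which is why recursion (or re-entry from an interrupt) needs either great care or a stack. Writing into the subroutine's own first word is also why this scheme cannot run from ROM.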

BigEd · Post by **BigEd** » Tue Feb 02, 2016 9:59 am

I think there are two ideas here, other than the familiar 6502 use of the stack:
- the "Wheeler jump" where the return address is saved at the head of the called routine. This was an innovation, because it allows subroutines when stacks didn't exist.
- the link register, where the call/return actions don't use a stack, but the caller and callee can cooperate to maintain a stack.

The ARM too uses a link register. There are three advantages, I think:
- leaf subroutines don't call anything, so they don't need to push the link register contents
- the call instruction doesn't need to write to memory
- the return instruction can be combined with the restoration of saved registers, using a load-multiple instruction

The first is a performance win: the lowest level of subroutine calls can be fractionally cheaper.
The second is a simplification: only stores need to write to memory.
The third is a performance win: load-multiple does more work with a single instruction fetch.
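A rough way to see the first two advantages is to count return-address writes to memory for a small call tree under each scheme. This Python sketch assumes a 6502-style call that always pushes the return address (counted as one write, although JSR actually pushes two bytes) versus an ARM-style call that leaves it in the link register, with each non-leaf routine saving LR once on entry. The call tree is hypothetical.

```python
CALL_TREE = {
    "main": ["draw", "update"],
    "draw": ["plot", "plot"],
    "update": [],
    "plot": [],
}


def return_address_writes(tree, routine, scheme):
    """Memory writes of return addresses caused by one call to `routine`."""
    callees = tree.get(routine, [])
    if scheme == "stack":
        writes = 1                      # the call instruction itself pushes the link
    elif scheme == "link":
        writes = 1 if callees else 0    # only non-leaf routines spill LR to memory
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    for callee in callees:
        writes += return_address_writes(tree, callee, scheme)
    return writes


stack_writes = return_address_writes(CALL_TREE, "main", "stack")
link_writes = return_address_writes(CALL_TREE, "main", "link")
```

For this tree the stack scheme makes 5 writes and the link-register scheme 2; the leaf calls to `plot` and `update` cost nothing in memory traffic.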

(Although ARM has some complex instructions, they never take more than 16 or so cycles, which puts an acceptable upper limit on interrupt latency.)

jjv · Post by **jjv** » Tue Feb 02, 2016 10:53 am

It would be nice to have instructions to retrieve/store the return address register, so it would be possible to implement a stack in software.

How about indirect addressing for data? Perhaps add a second working register that can hold a pointer into data memory? Then there could be instructions to load/save the main working register from/to the address pointed to by the pointer register.
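A minimal Python sketch of this suggestion, assuming a hypothetical machine with a working register W and a pointer register P; the method names are made up. Once P can be loaded from W, walking a table (or a software stack) needs no self-modifying code.

```python
class PointerCPU:
    """Toy datapath: W is the working register, P points into data memory."""

    def __init__(self, size=256):
        self.mem = [0] * size
        self.w = 0
        self.p = 0

    def load_p(self):            # P <- W
        self.p = self.w

    def load_indirect(self):     # W <- mem[P]
        self.w = self.mem[self.p]

    def store_indirect(self):    # mem[P] <- W
        self.mem[self.p] = self.w


cpu = PointerCPU()
cpu.mem[0x10:0x14] = [3, 1, 4, 1]
total = 0
for addr in range(0x10, 0x14):
    cpu.w = addr
    cpu.load_p()                 # point P at the next table entry
    cpu.load_indirect()          # fetch it through P
    total += cpu.w
```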

mojo · Post by **mojo** » Tue Feb 02, 2016 11:21 am

Here's a radical idea that came to me because of all the talk about CALLs: make the CPU the ultimate interrupt handler.

Have every instruction execute in a fixed time. I think some PIC micros are like that; everything is 4 cycles. It helps if you have a nice fast clock, and I'm sure 2 cycles max would be possible with a call register. Anyway, then make interrupt handling super fast.

With some extra hardware you could probably have zero overhead entering the interrupt. Have a complete set of interrupt-only registers, maybe a few sets for different types of interrupt. Then there is no need to save/restore registers and state, though you need to design your ALU to cope with that. Add an interrupt output from the GPU so you can trigger an interrupt on a particular pixel, in time for the CPU to write into the GPU's registers just as it is being output.
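Here is a minimal Python sketch of the interrupt-only register idea, assuming a hypothetical machine with two register banks: entering an interrupt just changes which bank is selected, so nothing is copied to memory. In real hardware the PC and status would also need to be banked or latched; this model only tracks general registers and supports no nesting.

```python
class BankedCPU:
    """Register banks selected by a bank number instead of saved to memory."""

    def __init__(self, banks=2, regs_per_bank=4):
        self.banks = [[0] * regs_per_bank for _ in range(banks)]
        self.active = 0
        self.interrupted_bank = None   # a single latch: no nesting in this sketch

    @property
    def regs(self):
        return self.banks[self.active]

    def enter_interrupt(self, bank=1):
        self.interrupted_bank = self.active
        self.active = bank             # zero-copy "context save"

    def return_from_interrupt(self):
        self.active = self.interrupted_bank
        self.interrupted_bank = None


cpu = BankedCPU()
cpu.regs[0] = 42          # main program state
cpu.enter_interrupt()
cpu.regs[0] = 7           # the handler clobbers its own bank freely
cpu.return_from_interrupt()
```

After returning, the main program's register still holds 42, and the handler's value stays parked in bank 1.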

ttlworks · Post by **ttlworks** » Tue Feb 02, 2016 12:06 pm

Had a link register (LR) in my TREX CPU project.

LR worked nicely so far; the only problem was that an interrupt would corrupt
the contents of LR.

To prevent this, I simply blocked all interrupts in hardware for the instruction
that follows a JSR\JMP...
because this instruction then had to bring the contents of LR to a "safe place".

The nice thing about using a link register is that you can get by with only
one instruction that is used for both JMP and JSR,
because it's the responsibility of the code at the jump target
to save LR or to discard it.

Nevertheless, I'd suggest keeping both JSR and JMP in your assembly language
so you don't lose the overview of your code; the assembler would just generate
the same binary instruction word for JSR and JMP...
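The following Python sketch models this scheme on a hypothetical machine; the opcodes are invented and are not TREX's actual instruction set. There is only one jump opcode, which always copies the return address into LR, and the hardware holds off interrupts for the one instruction after any jump. The interrupt is modeled crudely: taking it overwrites LR with the interrupted address, and the handler returns at once.

```python
# Invented opcodes. "JMP" is the only control-transfer opcode; the assembler
# would emit it for both JMP and JSR, and RTS is just "JMP through LR".
PROGRAM = [
    ("JMP", 2),        # 0: "JSR sub" (LR <- 1)
    ("HALT",),         # 1: where sub should return to
    ("SAVE_LR",),      # 2: sub: first instruction moves LR to a safe place
    ("RESTORE_LR",),   # 3: ...the body of sub would go here...
    ("JMP", "LR"),     # 4: "RTS"
]


def run(program, irq_step=None, block_irq_after_jump=True, max_steps=20):
    """Return the address of the HALT reached, or None if control is lost."""
    pc, lr = 0, None
    after_jump = False           # True only for the instruction following a jump
    irq_pending = False
    saved_lr = []                # the "safe place" (a software stack)
    for step in range(max_steps):
        if not 0 <= pc < len(program):
            return None
        if step == irq_step:
            irq_pending = True
        if irq_pending and not (after_jump and block_irq_after_jump):
            lr = pc              # interrupt entry reuses the single LR...
            irq_pending = False  # ...and the handler returns immediately
        after_jump = False
        op, *args = program[pc]
        if op == "HALT":
            return pc
        if op == "JMP":
            target = lr if args[0] == "LR" else args[0]
            lr = pc + 1          # every jump writes LR; plain JMPs just ignore it
            pc = target
            after_jump = True
        elif op == "SAVE_LR":
            saved_lr.append(lr)
            pc += 1
        elif op == "RESTORE_LR":
            lr = saved_lr.pop()
            pc += 1
    return None
```

With the one-instruction interrupt shadow, an interrupt arriving right after the call is deferred until `SAVE_LR` has run, and the program still halts at address 1; without the shadow, the same interrupt clobbers LR and the return goes astray.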

BTW: another neat trick would be to implement both PC and LR as counters...
and then to swap the control signals for both registers with multiplexers
controlled by a toggle flipflop which triggers after a JMP\JSR instruction.
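One possible reading of the PC/LR swap trick, as a Python sketch; the interpretation is an assumption, not a known circuit. Two identical counters share a toggle flip-flop that selects which one drives the address bus as PC. A jump loads the target into the idle counter and flips the toggle, so the old PC, already incremented past the jump, becomes LR without any copy.

```python
class SwappingCounters:
    """Two counters; a toggle flip-flop decides which one is PC and which is LR."""

    def __init__(self, start=0):
        self.counters = [start, 0]
        self.sel = 0                 # the toggle flip-flop

    @property
    def pc(self):
        return self.counters[self.sel]

    @property
    def lr(self):
        return self.counters[self.sel ^ 1]

    def fetch(self):
        """Only the counter currently acting as PC increments."""
        addr = self.pc
        self.counters[self.sel] += 1
        return addr

    def jump(self, target):
        self.counters[self.sel ^ 1] = target   # parallel-load the idle counter
        self.sel ^= 1                          # swap roles: idle counter is now PC


cpu = SwappingCounters(start=0x100)
cpu.fetch()          # fetch the JSR at 0x100; PC is now 0x101
cpu.jump(0x800)      # PC = 0x800, LR = 0x101, with no register-to-register copy
```

Returning is then just another jump through LR (`cpu.jump(cpu.lr)`), which flips the toggle back.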

What I liked about FORTH hardware CPUs:
The idea of executing a JMP\JSR when the uppermost bit of an instruction is 0.
This way, the target address to jump to actually _becomes_ the instruction word.
This, of course, would limit code to the lower half of memory...
but to me it somehow feels that your CPU applications would require
less program memory than data memory...
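A Python sketch of that decoding rule, assuming a hypothetical 16-bit instruction word: bit 15 clear means the word itself is the call target, so calls can only reach the lower 32K words.

```python
CALL_FLAG = 0x8000


def decode(word):
    """Decode a 16-bit instruction word in the Forth-CPU style described above."""
    if (word & CALL_FLAG) == 0:
        return ("CALL", word)         # the word IS the target address (0x0000-0x7FFF)
    return ("OP", word & 0x7FFF)      # the other 15 bits select some other operation
```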

BigEd · Post by **BigEd** » Tue Feb 02, 2016 12:30 pm

(The ARM has some shadow registers to help interrupt routines go faster, not having to save much state, and that includes the link register.)

ttlworks · Post by **ttlworks** » Tue Feb 02, 2016 1:08 pm

That shadow register concept...
I just remembered the Fido 1100 microcontroller.

Having an interrupt response time of literally nothing would be possible
when using fast RAMs as CPU registers; unfortunately, this also has to include
PC, LR and the status register.

You may want to use 74283 adders for incrementing PC, then.

The basic idea is to have an individual complete set of CPU registers
for a lot of tasks (since the RAMs tend to be a bit bigger than needed.)

Then have some circuitry tied to the uppermost address lines of the RAMs
that make up the CPU registers, which responds to the interrupts...
maybe using a 74670 to assign specific tasks to specific interrupts by software.

Of course, the CPU would need an instruction for switching the "context" back
to another task... or to the task that was stopped...

The bad thing about this concept is that all the sets of CPU registers
have to show up in memory somewhere so that they can properly be initialized
by software after a hardware reset.
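A Python sketch of the register-file-in-RAM idea under illustrative assumptions: 16 registers per context with PC, LR and status at hypothetical indices 13 to 15, the context number on the upper address lines, and a four-entry table standing in for a 74670 (a 4-word by 4-bit register file) that maps each interrupt line to a task. Taking an interrupt only changes the upper address bits, while the reset-time setup has to write every context as ordinary memory.

```python
REG_BITS = 4                         # 16 registers per context
REG_PC, REG_LR, REG_SR = 13, 14, 15  # hypothetical register numbers


class ContextRAM:
    """All CPU registers live in one RAM; upper address bits select the context."""

    def __init__(self, contexts=16):
        self.ram = [0] * (contexts << REG_BITS)
        self.ctx = 0
        self.irq_to_task = [0, 0, 0, 0]   # 74670-like table: 4 entries x 4 bits
        self.stopped = []                 # contexts to switch back to

    def _addr(self, ctx, reg):
        return (ctx << REG_BITS) | reg    # context on the upper address lines

    def read(self, reg):
        return self.ram[self._addr(self.ctx, reg)]

    def write(self, reg, value):
        self.ram[self._addr(self.ctx, reg)] = value

    def init_context(self, ctx, pc):
        # After reset, every context must be reachable as plain memory.
        self.ram[self._addr(ctx, REG_PC)] = pc
        self.ram[self._addr(ctx, REG_LR)] = 0
        self.ram[self._addr(ctx, REG_SR)] = 0

    def interrupt(self, line):
        self.stopped.append(self.ctx)
        self.ctx = self.irq_to_task[line]  # only the upper address bits change

    def switch_back(self):
        self.ctx = self.stopped.pop()


cpu = ContextRAM()
cpu.init_context(0, pc=0x0100)
cpu.init_context(3, pc=0x4000)
cpu.irq_to_task[2] = 3                     # software maps interrupt line 2 to task 3
cpu.interrupt(2)
```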

mojo · Post by **mojo** » Tue Feb 02, 2016 1:29 pm

I just like the idea of a CPU/GPU hybrid, I guess, because I wasted my youth spamming GPU registers to create interesting effects. Horizontal scaling on the Amiga was just a matter of carefully hammering the scroll register as fast as possible.

Oneironaut · Post by **Oneironaut** » Wed Feb 03, 2016 12:14 am

cbscpe wrote:

Oneironaut wrote:

- The "CALL" instruction automatically captures the address before the 574s (+1) in order to "RETURN".
- There is no "JUMP", just "CALL". If you don't need to RETURN, then just don't.

No nested CALLs I suppose then.

In the current design, you get one "free" RET, and the others have to be saved to regmem (3 cycles).
In a newer design idea, I have a 32K CALL/RET stack. Not sure which I am using yet.

Other design changes have been made as well, but I'm still scratching them out on paper.

Brad

magetoo · Post by **magetoo** » Wed Feb 03, 2016 12:14 pm

Oneironaut wrote:

In the current design, you get one "free" RET, and the others have to be saved to regmem (3 cycles).
In a newer design idea, I have a 32K CALL/RET stack. Not sure which I am using yet.

A link register is more rare, and therefore much cooler. :-)

Pre-decrement and post-increment addressing modes for memory operations (like 68K) perhaps? That would give you explicit stack handling when needed while avoiding the call/return overhead when not. You would need to load, add/sub and store to register memory in the same instruction cycle, but you've done crazier stuff before. You'd get an unlimited number of stacks too.
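A Python sketch of stacks built from 68K-style addressing modes, assuming register memory in which any register can serve as a stack pointer: push uses pre-decrement (like `MOVE.W D0,-(A7)`) and pop uses post-increment (like `MOVE.W (A7)+,D0`). The register names and memory size are hypothetical; the point is that each extra stack costs only one pointer register.

```python
class RegisterMemory:
    """Register memory in which any register can act as a stack pointer."""

    def __init__(self, size=64):
        self.mem = [0] * size
        self.regs = {}

    def push(self, ptr, value):
        self.regs[ptr] -= 1                  # pre-decrement, like -(An)
        self.mem[self.regs[ptr]] = value

    def pop(self, ptr):
        value = self.mem[self.regs[ptr]]
        self.regs[ptr] += 1                  # post-increment, like (An)+
        return value


rm = RegisterMemory()
rm.regs["SP_CALL"] = 32      # a call stack growing down from 32
rm.regs["SP_DATA"] = 64      # an independent data stack growing down from 64
rm.push("SP_CALL", 0x0101)
rm.push("SP_CALL", 0x0205)
rm.push("SP_DATA", 7)
```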

Now that the unsolicited advice portion of this post is over: I instantly thought "it's a DSP!" when seeing those multiple data paths. Neat stuff.

mojo · Post by **mojo** » Wed Feb 03, 2016 1:39 pm

Well, some PICs get by with an 8-deep call stack... But the biggest downside is that they don't support indirect jumps. Either using a register as a pointer or being able to dump addresses onto the stack is rather nice.
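For comparison, a Python sketch of a small fixed-depth hardware return stack. It assumes the behavior documented for mid-range PIC16 parts, where the 8-level stack is a circular buffer and a ninth push silently overwrites the first entry; the class name is made up.

```python
class CircularCallStack:
    """Fixed-depth return stack that wraps around instead of reporting overflow."""

    def __init__(self, depth=8):
        self.slots = [None] * depth
        self.top = 0                              # index of the next slot to write

    def push(self, addr):
        self.slots[self.top] = addr
        self.top = (self.top + 1) % len(self.slots)

    def pop(self):
        self.top = (self.top - 1) % len(self.slots)
        return self.slots[self.top]


stack = CircularCallStack()
for addr in range(9):                             # one call too deep
    stack.push(addr)
returns = [stack.pop() for _ in range(9)]
```

The ninth push lands on top of the first entry, so the outermost return address (0) is gone and the last pop returns 8 again.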

Oneironaut · Post by **Oneironaut** » Sat Feb 06, 2016 3:32 pm

No internet for 2 days!... the joys of satellite.
Oh well, on the bright side, the only visible neighbor requires a climb up on the roof with binoculars.
Life is all about trade offs!

Ok, I have almost completed my design on paper now.
If time permits this weekend, I will post the entire plan.
My intention is to wire it all up completely in one shot and see how (if) it works!

I have chosen to drive the design around speed, with chip count a distant second.
So if I can gain a 1% speed increase by adding a 512K SRAM as a call stack, consider it done.

When I look at the many amazing examples of DIY CPUs out there, my first reaction is...
"Wow, that thing actually works with all those hand wired chips!"

The reaction I want is...
"Wow, how can that thing actually outperform an Amiga 4000 with all those hand wired chips!!"

Let the fun begin!

Radical Brad

barrym95838 · Post by **barrym95838** » Sat Feb 06, 2016 4:25 pm

Oneironaut wrote:

... I have chosen to drive the design around speed, with chip count a distant second.
So if I can gain a 1% speed increase by adding a 512K SRAM as a call stack, consider it done.

Main-framers would be proud!

Mike B.

Oneironaut · Post by **Oneironaut** » Sat Feb 06, 2016 5:37 pm

barrym95838 wrote:

Oneironaut wrote:

... I have chosen to drive the design around speed, with chip count a distant second.
So if I can gain a 1% speed increase by adding a 512K SRAM as a call stack, consider it done.

Main-framers would be proud!
Mike B.

I am shooting to have the final unit use less power and take up less space than ENIAC, though!

Brad

Re: Vulcan-74 - A 6502 Compatible Retro MegaProject
