Posted: Sun Mar 25, 2012 4:13 pm
by Dr Jefyll
Removing the possibility of a wasted cycle in the (zp),Y addressing mode seems a worthy goal. After all, the penalty applies not only to page crossings but also to all (zp),Y write cycles -- page crossing or not!
Arlet wrote:
In order to accommodate external SDRAM better, I was thinking about ways to remove the dummy bus accesses from the core.
Whether or not the (zp),Y issue is dealt with, am I right in saying your core (like the 6502) still exhibits wasted cycles from various other causes? I bet many of them would be tough or impossible to eliminate. As an alternative remedy, maybe you could generate something akin to the 65816's VPA and VDA signals. Then at least an "external" device -- your SDRAM logic! -- would know enough to recognize a throw-away bus cycle, and not devote any unnecessary time to it. The wasted cycle would still exist, but it would run at full speed -- no wait states doing a futile fetch from the SDRAM. Does that make sense, or am I missing something?

Just a suggestion... Keep up the great work, fellas! :o

-- Jeff

Posted: Sun Mar 25, 2012 4:29 pm
by Arlet
Dr Jefyll wrote:
Whether or not the (zp),Y issue is dealt with, am I right in saying your core (like the 6502) still exhibits wasted cycles from various other causes? I bet many of them would be tough or impossible to eliminate.
You are absolutely right. There will still be places where the core will have wasted cycles that currently result in a read on the bus. I don't think they can be avoided, at least not without a huge redesign. My plan was to add an 'OE' (output enable) signal to the core to distinguish a true read from a dummy read.

The problem is that in some places, the core doesn't know whether the cycle is valid or not. The "(zp),Y" addressing mode is one example; "abs,X" and relative branches are others. I'm hoping that all those speculative cycles can be removed, either by making the cycle non-optional, or by adding some extra logic in the address calculation so that the cycle is no longer needed. After that has been done, it should be possible to add the OE output. The SDRAM controller would then ignore the wasted reads like you said.

This would also benefit peripheral registers that might get confused by dummy reads to the wrong address.
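
A minimal sketch of how external logic might use such an OE output. This is not code from the core: the module and every port name except OE are hypothetical, and it assumes the core stalls while `rdy` is low and that the SDRAM controller raises `sdram_done` when an access has finished.

```verilog
// Hypothetical glue between the core and an SDRAM controller.
// Assumptions: oe = 1 marks a true bus access, oe = 0 a dummy cycle;
// the core stalls while rdy is low; sdram_done is high when the
// requested SDRAM access has completed.
module dummy_cycle_gate (
    input  wire oe,          // from core: this access is real
    input  wire sdram_done,  // from SDRAM controller: access finished
    output wire sdram_req,   // to SDRAM controller: start an access
    output wire rdy          // to core: the cycle may complete
);
    // Only real accesses are forwarded to the SDRAM.
    assign sdram_req = oe;

    // Dummy cycles complete immediately; real ones wait for the SDRAM.
    assign rdy = ~oe | sdram_done;
endmodule
```

With gating like this, the wasted cycle still exists but costs no wait states, and the same `oe` could qualify chip selects so that peripherals never see the dummy read.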

Posted: Wed Mar 28, 2012 5:51 pm
by BigEd
EEye:
very good to see the 'b' core coming into a final shape. I wonder if it's a good idea to update the head post? The current spec is a bit different from what you set out there. You now have
- the usual 65Org16 basis
- a new set of long-distance shift and rotate opcodes
- a set of 16 accumulators
- new capabilities for transfers and logical/arithmetic operations between accumulators

(I still think it would be better to have X, Y and SP inside the set of 16, rather than outside that set. Then - I think - you can do things like add 8 to SP and put the result in X, for some stack-relative lookup, in a single operation.)

Cheers
Ed

(Arlet: nice work on cutting out the dead cycles - that will help 6502 as well as 65org16 users of course!)

Posted: Wed Mar 28, 2012 6:30 pm
by ElEctric_EyE
BigEd wrote:
... (I still think it would be better to have X, Y and SP inside the set of 16, rather than outside that set. Then - I think - you can do things like add 8 to SP and put the result in X, for some stack-relative lookup, in a single operation.)...
Maybe in due time, BigEd. Your insistence on it makes me curious! And your idea would be within the realm of the .b version as well, IMO. I'll think about how to implement it. Thanks for your input. :)

The head post does need some tidying up, not too much though...

EDIT: I've got it up to date now, maybe too many changes, but most of it reflects current status of the 65Org16.b core as of 3/28/12....

Posted: Sat Mar 31, 2012 9:58 pm
by ElEctric_EyE
One more addition: INcrementing and DEcrementing an accumulator so it can act as a simple index register. This should be easy to implement. INA and DEA can reside in column $B, and the rest of the opcodes can follow this RULE: for opcodes like LDx, TYx, TXx, INx, DEx, where the destination reg is the accumulator and there is no need for accumulator/accumulator transposition, the small x stands for an accumulator A thru Q, and this encoding applies:

Code:

16'b00dd_00dd_xxxx_xxxx 

IR[15:14] = 00 
IR[13:12] = dst_reg (A thru Q) 
IR[11:10] = 00 
IR[9:8] = dst_reg (A thru Q)

Posted: Mon Apr 02, 2012 12:50 pm
by ElEctric_EyE
BigEd wrote:
... (I still think it would be better to have X, Y and SP inside the set of 16, rather than outside that set...
Now I begin to see what you mean! I think it is starting to dawn on me: to be able to use any of the accumulators as a register like X or Y, with their addressing modes. This would truly make it a powerful CPU! I think I can do it...

Posted: Mon Apr 02, 2012 1:53 pm
by ElEctric_EyE
I forgot about this:

Code:

ABSX0  : regsel = index_y ? SEL_Y : SEL_X;
The X and Y registers are unique to regsel. Not so easy as I was thinking before I looked at the code again.

At this point, I think I can still do my idea of making a simple register out of the Accumulators. I should be able to test it out today...

Posted: Mon Apr 02, 2012 4:44 pm
by BigEd
Hi EEye
I'm not quite sure if we're on the same page, yet.

I was thinking of the relatively simple idea, that by making X, Y and SP part of your 16-way set of accumulators (which is a very small change to the code) you can make these three registers the targets of your new two-operand instructions:
ADC #5 to B // you already have this
ADC #5 to X // you can't presently do this
ASL C by 3 to D // you already have this
ASL C by 3 to Y // you can't presently do this

For this case, there's no impact at all to the instruction encodings or to the decoding you have to do in the verilog.

It looks like you might be thinking of the more complex idea - which might be even more useful - of allowing some or any of the 16 accumulators to play the part of the X or Y registers in indexed addressing:
JMP (location),C

I'm not thinking of this case. In this case you do have to think up some ideas about how to encode the choice of accumulator in the opcode.

Hope that helps.
Ed

Posted: Mon Apr 02, 2012 4:58 pm
by Arlet
The complex idea isn't really that complex. If you replace this:

Code:

ABSX0  : regsel = index_y ? SEL_Y : SEL_X;
by this:

Code:

ABSX0  : regsel = index_reg;
Where 'index_reg' is a suitably defined reg, then it just becomes a matter of assigning the proper value to 'index_reg' during instruction decode.
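
A rough sketch of what "assigning the proper value during instruction decode" could look like. Only `index_reg`, `index_y`, `SEL_X` and `SEL_Y` come from the lines above; the module wrapper, the `load_ir`, `acc_index` and `acc_sel` signals, the 4-bit width and the numeric values of the selectors are all assumptions made for illustration.

```verilog
// Hypothetical decode of the index register selection.
// The SEL_X / SEL_Y values and the acc_index / acc_sel inputs are
// placeholders, not the encodings used by the actual core.
module index_decode (
    input  wire       clk,
    input  wire       load_ir,    // assumed: high while a new opcode is decoded
    input  wire       index_y,    // opcode uses Y indexing
    input  wire       acc_index,  // assumed: opcode names an accumulator as index
    input  wire [3:0] acc_sel,    // assumed: which accumulator it names
    output reg  [3:0] index_reg   // drives regsel in the ABSX0 state
);
    localparam [3:0] SEL_X = 4'd13,
                     SEL_Y = 4'd14;

    // Latch the choice once per instruction; the state machine then
    // simply uses "regsel = index_reg" wherever X or Y was hard-coded.
    always @(posedge clk)
        if (load_ir)
            index_reg <= acc_index ? acc_sel :
                         index_y   ? SEL_Y   : SEL_X;
endmodule
```

The open question remains the one raised earlier in the thread: where in the opcode the `acc_sel` bits would come from.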

Posted: Mon Apr 02, 2012 5:11 pm
by BigEd
I suppose my main point is to try to establish that there are two ideas kicking around here - one means changing quite a few lines of source and finding some new encodings, and the other doesn't.

I agree that more flexibility with the index registers is desirable - in fact I think EEye has added that to his headline goals. (But I think it's quite an extra step... maybe a .c core? It would be good to see a settled final version of this core and this thread!)

Ed

Posted: Mon Apr 02, 2012 6:22 pm
by ElEctric_EyE
Oh, I see where you're coming from now, BigEd. There is one small problem with doing that though: the upper bits of the opcode, used for the src_reg or dst_reg, are full. For the <shift,rotate> opcodes these would be bits IR[11:8], for Acc's A through D. For the other functions like ADC, SBC, etc., these would be bits IR[15:8], for Acc's A through Q.

EDIT: IR[15:8] not IR[15:18]. typo

Posted: Mon Apr 02, 2012 6:39 pm
by BigEd
In my simple plan, you don't need to change any bits, or need any extra bits. Three of the 16 accumulators become the X, Y and SP. Instead of 16+3, as you have now, you just have 16. It's actually simpler (except for the assembler).

Acc15, for example, is the SP. You might choose also to call it Q.
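
To make the simple plan concrete, here is a sketch of a 16-entry register file in which X, Y and SP are ordinary entries. It is not taken from the core: the module, its ports, the 16-bit width and the slots chosen for X and Y are assumptions; only "Acc15 is the SP" comes from the line above.

```verilog
// Hypothetical 16-entry register file with X, Y and SP as ordinary entries.
// Slot numbers for X and Y and the 16-bit width are placeholders.
module regfile16 (
    input  wire        clk,
    input  wire        we,       // write enable
    input  wire [3:0]  src_reg,  // source select, from the opcode
    input  wire [3:0]  dst_reg,  // destination select, from the opcode
    input  wire [15:0] wdata,    // ALU result
    output wire [15:0] rdata     // operand fed to the ALU
);
    // Names for the three special slots (for readability only here).
    localparam [3:0] SEL_X  = 4'd13,
                     SEL_Y  = 4'd14,
                     SEL_SP = 4'd15;

    reg [15:0] regs [0:15];

    // Any entry, including SP, can be an ALU source ...
    assign rdata = regs[src_reg];

    // ... and any entry, including X or Y, can be the destination.
    always @(posedge clk)
        if (we)
            regs[dst_reg] <= wdata;
endmodule
```

With `src_reg = SEL_SP` and `dst_reg = SEL_X`, an existing two-operand add of #8 would read SP and write X in a single operation, with no new opcode bits.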

Posted: Mon Apr 02, 2012 7:11 pm
by ElEctric_EyE
I thought you might suggest getting rid of 3 Accumulators!
I will keep the possibility in the back of my mind. Maybe when the final is done, which I would like to be after I add in the INcrement/DEcrement opcodes for the Acc's, someone could do this? I am finding myself busy, busy, busy! Work is picking up as well, so there is less free time.

Posted: Mon Apr 02, 2012 7:22 pm
by BigEd
Understood. In principle, it only changes about 4 lines of code.
Cheers
Ed

Posted: Tue Apr 03, 2012 2:45 am
by ElEctric_EyE
BigEd wrote:
... It would be good to see a settled final version of this core and this thread!)

Ed
Why is this your opinion?