
All times are UTC




Posted: Tue Dec 20, 2011 10:29 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10838
Location: England
Thanks - overall, you've got a 12, a 6 and some 8-bit shifts, which is worth knowing.

I did have a thought: you could do right shifts with MPY and XBA. For example, getting bits from position $0F00 into position $003C would be
Code:
XBA
LDA #$0400
MPY
XBA
Admittedly, not as quick or easy as
Code:
LSR #6
if we had that, but perhaps better than
Code:
LDX #6
TXD
LSR
LDX #1
TXD

(Left shifts are more obvious, since they are just a multiply.)
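
The arithmetic behind the MPY trick can be checked with a small model. This is only a sketch: it assumes a 16-bit accumulator and a 16x16 multiply with a 32-bit product, and it does not model what MPY and XBA actually do on the modified core. The function name is made up for illustration.

```python
# Behavioral model only; the opcode semantics of MPY and XBA are not modeled.
MASK16 = 0xFFFF

def shift_right_via_multiply(value, distance):
    """Right-shift a 16-bit value by taking the high word of a 16x16 product."""
    if not 0 < distance <= 16:
        raise ValueError("distance must be in 1..16")
    multiplier = 1 << (16 - distance)        # $0400 for a 6-bit right shift
    product = (value & MASK16) * multiplier  # up to 32 bits wide
    return (product >> 16) & MASK16          # keep only the high word

print(hex(shift_right_via_multiply(0x0F00, 6)))  # prints 0x3c
```

Multiplying by 2^(16-n) and keeping the high word is the same as shifting right by n, which is why the constant for a 6-bit shift is $0400.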

For my modifications, it makes the shifter somewhat less attractive, because I don't offer read-modify-write addressing modes. For your approach with the shift distance in the opcode, the shifter is much more valuable.

For your
Code:
  LSR #12
I think you can get there faster using
Code:
  ROL
  ROL
  ROL
  ROL
  ROL
(if you're limited to the present instruction set.)
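
A quick model of why five ROLs can stand in for LSR #12, assuming a 16-bit accumulator and a 6502-style ROL that rotates through the carry (a 17-bit rotate). Note the assumption that a final mask with $000F is applied, because the low bits wrap around into the upper part of the word. The function names are illustrative only.

```python
MASK16 = 0xFFFF

def rol(a, carry):
    """One ROL on a 16-bit accumulator: rotate left through the carry bit."""
    wide = ((a & MASK16) << 1) | (carry & 1)
    return wide & MASK16, (wide >> 16) & 1

def lsr12_via_rol(a, carry=0):
    """Five ROLs bring bits 15..12 down to bits 3..0; the mask drops the rest."""
    for _ in range(5):
        a, carry = rol(a, carry)
    return a & 0x000F

print(hex(lsr12_via_rol(0xA123)))  # prints 0xa
```

The incoming carry lands in bit 4 after five rotates, so with the mask the result does not depend on the carry state beforehand.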

In any case, it's probably worth writing a macro for multi-bit shifts, so you can use multiple instructions for as long as you have to, and then switch to new opcodes when that becomes possible. And your code becomes more compact and readable.

Cheers
Ed


Posted: Tue Dec 20, 2011 12:48 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
BigEd wrote:
...In any case, it's probably worth writing a macro for multi-bit shifts, so you can use multiple instructions for as long as you have to, and then switch to new opcodes when that becomes possible. And your code becomes more compact and readable.

Cheers
Ed

Good idea.

I had an idea a couple days ago. Tell me if it's worth anything...
It would be a cycle counter with programmable start and stop addresses (depending on the length of the code, a 16-bit counter should be sufficient). It would be especially useful when comparing the effect of modifying opcodes.
For me, it's a little fuzzy how an internal shift by X places can be just as fast as a shift by 1 place. This counter could quantify the effects, I believe.


Posted: Tue Dec 20, 2011 12:56 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10838
Location: England
To answer your second question, the shifter is a single-cycle barrel shifter. It's huge, but fast. I haven't measured the size, but I'll do so. (Done - see below)
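
One common way to build such a shifter is as a chain of multiplexer stages, one per bit of the shift distance. The sketch below models that structure in software. It is an assumption about the general technique, not a description of how this particular core implements it, and the function name is made up.

```python
def barrel_shift_right(value, distance, width=16):
    """Logical right shift built from log2(width) conditional stages.

    Stage k shifts by 2**k when bit k of the distance is set, so every
    distance from 0 to width-1 passes through the same fixed set of stages.
    """
    value &= (1 << width) - 1
    stage = 0
    while (1 << stage) < width:
        if (distance >> stage) & 1:
            value >>= (1 << stage)   # in hardware: a row of 2-to-1 muxes
        stage += 1
    return value

print(hex(barrel_shift_right(0x0F00, 6)))  # prints 0x3c
```

In hardware all the stages are combinational and settle within one clock, so a shift by 12 costs the same single cycle as a shift by 1. The price is the area of the mux rows.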

For your first question, yes, a performance counter could be very handy - modern CPUs have them. For something as simple as counting cycles, as we're on FPGA, the simplest thing to do is just add a memory-mapped peripheral which is a counter you can start and stop. Once you add performance counters to the CPU, which is easy, you also need to add ways to set and get them, which is going to be a bit less easy. (Things like counting branches, or taken branches, or JSRs, could be interesting.)

Cheers
Ed

Edit: here's the size:
Quote:
slice counts for Arlet's core (spartan3, 'balanced' synthesis)
8 bit cpu: 247, plus 118 for long distance shifting
16 bit cpu: 360, plus 140
32 bit cpu: 488, plus 268
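
To put those slice counts in proportion, here is a small calculation of the shifter's relative cost, using only the figures quoted above. The variable and function names are illustrative.

```python
# (base core slices, extra slices for long-distance shifting), keyed by data width
slice_counts = {8: (247, 118), 16: (360, 140), 32: (488, 268)}

def shifter_overhead_percent(base, extra):
    """Extra slices for the shifter as a percentage of the base core."""
    return 100.0 * extra / base

for width, (base, extra) in slice_counts.items():
    pct = shifter_overhead_percent(base, extra)
    print(f"{width}-bit cpu: {base} + {extra} slices ({pct:.0f}% more)")
```

That works out to roughly 48%, 39% and 55% on top of the 8-, 16- and 32-bit cores respectively.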


Posted: Wed Dec 21, 2011 8:39 am
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Those barrel shifters take some resources! Reminds me of when I was trying to use 16-bit comparators in a CPLD, they also are resource hungry...

On a related note, I'm going to need two 32-bit comparators for the cycle counter: one to toggle the counter on and one to toggle it off.
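
The intended behaviour can be sketched as a software model: one comparison starts the counter, the other stops it, and the counter advances once per clock while running. This is a sketch under assumptions. The PC trace is invented, it is not cycle-accurate for any 6502 core, and the class and function names are hypothetical.

```python
class CycleCounter:
    """Counts clock cycles between a start-address match and a stop-address match."""

    def __init__(self, start_addr, stop_addr, width=16):
        self.start_addr = start_addr
        self.stop_addr = stop_addr
        self.limit = (1 << width) - 1
        self.running = False
        self.count = 0

    def clock(self, pc):
        """Advance one clock cycle, given the PC value seen during that cycle."""
        if pc == self.start_addr:      # first comparator: switch counting on
            self.running = True
        elif pc == self.stop_addr:     # second comparator: switch counting off
            self.running = False
        if self.running:
            self.count = (self.count + 1) & self.limit

def count_cycles(trace, start_addr, stop_addr):
    counter = CycleCounter(start_addr, stop_addr)
    for pc in trace:
        counter.clock(pc)
    return counter.count

# Made-up trace: execution leaves the start/stop range (as a JSR would) and returns.
trace = [0x1000, 0x1001, 0x2000, 0x2001, 0x2002, 0x1003, 0x1004]
print(count_cycles(trace, 0x1001, 0x1004))  # prints 5
```

Because the run flag only changes on an exact address match, cycles spent outside the start/stop range (inside a subroutine, for example) are still counted in this model.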

BigEd wrote:
For something as simple as counting cycles, as we're on FPGA, the simplest thing to do is just add a memory-mapped peripheral which is a counter you can start and stop...

I'm up a bit early, can't sleep...
So as far as bringing the PC out of the CPU goes, is it as easy as this?
Code:
module cpu( clk, reset, AB, PC, DI, DO, WE, IRQ, NMI, RDY );

parameter dw = 16;          // data width (8 for 6502, 16 for 65Org16)
parameter aw = 32;          // address width (16 for 6502, 32 for 65Org16)

input clk;                  // CPU clock
input reset;                // reset signal
output reg [aw-1:0] AB;     // address bus
input [dw-1:0] DI;          // data in, read bus
output [dw-1:0] DO;         // data out, write bus
output WE;                  // write enable
input IRQ;                  // interrupt request
input NMI;                  // non-maskable interrupt request
input RDY;                  // Ready signal. Pauses CPU when RDY=0
output reg [aw-1:0] PC;     // Program Counter


Posted: Wed Dec 21, 2011 3:54 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Yes, it is...
Having just completed a routine that displays a 16-bit hex number, I thought that it wouldn't be too difficult to test the cycle counter idea out without delaying my progress too much.
A simple version of it appears to work. At the very beginning of my program I set an arbitrary beginning address and an ending address at some point after it, and it reads and displays different values from a 16-bit counter, depending on the ending address. Checking whether it's accurate is the next step in making it a useful tool. I'll have to see how to get the MSB and LSB values from labels in As65; that way I can set the beginning and ending addresses precisely and see what the value should truly read.


Posted: Wed Dec 21, 2011 9:10 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
BigEd wrote:
... (Things like counting branches, or taken branches, or JSRs, could be interesting.)...

Why do you say this?

I can set the beginning and ending addresses, without a branch in between, and I get the correct # of cycles, but if there's a branch in between it doesn't count the correct value. For instance, when I set the beginning address at a JSR PLTCHR and the end address right after, it is not counting as expected. I would have expected hundreds of cycles, but it is consistently returning <50.


Posted: Wed Dec 21, 2011 9:25 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10838
Location: England
ElEctric_EyE wrote:
BigEd wrote:
... (Things like counting branches, or taken branches, or JSRs, could be interesting.)...

Why do you say this?

Good question. These seemed to me to be the things I might want to count! Now I'm on the spot, and I can't really say why.

Quote:
I can set the beginning and ending addresses, without a branch in between, and I get the correct # of cycles, but if there's a branch in between it doesn't count the correct value. For instance, when I set the beginning address at a JSR PLTCHR and the end address right after, it is not counting as expected. I would have expected hundreds of cycles, but it is consistently returning <50.
I can't think of any particular reason for that. If you add an extra NOP before the RTS, does it add the expected 2 cycles? What if you set the start a couple of addresses before the JSR and the end a couple after? Maybe your comparisons are not quite right.

Ed


Posted: Wed Dec 21, 2011 10:03 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
BigEd wrote:
...Now I'm on the spot, and I can't really say why...Ed

I didn't mean to put you on the spot, just thought I may have missed a major detail... I'll continue checking my design.

Just an aside: In As65 I was able to manually set the MSB start and finish registers to #$FFFF. For the LSB registers I just used CYC1 and CYC2 labels for the begin and end PC in the assembly, and LDA #CYC1 & LDA #CYC2 to store the 16bit LSB address values.
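
For reference, splitting a 32-bit address into the two 16-bit register values looks like this. It is a sketch: the example address is made up, the function names are hypothetical, and it only assumes that the comparator sees the full 32-bit PC as MSB word followed by LSB word.

```python
def split_address(addr):
    """Split a 32-bit address into (MSB word, LSB word), 16 bits each."""
    return (addr >> 16) & 0xFFFF, addr & 0xFFFF

def address_matches(pc, msb, lsb):
    """What the comparator effectively tests: PC against the rejoined words."""
    return pc == ((msb << 16) | lsb)

msb, lsb = split_address(0xFFFFC120)   # hypothetical label address
print(hex(msb), hex(lsb))              # prints 0xffff 0xc120
```

With all the code living in the top 64K words, the MSB word is a constant $FFFF and only the LSB word has to come from the label.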

I'll do some more testing...

EDIT (12/22/11): 16bit LSB address values not 32bit


Last edited by ElEctric_EyE on Fri Dec 23, 2011 12:30 am, edited 1 time in total.

Posted: Wed Dec 21, 2011 10:32 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10838
Location: England
I think the normal (modern) story is to measure the indeterminate things like cache miss rates, branch mispredicts, memory stalls. For a simple core, I'm not sure what's worth measuring. The amount of time spent in interrupt handlers might be interesting but isn't that easy to determine as we don't have a supervisor state, and RTI can be used for non-interrupt purposes. Cache miss and memory stalls (and RDY stalls) would be interesting if we had that sort of thing going on. Cycles lost to page crossing might be interesting, but I imagine they are usually insignificant. Monitoring the minimum stack pointer could be useful.

Before I forget, I was going to say: there are alternatives to bringing out the PC and hooking up a timer with comparators:
- instrumenting a testbench and measuring in simulation
- hooking up a VIA model and using that to time the interval

They have different pros and cons to your approach.

Cheers
Ed


Posted: Thu Dec 22, 2011 9:00 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Well, I'm not going to waste too much more time on it. I just thought it would be neat to see how much time a routine takes, especially a time sensitive one like a graphics plotting routine, without manually adding each opcode time.

Maybe I will spend one more day on it. I think my problem lies in the counter section. I tried simulating it, and the counter output is undefined even though the signals to the counter are as they should be in order for it to count. Right now, I have the outputs of the comparators wire OR'd and that output going to the clock input of a toggle flip flop with the D input tied high. I do get a warning about using combinatorial logic driving a clock... I think I'll try a different style FF, maybe a few in a row to avoid metastability issues, then move on to HESMON if it doesn't work reliably...


Posted: Thu Dec 22, 2011 9:04 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10838
Location: England
Yes, undefined values in simulations are a pain. They caused me trouble with my multiplier modification.


Posted: Fri Dec 23, 2011 7:50 pm
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Now I've got the sim to work in ISim.
In my situation, the counters had to be reset for a successful simulation. In the real world they are auto-reset to their INIT values on power-up. ISim does no such auto-reset to INIT values, so I hooked them up to the main Reset.
I plan to add a latch to the counter output and an auto-reset circuit, so that as soon as the comparator reads AddressEndLSB and toggles the counter off, the very next cycle will save the value of the counter to a 16-bit FF, and the cycle after that will reset the counter.
So theoretically, 3-4 cycles after the last Verilog == comparison, the opcode cycle counter will be ready for its next count. It will take at the very least 3 cycles (I'm thinking INC StartLSB?) to store new MSB/LSB start or end values in one of the 2 comparators for the next comparison, which will give a slight margin for successful measuring. I will be sure to test this as well!
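
The stop, latch, reset sequence described above can be sketched as a small state machine. This is a model under assumptions, not the actual circuit: the PC trace is invented, one `clock()` call stands for one clock cycle, and all names are hypothetical.

```python
class LatchingCycleCounter:
    """Cycle counter that saves its value one cycle after stopping, then clears."""

    def __init__(self, start_addr, stop_addr):
        self.start_addr = start_addr
        self.stop_addr = stop_addr
        self.running = False
        self.count = 0
        self.latched = 0
        self.phase = "idle"            # idle -> latch -> reset -> idle

    def clock(self, pc):
        if self.phase == "latch":      # cycle after the stop match: save the count
            self.latched = self.count
            self.phase = "reset"
        elif self.phase == "reset":    # next cycle: clear the counter for reuse
            self.count = 0
            self.phase = "idle"
        if pc == self.start_addr and self.phase == "idle":
            self.running = True
        elif pc == self.stop_addr and self.running:
            self.running = False
            self.phase = "latch"
        if self.running:
            self.count = (self.count + 1) & 0xFFFF

def run_trace(trace, start_addr, stop_addr):
    counter = LatchingCycleCounter(start_addr, stop_addr)
    for pc in trace:
        counter.clock(pc)
    return counter

c = run_trace([0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17], 0x10, 0x14)
print(c.latched, c.count, c.phase)     # prints 4 0 idle
```

In this model the saved value is available one cycle after the stop match and the counter is clear one cycle later, so any code that reprograms the comparators in three or more cycles finds it ready.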

I had dreams today of a dual FF 6502 Core, just dreams though....

Once again I think back to a 32bit 6502 machine, and how this LSB/MSB issue would be potentially nonexistent with this core!


Posted: Tue Dec 27, 2011 12:03 pm
Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1018
Location: near Heidelberg, Germany
BigEd wrote:
For your
Code:
  LSR #12
I think you can get there faster using
Code:
  ROL
  ROL
  ROL
  ROL
  ROL
(if you're limited to the present instruction set.)


One question about multi-bit shifts: which would you prefer if you had to choose, ASL or ROL, and likewise LSR or ROR?

In my 65k design I have both versions as multi-bit shifts, but I also have plans to do a "SLY" and "SLX", to multiply Y or X by a power of two, to make it easier to compute indexes for addresses (which can be 16, 32, or 64 bit in 65k...). Those would be ASL-types though.
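
As an illustration of what such an index shift buys, here is the address arithmetic for a table of fixed-size entries. It is only a sketch: it assumes byte addressing with entries of 2, 4 or 8 bytes, and the base address and function name are made up.

```python
def entry_address(base, index, shift):
    """Address of table entry `index` when each entry is 2**shift bytes wide."""
    return base + (index << shift)     # the left shift is the SLX/SLY-style step

BASE = 0x2000                          # hypothetical table base address
for shift, bits in ((1, 16), (2, 32), (3, 64)):
    print(f"{bits}-bit entries: entry 5 at {hex(entry_address(BASE, 5, shift))}")
```

Only shift distances of 1 to 3 are needed for these entry sizes, which fits the idea of a limited-distance left shift on the index registers.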

André


Posted: Thu Dec 29, 2011 6:58 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10838
Location: England
For myself, if I could only have either a rotate or a shift then I'd pick a rotate, because I can always perform a mask afterwards. The rotate gives me a free low/high swap. I can do a sign extend in a couple of operations too.

Also, for multi-bit rotates, I think I might exclude the carry. For a short word length and a single-bit rotate it's very useful to include the carry because it allows the construction of multi-word rotate. But for multi-bit rotates I think it gets in the way.
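
A sketch of those points, assuming a 16-bit word and a rotate that excludes the carry. The function names are illustrative and not opcodes from any of the cores discussed.

```python
MASK16 = 0xFFFF

def ror16(value, distance):
    """Rotate a 16-bit word right by `distance`, with no carry involved."""
    distance %= 16
    value &= MASK16
    return ((value >> distance) | (value << (16 - distance))) & MASK16

def lsr16_via_rotate(value, distance):
    """Logical shift right (distance 0..15): rotate, then mask the wrapped bits."""
    return ror16(value, distance) & (MASK16 >> distance)

def swap_halves(value):
    """Rotating by half the word width swaps the low and high bytes."""
    return ror16(value, 8)

print(hex(lsr16_via_rotate(0x0F00, 6)))  # prints 0x3c
print(hex(swap_halves(0x12AB)))          # prints 0xab12
```

With the carry excluded, a rotate by n followed by one AND gives the shift, whereas a rotate through carry would leave a stray carry bit in the middle of the result.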

Shifting the index registers could be useful! As you suggest, limited-distance left shifts are probably sufficient.

Cheers
Ed


Posted: Thu Dec 29, 2011 11:36 am
Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1018
Location: near Heidelberg, Germany
Thanks Ed!

The carry inclusion/exclusion is interesting, I hadn't thought about that!

Looking further into that, it seems that Intel has both: ROR and ROL rotate without the carry, RCR and RCL rotate through the carry.
Similarly, the m68k has ROR/ROL, plus ROXR/ROXL to rotate through the extend bit. http://en.wikipedia.org/wiki/Motorola_68000
Hm, I don't see an easy way to actually emulate one with another...

Also what about ASR - arithmetic shift right? This would not shift in zero from the left, but the sign (logical shift left LSL = ASL, both shift in zeros from right).
http://en.wikipedia.org/wiki/Bitwise_operation
This could probably be more easily emulated with LSR and setting the sign with an OR.
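
A sketch of that emulation on a 16-bit word: do the logical shift, then OR the sign back into the vacated bits when the original value was negative. The function name is illustrative.

```python
MASK16 = 0xFFFF

def asr16_via_lsr(value, distance):
    """Arithmetic shift right (distance 0..15) built from LSR plus an OR."""
    value &= MASK16
    shifted = value >> distance                      # LSR: zeros enter from the left
    if value & 0x8000:                               # sign bit was set
        fill = (MASK16 << (16 - distance)) & MASK16  # ones in the vacated top bits
        shifted |= fill
    return shifted

print(hex(asr16_via_lsr(0x8000, 4)))  # prints 0xf800
print(hex(asr16_via_lsr(0x4000, 4)))  # prints 0x400
```

The cost over a native ASR is a sign test and one OR with a mask that depends on the shift distance.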

Seems I still need to work on those in the 65k design specs...

André

