Posted: Mon Mar 25, 2024 2:32 pm

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1043
Location: near Heidelberg, Germany
Hi there,

I am trying to speed up my Micro-PET design from 50MHz base / 12.5MHz CPU clock to 70 MHz base / 17.5MHz CPU clock.
I have a stable read timing now, but it seems the system does not read and process the bank byte in time. With the bank clamped to zero, I see correct memory accesses (at least at the initial vector pull and subsequent memory fetches), but not with the way I currently use the bank byte.

Currently I latch the bank byte and then derive select signals from it. Latching is not really good FPGA design; registers are preferred. But registering the bank byte directly at the rising edge of phi2 leads to either
a) all select lines being delayed until after phi2 rises, or
b) when selecting the data input vs. the latch output depending on phi2 itself, the changing data lines already affecting the select lines, causing glitches.

Note that in this example phi2 rises with the falling edge of qclk (which is 50/70MHz), so registering the value at the rising edge, half a qclk before phi2 rises, avoids the glitches.
Code:
        -----------------------------------------------------------------------
        -- CPU address space analysis
        --

        -- note: simply latching D at rising phi2 does not work,
        -- as in the logical part after the latch, the changing D already
        -- bleeds through, before the result is switched back when bankl is in effect.
        -- Therefore we sample D at half-qclk before the transition of phi2.
        -- This may lead to speed limits in faster designs, but works here.
        BankLatch: process(reset, qclk)
        begin
                if (reset ='1') then
                        bankl <= (others => '0');
                elsif rising_edge(qclk) then
                        -- phi2 acts as a clock enable: sample only while phi2 is low
                        if (phi2 = '0') then
                                if (forceb0 = '1') then
                                        bankl <= (others => '0');
                                else
                                        bankl <= D;
                                end if;
                        end if;
                end if;
        end process;

        bank <= bankl;


Now, this does not seem to be fast enough anymore. Any idea / best practice for this?

_________________
Author of the GeckOS multitasking operating system, the usb65 stack, designer of the Micro-PET and many more 6502 content: http://6502.org/users/andre/


Posted: Mon Mar 25, 2024 11:01 pm

Joined: Tue Feb 28, 2023 11:39 pm
Posts: 257
Location: Texas
Correct me if I'm wrong (as I'm still learning the ins and outs of FPGAs myself), but it seems very odd that an FPGA running at 50MHz or more would be "too slow" for this task.

If the clock rates are 50/12.5MHz and 70/17.5MHz, I presume PHI2 is derived from your main clock divided by 4; so it would appear that you are sampling D several times before PHI2 goes high again.

At some point in that low phase of PHI2 the 65816 has to start switching those lines over from the multiplexed address bus to the values desired on the data bus; are you sure that 2nd or 3rd main clock cycle isn't picking up on the data pins settling to the new state?


Posted: Mon Mar 25, 2024 11:37 pm

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1043
Location: near Heidelberg, Germany
I am not sure what the real reason is.

The memory decoding is quite elaborate. I see flickering on the RAM select line even when trying to read the reset vector. That indicates that the select is not quick enough.

Here is the mapper https://github.com/fachat/csa_ultracpu/ ... perPET.vhd
(Note this is not the main branch)

The relevant select line is vramsel. Outside the mapper (in Top.vhd) the select line is registered when phi2 goes high, so it is not registered in the mapper code.

I am still analyzing. The bank latch is one of the potential issues I identified, but it may not necessarily be the right or the only one...

André

Posted: Tue Mar 26, 2024 12:30 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8507
Location: Midwestern USA
Yuri wrote:
At some point in that low phase of PHI2 the 65816 has to start switching those lines over from the multiplexed address bus to the values desired on the data bus; are you sure that 2nd or 3rd main clock cycle isn't picking up on the data pins settling to the new state?

Actually, the data bus “turnaround” doesn’t occur until after the rise of Ø2.  The bank bits persist for a few nanoseconds during Ø2 high.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Tue Mar 26, 2024 6:56 am

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1043
Location: near Heidelberg, Germany
The original version was something like:

A) there is a register for the bank, taking in the D lines on rising phi2.
B) depending on phi2, the actual value taken for address decoding would be either D itself when phi2 was low, or the latched value when phi2 was high.

When I was using this I saw glitches in the decoding because switching (B) was faster than the register (A), and the CPU was even faster. So changing D was visible on the bank before the register value came through.
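
For illustration, the structure described in (A) and (B) might look roughly like the following. This is a reconstruction from the description only, not the original CPLD code; the entity name `bank_mux_naive` and the signal `bank_r` are made up for this sketch, while `phi2`, `D` and `bank` follow the names used in this thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity, reconstructed from the description above.
entity bank_mux_naive is
    port (
        phi2 : in  std_logic;                     -- CPU clock
        D    : in  std_logic_vector(7 downto 0);  -- multiplexed data/bank bus
        bank : out std_logic_vector(7 downto 0)   -- bank byte used for decoding
    );
end entity bank_mux_naive;

architecture rtl of bank_mux_naive is
    signal bank_r : std_logic_vector(7 downto 0) := (others => '0');
begin
    -- (A) register the bank byte on the rising edge of phi2
    RegBank: process(phi2)
    begin
        if rising_edge(phi2) then
            bank_r <= D;
        end if;
    end process;

    -- (B) pass D through while phi2 is low, use the register while it is high.
    -- The mux select and the register output are both driven by phi2 but
    -- travel through different paths, so their relative delay around the
    -- rising edge of phi2 is not controlled.
    bank <= D when phi2 = '0' else bank_r;
end architecture rtl;
```

Because phi2 is both the register clock and the mux select here, nothing in this structure orders the two events around the rising edge, which is consistent with the glitches described above.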

At least that was my interpretation. That was with the XC95288XL CPLD, and I kept it. Now I'm at Spartan 6, so maybe I should revisit...

Posted: Tue Mar 26, 2024 9:25 am

Joined: Tue Feb 28, 2023 11:39 pm
Posts: 257
Location: Texas
BigDumbDinosaur wrote:
Yuri wrote:
At some point in that low phase of PHI2 the 65816 has to start switching those lines over from the multiplexed address bus to the values desired on the data bus; are you sure that 2nd or 3rd main clock cycle isn't picking up on the data pins settling to the new state?

Actually, the data bus “turnaround” doesn’t occur until after the rise of Ø2.  The bank bits persist for a few nanoseconds during Ø2 high.


Yea, saw that when I looked at the datasheet. Still though, I can't help but wonder if the extra qclk pulses during PHI2 being low could have an effect.

I think it would be interesting to see what the behavior might be if you sampled on a clock that was 45 degrees out of phase of PHI2 but had the same period.

Edit:

Did some more digging. If I'm reading your schematics right, you're running the 65816 at 3.3V, and the data sheet says that the bank address won't be ready until 40ns after PHI2 goes low. If you are sampling on the rising edge of qclk at 50MHz, that'd be 20ns. It also means that you're barely giving the 65816 time to set up those bank addresses before PHI2 goes high, at which point it will only hold the value for another 5ns.

[Attachment: timings.jpg]


That's my reading of it at any rate. I'm sure someone who actually knows what they're talking about will point out how I'm wrong. :mrgreen:


Posted: Tue Mar 26, 2024 11:23 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
How about you clock both the data bus and phi2 into flops? Then use this sampled phi2 to drive the mux.
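
A minimal sketch of that idea, under some assumptions: `qclk`, `phi2`, `D` and `bank` are the names used earlier in this thread, while the entity name `bank_sample` and the internal signals `d_q`, `phi2_q` and `bank_q` are invented for illustration. The reset and `forceb0` handling from the original process are left out to keep it short.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; only the port names come from the thread.
entity bank_sample is
    port (
        qclk : in  std_logic;                     -- fast FPGA clock (50/70 MHz)
        phi2 : in  std_logic;                     -- CPU clock derived from qclk
        D    : in  std_logic_vector(7 downto 0);  -- multiplexed data/bank bus
        bank : out std_logic_vector(7 downto 0)   -- bank byte used for decoding
    );
end entity bank_sample;

architecture rtl of bank_sample is
    signal d_q    : std_logic_vector(7 downto 0) := (others => '0');
    signal phi2_q : std_logic := '0';
    signal bank_q : std_logic_vector(7 downto 0) := (others => '0');
begin
    Sample: process(qclk)
    begin
        if rising_edge(qclk) then
            -- sample the bus and phi2 with the same clock edge
            d_q    <= D;
            phi2_q <= phi2;
            -- keep a copy of the last bus sample taken while phi2 was low
            if phi2_q = '0' then
                bank_q <= d_q;
            end if;
        end if;
    end process;

    -- The mux select is the sampled phi2, so select and data change on the
    -- same qclk edge. When phi2_q goes high, bank_q already holds the value
    -- that d_q was presenting, so the output does not change at the switch.
    bank <= d_q when phi2_q = '0' else bank_q;
end architecture rtl;
```

The cost is one qclk of latency on the bank byte, which may or may not be affordable at 17.5MHz; that would need checking against the actual decode timing.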


Posted: Tue Mar 26, 2024 11:13 pm

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1043
Location: near Heidelberg, Germany
I might just as well ignore the FPGA compiler warning about latches and do this:

Code:
        BankLatch: process(reset, phi2, forceb0, D)
        begin
                if (reset ='1') then
                        bankl <= (others => '0');
                elsif (phi2 = '0') then
                        if (forceb0 = '1') then
                                bankl <= (others => '0');
                        else
                                bankl <= D;
                        end if;
                end if;
        end process;

        bank <= bankl;


So far only suggestions about my code.... any experiences you had yourself?

Posted: Wed Mar 27, 2024 1:21 pm

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1043
Location: near Heidelberg, Germany
As an update: I went with the actual latch. There might be opportunities for optimization later, but so far it works.

After careful examination of the signals I noticed that it can't really be the bank latch anyway: every phi1 phase is the same short interval, no matter whether the effective CPU clock is 1 or 17 MHz. So since it runs reliably at 1 MHz, address decoding for 17 MHz should work just as well, as both have a 28ns phi1 phase (phi2 low).

I found the problems more in the data timing, and after applying a timing constraint the system works fine at 17MHz, even when hit with cold spray or a hot-air (resoldering) fan.

Posted: Tue Apr 09, 2024 3:23 pm

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 460
Location: Canada
I would not use a latch. Latch timing is very difficult to guarantee with an FPGA. I would use a higher frequency clock (e.g. 140 MHz) and delay the phi2 clock by seven clock cycles, which would be the same as a phi2 one clock cycle in advance of the normal phi2. Then use the advanced phi2 edge to register the bank select.
However, you could also have the FPGA provide a phi2 clock to the system that is slightly different from the CPU’s phi2 clock so that address decoding can be made to work. For example, with eight clocks per phi2 the phi2 output by the FPGA could be low for five then high for three, delaying the CPU’s clock by an FPGA clock cycle. The bank could then be registered by the rising edge of the CPU’s phi2 while the peripherals would not see a system phi2 for another FPGA clock cycle.
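
The second variant could be sketched roughly as follows, assuming a fast clock at eight times the CPU clock. All names here (`phi2_skew`, `fclk`, `cpu_phi2`, `sys_phi2`, `cnt`) are hypothetical and only serve the illustration; `D` and `bank` follow the names used earlier in the thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: generates two phi2 clocks from one 8x fast clock.
entity phi2_skew is
    port (
        fclk     : in  std_logic;                     -- assumed 8x CPU clock
        reset    : in  std_logic;
        D        : in  std_logic_vector(7 downto 0);  -- multiplexed data/bank bus
        cpu_phi2 : out std_logic;                     -- to the CPU: low 4, high 4
        sys_phi2 : out std_logic;                     -- to peripherals: low 5, high 3
        bank     : out std_logic_vector(7 downto 0)   -- registered bank byte
    );
end entity phi2_skew;

architecture rtl of phi2_skew is
    signal cnt : unsigned(2 downto 0) := (others => '0');
begin
    Gen: process(fclk, reset)
    begin
        if reset = '1' then
            cnt      <= (others => '0');
            cpu_phi2 <= '0';
            sys_phi2 <= '0';
            bank     <= (others => '0');
        elsif rising_edge(fclk) then
            cnt <= cnt + 1;  -- 3-bit counter, wraps from 7 to 0

            -- CPU clock: high while the previous count was 4..7
            cpu_phi2 <= cnt(2);
            -- system clock: high while the previous count was 5..7,
            -- so it rises one fclk later and falls together with cpu_phi2
            sys_phi2 <= cnt(2) and (cnt(1) or cnt(0));

            -- capture the bank byte on the fclk edge where cpu_phi2 rises
            if cnt = 4 then
                bank <= D;
            end if;
        end if;
    end process;
end architecture rtl;
```

With this arrangement the registered bank byte has one full fast-clock cycle to propagate through the address decoding before the peripherals see phi2 go high. Whether `D` meets setup at the capturing edge depends on pin and board delays, so that would need a timing constraint and a check against the 65816 datasheet.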

_________________
http://www.finitron.ca

