Arlet's 6502 Core Timing
Posted: Fri Sep 11, 2015 7:14 pm
I've been playing with Arlet's cute 6502 Verilog core recently and trying to embed it in an FPGA for a DAQ application that analyzes bus signals (the 6502 tells an FPGA-based logic analyzer when to start collecting data, and sends data to and receives data from a DUT).
To make my life easier, I've tried connecting the 6502 core to a shared Wishbone bus with the following memory map:
0x0000-0x1FFF Uninitialized RAM
0x2000-0x3DFF Initialized RAM/ROM program (loaded either by the IPL or by the FPGA as part of bitstream init)
0x3E00-0x3EFF Wishbone I/O
0x3FF0-0x3FF9 IPL
0x3FFA-0x3FFF Vectors (taking advantage of incomplete address decoding here)
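To make the map concrete, here is a hypothetical Python sketch of the decoder. The assumption (inferred from the vectors sitting at 0x3FFA-0x3FFF) is that only A13..A0 are decoded, so the 6502's hardware vectors at 0xFFFA-0xFFFF alias onto 0x3FFA-0x3FFF. The function name and region labels are made up for illustration.

```python
from typing import Optional

# Hypothetical sketch of the memory map above. Assumption: only A13..A0 are
# decoded (A15 and A14 are ignored), which is what lets the 6502's vectors
# at 0xFFFA-0xFFFF alias onto 0x3FFA-0x3FFF.

def decode(addr: int) -> Optional[str]:
    """Return the region selected by a 16-bit CPU address, or None."""
    a = addr & 0x3FFF              # incomplete decoding: drop A15 and A14
    if a <= 0x1FFF:
        return "ram"               # uninitialized RAM
    if a <= 0x3DFF:
        return "program"           # initialized RAM/ROM program
    if a <= 0x3EFF:
        return "io"                # Wishbone I/O
    if 0x3FF0 <= a <= 0x3FF9:
        return "ipl"
    if a >= 0x3FFA:
        return "vectors"
    return None                    # 0x3F00-0x3FEF is not listed in the map


for addr in (0x1234, 0x2000, 0x3E10, 0xFE10, 0x3FF4, 0xFFFC, 0x3F80):
    print(f"{addr:#06x} -> {decode(addr)}")
```

Under this assumption a high address byte of 0xFE lands in the same I/O page as 0x3E.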
To connect the core to a Wishbone bus, I've tied the 6502's RDY to Wishbone ACK, and Wishbone STB and CYC to a signal which simply ensures that the cycle after an ACK is a wait state for the 6502. Although the core loads onto my FPGA, any attempt to access I/O crashes it. Through simulation, I've found that the cause is a combinatorial path from Data In to the upper half of the Address Bus out, active while the 6502 fetches the high byte of an absolute-addressing-mode operand (for Arlet specifically: during state ABS1, RDY1 is high, so DIMUX takes the value of DI, and DI feeds back into AB[15:8]). So if, e.g., I access I/O using an absolute addressing mode, the upper 8 bits of the address bus immediately take the value of the data-in bus without waiting for a posedge.
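As a stand-in for the timing diagram, here is a hypothetical cycle-level Python model of the RDY/ACK and STB/CYC wiring just described. The slave's behaviour (ACK registered one cycle after STB, never two cycles in a row) is an assumption, as are the function and variable names; it only shows where the forced wait state lands and ignores the DI-to-AB path entirely.

```python
from typing import List, Tuple

# Hypothetical cycle-level model of the handshake described above:
#   RDY       = ACK                           (tied directly)
#   STB = CYC = not ACK in the previous cycle (forces a wait state after ACK)
# The slave is an assumed registered Wishbone slave: ACK <= STB & ~ACK.

def simulate(cycles: int) -> List[Tuple[int, int]]:
    """Return (STB, RDY) for each clock cycle."""
    ack = 0        # slave's registered ACK output
    ack_prev = 0   # master-side register holding last cycle's ACK
    trace = []
    for _ in range(cycles):
        stb = 0 if ack_prev else 1   # STB/CYC low in the cycle after an ACK
        rdy = ack                    # 6502 RDY tied to Wishbone ACK
        trace.append((stb, rdy))
        # posedge: both registers update from this cycle's values
        ack_prev, ack = ack, int(bool(stb) and not ack)
    return trace


for cycle, (stb, rdy) in enumerate(simulate(7)):
    print(f"cycle {cycle}: STB/CYC={stb} RDY={rdy}")
```

With this assumed slave, RDY is high in one cycle out of every three.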
As an example using my address decoding above, assume there are two separate synchronous RAMs: one covering the 0x2000-0x3DFF/0x3F00-0x3FFF regions and one covering 0x3E00-0x3EFF. Combinational address decoding ensures that only one RAM is selected at a time by checking whether the top 8 bits of the address bus equal 0xFE. Synchronous RAMs update their read data bus every cycle, regardless of whether they are selected; it's up to the CPU to latch the data or ignore it. All a RAM has to do is assert ACK when accessed to indicate that the data is now valid. See More Technical Information.
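A hypothetical Python model of that arrangement: each RAM reloads its read data at every posedge from the previous cycle's address, selected or not, and a combinational mux picks one of them from the current AB[15:8]. The class and function names, the 8 KiB/256-byte sizes, and the stored values are illustrative assumptions.

```python
from typing import Dict

class SyncRAM:
    """Hypothetical synchronous RAM with a registered read port.

    rdata is reloaded at every posedge from the address presented during the
    previous cycle, whether or not this RAM is selected."""

    def __init__(self, size: int, init: Dict[int, int]) -> None:
        self.size = size
        self.mem = [0] * size
        for offset, value in init.items():
            self.mem[offset] = value
        self.rdata = 0

    def posedge(self, addr: int) -> None:
        # Index with the low address bits only, as an FPGA block RAM would.
        self.rdata = self.mem[addr % self.size]


def read_mux(ab_hi: int, prog: SyncRAM, io: SyncRAM) -> int:
    """Combinational read mux, selecting on the *current* AB[15:8]."""
    return io.rdata if ab_hi == 0xFE else prog.rdata


prog = SyncRAM(0x2000, {0x0001: 0xFE})   # program RAM, holds 0xFE at offset 1
io = SyncRAM(0x100, {0x01: 0x55})        # I/O RAM, holds 0x55 at offset 1
for ram in (prog, io):
    ram.posedge(0x2001)                  # both RAMs see the same address
print(hex(read_mux(0x20, prog, io)))     # program RAM selected
print(hex(read_mux(0xFE, prog, io)))     # I/O RAM selected
```

Both RAMs hold valid-looking data after the edge; only the mux decides which one the CPU sees.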
Now suppose the conditions that combinationally link the address bus to the input data bus are met, i.e. an absolute addressing mode (read or write doesn't matter). The CPU reads the top 8 bits of the absolute address from the SRAM at 0x2000-0x3DFF or 0x3F00-0x3FFF. That input byte is fed straight to the address bus so that, at the next posedge, the CPU can either latch new input data from the absolute address (for a read) or write new output data to it (for a write).
Now suppose the absolute address to be accessed is NOT in the same address-decoding region as the address used to fetch its top 8 bits. As soon as the decoder matches the new address, the muxes switch from one device's read data bus and control signals to another's. This of course causes a different device to be accessed, which in turn drives its data onto the data-in bus without waiting, and things get ugly from there. Simulation in my case confirms a combinatorial loop, and there's no guarantee that the new data the CPU "sees" is valid at all!
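The loop can be sketched as a hypothetical delta-step evaluation of one ABS1 cycle in Python. Both RAMs have registered outputs, so their read data is fixed for the cycle; the only thing that changes is which one the mux forwards. The function name and the byte values are illustrative assumptions, and the 0xFE comparison follows the decoding described above.

```python
from typing import List, Tuple

# Hypothetical delta-step model of one ABS1 cycle. Both RAMs have registered
# outputs, so prog_rdata and io_rdata are fixed for the whole cycle; the loop
# is DI -> AB[15:8] -> decode -> read mux -> DI, with no register in between.

def settle(prog_rdata: int, io_rdata: int,
           max_steps: int = 8) -> Tuple[List[int], bool]:
    """Return the DI values seen and whether the loop reached a stable value."""
    di = prog_rdata                  # high operand byte arrives from program RAM
    seen = [di]
    for _ in range(max_steps):
        ab_hi = di                   # ABS1: AB[15:8] follows DI combinationally
        new_di = io_rdata if ab_hi == 0xFE else prog_rdata  # mux switches devices
        if new_di == di:
            return seen, True
        di = new_di
        seen.append(di)
    return seen, False


print(settle(0x21, 0x55))  # target in the same region: stable, DI stays 0x21
print(settle(0xFE, 0x55))  # target in the I/O page: DI flips 0xFE <-> 0x55
```

Real gates don't step like this; in hardware the outcome depends on propagation delays, which is why the value the CPU ends up with can't be trusted. If the I/O device happened to output 0xFE the loop would settle, but only by coincidence.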
I'm not sure how to fix this and am looking for advice and/or a timing diagram. I suppose the easiest fix would be to add a register to the address lines to delay changes by one clock, but I'm not sure whether this would cause improper operation elsewhere; I assume there is a reason the address takes the value of the data bus immediately instead of waiting for a clock edge. The same goes for adding a register to data in to guarantee a one-cycle delay.
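One variant of the register idea, sketched hypothetically in Python: instead of delaying the CPU's address itself, register only the decoder's mux select, loading it at each posedge from the address the CPU drove that cycle (the same address the synchronous RAMs register). This assumes, as the synchronous-RAM setup above implies, that DI is expected one cycle after the address. The class name is made up, and this has not been checked against the core's RDY handling.

```python
# Hypothetical variant of the register idea above, applied only to the
# decoder's mux-select path. The select register is loaded at each posedge
# from the address the CPU drove during that cycle, i.e. the same address the
# synchronous RAMs register, so it always names the device whose rdata is on DI.

class RegisteredSelect:
    def __init__(self) -> None:
        self.sel_io = False          # registered: which device DI comes from

    def di(self, prog_rdata: int, io_rdata: int) -> int:
        # Depends only on registered state, never on the current AB,
        # so AB[15:8] following DI cannot feed back into the mux.
        return io_rdata if self.sel_io else prog_rdata

    def posedge(self, ab: int) -> None:
        self.sel_io = (ab >> 8) == 0xFE


bus = RegisteredSelect()
# ABS1 cycle: DI is the high operand byte 0xFE from program RAM. AB[15:8]
# follows it to 0xFE, but the mux does not change until the clock edge.
print(hex(bus.di(0xFE, 0x55)))     # still program RAM
bus.posedge(0xFE10)                 # I/O RAM registers 0xFE10 at this edge too
print(hex(bus.di(0x00, 0x55)))     # next cycle: I/O data, one cycle later
```

The DI-to-AB path inside the core is left alone; the loop is broken because nothing in the read mux depends on the current cycle's address.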
From what I understand, Arlet's core effectively divides the input clock by two to simulate activity on both the positive and negative transitions of the clock. Have I discovered a bug, or am I just interfacing to the rest of the FPGA improperly? Is there a proper timing diagram reference for Arlet's core?
More Technical Information:
Unfortunately, in a shared Wishbone topology, the only thing that traditionally keeps a device from being accessed, and keeps the CPU from "seeing" the data that device intends to send, is a combinational mux. Although the data may be invalid, it's legal for Wishbone devices to put data on their read buses when they are not being accessed; only the mux in front of all the devices' read data buses keeps that data from the CPU. In some topologies the read data does in fact reach the CPU every cycle! The CPU is not supposed to react to or latch the data until it gets a signal from the device that the data is valid.
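That rule can be sketched as a hypothetical Python model of the read path: every device drives its read data all the time, the mux forwards only the selected device, and the CPU-side register takes the value only when that device asserts ACK. The function name, device labels and byte values are illustrative assumptions.

```python
from typing import Dict, Optional

# Hypothetical model of the shared-bus read path: every device drives its
# rdata all the time, the mux forwards only the selected device, and the
# CPU-side register takes the value only when that device asserts ACK.

def cpu_latch(held: int, rdata: Dict[str, int], ack: Dict[str, bool],
              sel: Optional[str]) -> int:
    """Return the CPU's latched data byte after one clock edge."""
    if sel is None or not ack[sel]:
        return held                  # not valid yet: ignore whatever is on the bus
    return rdata[sel]                # valid: latch the selected device's data


rdata = {"prog": 0x21, "io": 0x55}
print(hex(cpu_latch(0x00, rdata, {"prog": False, "io": False}, "io")))  # held
print(hex(cpu_latch(0x00, rdata, {"prog": False, "io": True}, "io")))   # latched
```

The problem described above is that, in state ABS1, DI reaches AB[15:8] through DIMUX without waiting for this kind of qualification.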
EDIT1: RDY=>RDY1