Completely insane, but... The C in 7F0C makes addresses that end with C (1CC, 1DC in the last example) defective! Changing 7F0C to 7F0E makes ...E addresses defective...
I think I see a reasonablish explanation! Memory writes use the datastack TOS and NOS. James Bowman's J1 processor, which my CPU is based on, has a stack I've sometimes doubted and other times admired -- it uses a mix of combinational and flopped logic, and NOS is muxed in. Somehow when a memory write is combined with a drop (the usual case), the address and data get physically mixed up, which is why I needed lots of F's in one word and lots of zeros in the other.
I'll just have to remake the stack with TOS and NOS as actual permanent registers.
Actually, modifying DSP in the previous cycle seems to affect the write in the next cycle. Keeping DSP steady for the cycle before the write fixes the issue for now.
I am still wondering about the exact details of this weirdness, and will look into it some more, just out of curiosity.
P.S. Yes, that was the problem.
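For reference, the "TOS and NOS as actual permanent registers" idea mentioned above could look roughly like the sketch below. This is not the J1's stack, nor the stack of the CPU discussed here: the module name RegStack, its ports, and the spill memory are all hypothetical and exist only for illustration. The point is that a memory write can take its address and data straight from two flops, so nothing the stack pointer did in the previous cycle can leak into them.

```verilog
// Hypothetical sketch: a data stack whose top two cells are real registers.
// All names here (RegStack, push, pop, spill, sp) are illustrative assumptions.
module RegStack #(parameter W = 16, parameter DEPTH_BITS = 4) (
    input              clk,
    input              push,   // din -> TOS, old TOS -> NOS, old NOS spills
    input              pop,    // NOS -> TOS, NOS is refilled from the spill RAM
    input  [W-1:0]     din,
    output reg [W-1:0] tos,    // a memory write would use this as the address
    output reg [W-1:0] nos     // ... and this as the data
);
    reg [W-1:0]          spill [0:(1<<DEPTH_BITS)-1];  // cells below NOS
    reg [DEPTH_BITS-1:0] sp = 0;                       // next free spill slot
    wire [DEPTH_BITS-1:0] sp_m1 = sp - 1'b1;           // topmost used slot

    initial begin
        tos = 0;
        nos = 0;
    end

    always @(posedge clk) begin
        if (push && !pop) begin
            spill[sp] <= nos;
            sp        <= sp + 1'b1;
            nos       <= tos;
            tos       <= din;
        end
        else if (pop && !push) begin
            tos <= nos;
            nos <= spill[sp_m1];   // asynchronous read, registered into NOS
            sp  <= sp_m1;
        end
        // push and pop together are ignored in this sketch
    end
endmodule
```

There is no overflow or underflow checking here; a real design would also need a path for ALU results to overwrite TOS without pushing.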
UART receiver
Re: UART receiver
Last edited by enso1 on Tue Feb 11, 2025 9:14 pm, edited 2 times in total.
Re: UART receiver
Speak harshly to your processor
And beat it when it sneezes
It only does it to annoy
Because it knows it teases
(Thanks to RL Dodgson)
Re: UART receiver
I started this topic about UART receivers, but here is my cleaned up version of the OPC-like transmitter.
Unlike the original, I am using a bit index instead of shifting, which is a little more compact. I am proud of eliminating the zero-detect mux and using the high bit to detect count going negative, saving a wide mux. A few other tricks were employed to make it a very compact design.
I haven't had a chance to whump on the receiver yet.
There are a couple of other tricks - using ALU cells to build counters, and at least for idx, using RESET to zero it, avoiding a mux -- the decrementor can feed right back into itself.
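The sign-bit trick can be shown in isolation. The sketch below is a hypothetical standalone module (SignTick, W, period and tick are names invented here, not taken from the design): the counter is one bit wider than the largest count, and its top bit serves directly as the terminal-count flag, so no wide compare against zero is needed.

```verilog
// Hypothetical sketch of a down-counter that reloads on its sign bit.
// The top bit of period must be 0, or the counter reloads on every clock.
module SignTick #(parameter W = 13) (
    input          clk,
    input  [W-1:0] period,   // reload value, top bit clear
    output         tick      // high for one clock when the count reaches -1
);
    reg [W-1:0] cnt = 0;

    assign tick = cnt[W-1];  // "negative" is just the top bit

    always @(posedge clk)
        cnt <= tick ? period : cnt - 1'b1;
endmodule
```

One consequence worth keeping in mind: the counter passes through period, period-1, ..., 0 and then -1 before reloading, so ticks are period+2 clocks apart, not period.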
On Xilinx I would've used SRL16s to control everything, and probably could fit it into maybe 8 slices? Instead of counting you can phase two mutually prime SRL16s with a single on bit each, with the carry mux checking for when they coincide. A surprising number of longish delays can be generated that way with a single slice.
I do miss Xilinx...
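The coprime-ring idea can be modelled in plain Verilog. This is only a behavioural sketch under stated assumptions: the module name CoprimeDelay and its parameters A and B are hypothetical, the coincidence check is a plain AND rather than a carry mux, and nothing here guarantees that a given tool maps the rings onto SRL16s. Each ring holds a single 1 bit and rotates every clock; the two bits line up at position 0 once every lcm(A, B) clocks, which is A*B when A and B are coprime.

```verilog
// Hypothetical model of two one-hot rings with coprime lengths.
// With A = 15 and B = 16, hit is high for one clock in every 240.
module CoprimeDelay #(parameter A = 15, parameter B = 16) (  // A, B >= 2
    input  clk,
    output hit
);
    reg [A-1:0] ra = 1;   // single 1 bit, starting at position 0
    reg [B-1:0] rb = 1;

    assign hit = ra[0] & rb[0];   // both rings back at position 0

    always @(posedge clk) begin
        ra <= {ra[A-2:0], ra[A-1]};   // rotate left by one
        rb <= {rb[B-2:0], rb[B-1]};
    end
endmodule
```

Two 16-bit-or-shorter rings cover delays up to 240 clocks this way, where a binary counter with a compare would need eight flops plus the comparison logic.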
Code:
/*==============================================================================
 UART

 Sample Verilog UARTs on the internet appear to be written by
 ChatGPT and copied by engineering students. I cannot possibly
 express my disgust in the level of what passes for engineering!

 These were based on a pretty nice OPC design, and I've been tweaking them.
 Current UART-TX is 28 LUTs and 26 registers.
*/
module UartTX1(
    input         clk,
    input  [12:0] timebase,  // roughly clock / baud rate; a bit lasts timebase+2 clocks
    output        txout,     // connect to the output pin
    input  [7:0]  din,       // only store when ready is asserted
    input         load,      // after storing, ack so the UART knows
    output        ready      // when 1, uart is ready for data
);                           // I whittled the frame to 11 bits: a zero
                             // start bit, a stop bit, and an extra bit
                             // where idx immediately signals READY.
    reg [10:0] frame = 11'b11_11111111_1;  // Data is deposited at 8:1, and the entire
    reg [3:0]  idx   = 0;                  // frame is bit-indexed by idx for xmission.
    assign ready = (idx == 10);            // idx gets stuck at 10, outputting high and
    assign txout = frame[idx];             // setting the READY pin.
    reg [12:0] cnt   = 0;                  // low 12 bits count; bit 12 is set when the
                                           // counter goes negative. Cheaper than 0 detect.
                                           // Explicit 0 so simulation does not start at X.
    always @(posedge clk)
        if (ready) begin                   // only when we are free we can consider a load.
            if (load) begin                // upon load we fill the frame and reset
                frame[10:0] <= {2'b11, din[7:0], 1'b0}; // the index, releasing ready and starting
                idx <= 0;                  // the clock. Until then, we are idle,
            end                            // with idx=10 and 1 on the wire.
        end
        else                               // if we are transmitting, cnt is decremented until
            if (cnt[12]) begin             // -1 at which point we reload it and bump the bit
                cnt <= timebase;           // index for the next bit.
                idx <= idx + 1;            // Note that when it hits 10, ready will stop us at
            end                            // the next clock, with count pretty much full.
            else                           // Notice that when we load a frame, we do not set
                cnt <= cnt - 1;            // the count -- we know it's full (avoiding a mux).
endmodule
// At startup the counter is set to 0 and will immediately flip.
// The index is likewise set to zero -- we will shift out 10 one bits and halt.