CHOCHI - an inexpensive FPGA board with 128K SRAM

enso · Post by **enso** » Fri Sep 06, 2013 9:43 pm

The stock CHOCHI board has the serial port here:

;*** I/O Locations *******************************
; Picoblaze-style UART on 6502 Playground
UART_BASE       equ   $C008             
UART_STATUS     equ   UART_BASE       
UART_DATA       equ   UART_BASE +1

To output a character:

Code: Select all

ACIA1_Output    pha                     ;save registers
ACIA1_Out1      lda   UART_STATUS       ;serial port status
                and   #$02              ;is tx buffer empty
                bne   ACIA1_Out1        ;no
                pla                     ;get chr
                sta   UART_DATA         ;put character to Port
                rts                     ;done

To input a character:

Code: Select all

ACIA1_Input     lda   UART_STATUS       ;Serial port status
                and   #$40              ;mask data present bit
                beq   ACIA1_Input       ;no char to get
                lda   UART_DATA         ;get chr
                rts                     ;

enso · Post by **enso** » Fri Sep 06, 2013 9:46 pm

The first iteration of CHOCHI hardware (pre-update) had a toggle-style LED. Reading the port toggled the state, so something like this flipped the LED:

Code: Select all

        LDA $C000   ;toggle LED

The update changed the $C000 port so that the low bit of the value written drives the LED. Now you have to store a $01 or a $00 to $C000 to set or reset the LED. Reading the port does not return the status of the LED (I did not think it's worth adding another mux and slowing down the system).

Arlet · Post by **Arlet** » Sat Sep 07, 2013 4:45 am

enso wrote:

(I did not think it's worth adding another mux and slowing down the system).

Because the 6502 core allows 1 cycle for a read, you can avoid muxes in the critical path. Suppose you have 3 peripherals (PA, PB, PC), plus RAM and ROM. Instead of doing DI = MUX( PA, PB, PC, RAM, ROM ) you can do DI = MUX( P, RAM, ROM ), where P is a common peripheral register, and you set P <= MUX( PA, PB, PC ).
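A minimal sketch of that two-level arrangement, for illustration only. The module, port, and signal names are all hypothetical, and it assumes `ram_sel` and `rom_sel` have already been delayed so that they line up with the memory read data:

```verilog
// Sketch only: all module, port, and signal names are hypothetical.
module read_path (
    input  wire       clk,
    input  wire       pa_sel, pb_sel, pc_sel,  // peripheral selects, decoded from the address
    input  wire       ram_sel, rom_sel,        // assumed already aligned with the read data
    input  wire [7:0] pa_do, pb_do, pc_do,     // peripheral read data
    input  wire [7:0] ram_do, rom_do,          // memory read data
    output wire [7:0] di                       // data into the CPU core
);
    // First level: the peripheral mux lands in a register, so its delay
    // is kept out of the path that feeds the core.
    reg [7:0] p;
    reg       p_sel;
    always @(posedge clk) begin
        p     <= pa_sel ? pa_do :
                 pb_sel ? pb_do :
                 pc_sel ? pc_do : 8'h00;
        p_sel <= pa_sel | pb_sel | pc_sel;
    end

    // Second level: only three sources remain in front of the core.
    assign di = p_sel   ? p      :
                ram_sel ? ram_do :
                rom_sel ? rom_do : 8'h00;
endmodule
```

Adding a fourth peripheral only widens the registered first level; the mux in front of the core stays three-way.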

And of course, you can use a wired-OR instead of a mux, and make sure all unselected inputs are 0.

enso · Post by **enso** » Sat Sep 07, 2013 2:39 pm

You are right, it should be fine. I've had a hard time bringing the system clock up, so now I am afraid to mess with it too much. I have to rethink the IO anyway.

Arlet · Post by **Arlet** » Sat Sep 07, 2013 2:46 pm

An advantage of the wired-OR method is that you can often use reset inputs. Both BRAMs and regular FFs have special reset inputs that you can use to produce a zero output value without using additional LUTs. See http://www.xilinx.com/support/documenta ... wp275.pdf
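As a rough illustration of the reset-input trick, here is a sketch of a read port whose output register is cleared whenever the port is not selected. The names are hypothetical and not taken from the CHOCHI source, and whether the tools actually map the clear onto the flip-flop's dedicated reset pin depends on the synthesizer and its settings:

```verilog
// Sketch only: names are hypothetical, not taken from the CHOCHI source.
module port_read (
    input  wire       clk,
    input  wire       sel,       // high when this port is addressed for a read
    input  wire [7:0] pins,      // value to present when selected
    output reg  [7:0] data_out   // zero whenever the port was not selected
);
    // Written as a synchronous reset so the tools can use the flip-flop's
    // reset input instead of spending a LUT on gating the data.
    always @(posedge clk)
        if (!sel) data_out <= 8'h00;
        else      data_out <= pins;
endmodule
```

Because every unselected source reads as zero, several such outputs can be ORed straight onto the CPU data-in bus.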

enso · Post by **enso** » Sat Sep 07, 2013 3:15 pm

What do you mean by wired OR? I didn't think Spartan3 supports true wired OR. Do you mean just a 4-way OR gate connecting 4 different sources and making sure that the unused ones are set to 0? This is the way I do it now - the data in is a 4-way or gate connecting SRAM, input port, UART and BRAM. I did not want to add another layer of logic, but you are right, there is plenty of time available. And it's not really two layers of logic as both layers are triggered simultaneously. I am just a wussy.

Code: Select all

assign CPU_DI[7:0] = BRAM_DO | UART_DO | SRAM_DO | PORTB_DO;

The only truly 'wired' way I can think of is routing the signals out of the chip, connecting them together and bringing them back in. This board does not have enough IO pins to do that. That reminds me of an amazing demo I saw on the Propeller board. The guy, whose name escapes me, used the IO pins as a data bus to connect several cores together for a really fast interconnect (the normal way to communicate between cores is to write data in a common RAM and have the other core read it, which takes tens of cycles).

Arlet · Post by **Arlet** » Sat Sep 07, 2013 3:20 pm

I mean using the verilog 'wor' keyword. It's only a wired-or in the source code, but it allows you to add multiple sources to the same bus without having to manually add an or-gate. In modern FPGAs it's all synthesized to a regular OR, of course.

enso · Post by **enso** » Sat Sep 07, 2013 3:27 pm

Actually, I am even more concerned about the address decoding logic, which involves wider gates. Right now, I have:

Code: Select all

assign IO_SEL = (CPU_AB[15:8]==8'hC0); // C0xx
assign LED_SEL =      IO_SEL & (CPU_AB[7:3]==5'b00000); // LED         is at C000
assign UART_SEL  =    IO_SEL & (CPU_AB[7:3]==5'b00001); // UART        is at C008
assign PORTA_SEL =    IO_SEL & (CPU_AB[7:3]==5'b00010); // output port is at C010
assign PORTB_SEL =    IO_SEL & (CPU_AB[7:3]==5'b00011); // input  port is at C018

So there is an 8-input LUT to match C0xx, followed by 6-input LUTs for individual selects, before the IO units get their select signal. That seems excessive, but I haven't been able to think of a better way.

Worse yet, I select the BRAM like this:

Code: Select all

assign BRAM_SEL = (CPU_AB[15:11]==5'b11111)|(CPU_AB[15:9]==7'b0000000);

And finally the SRAM for everything else:

Code: Select all

assign SRAM_SEL = ~(IO_SEL | BRAM_SEL);

That does not make me happy. I suppose it would be wiser to dedicate logic to SRAM selection - that may be why 45MHz is the top speed for now.
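For reference, the memory map implied by these selects can be checked with a small simulation. This is only a sketch: the three decode equations are copied from the code above, while the testbench, its module name, and the chosen test addresses are hypothetical.

```verilog
`timescale 1ns/1ps
// Sketch only: decode equations are copied from the post; the testbench
// around them is hypothetical.
module decode_tb;
    reg  [15:0] CPU_AB;
    wire IO_SEL   = (CPU_AB[15:8] == 8'hC0);
    wire BRAM_SEL = (CPU_AB[15:11] == 5'b11111) | (CPU_AB[15:9] == 7'b0000000);
    wire SRAM_SEL = ~(IO_SEL | BRAM_SEL);

    // Drive one address and compare the three selects with the expected values.
    task check(input [15:0] addr, input io, input bram, input sram);
        begin
            CPU_AB = addr;
            #1;
            if ({IO_SEL, BRAM_SEL, SRAM_SEL} !== {io, bram, sram})
                $display("FAIL %h: io=%b bram=%b sram=%b",
                         addr, IO_SEL, BRAM_SEL, SRAM_SEL);
            else
                $display("ok   %h", addr);
        end
    endtask

    initial begin
        check(16'h0000, 0, 1, 0);  // zero page: BRAM
        check(16'h01FF, 0, 1, 0);  // top of the stack page: BRAM
        check(16'h0200, 0, 0, 1);  // first SRAM address
        check(16'hC008, 1, 0, 0);  // UART, inside the C0xx I/O page
        check(16'hC100, 0, 0, 1);  // just past the I/O page: SRAM
        check(16'hF7FF, 0, 0, 1);  // last SRAM address below the top BRAM
        check(16'hF800, 0, 1, 0);  // top 2K: BRAM
        check(16'hFFFC, 0, 1, 0);  // reset vector: BRAM
        $finish;
    end
endmodule
```

So BRAM covers $0000-$01FF and $F800-$FFFF, I/O covers $C000-$C0FF, and the SRAM select is whatever is left, which is why it sits behind both of the other decodes.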

Arlet · Post by **Arlet** » Sat Sep 07, 2013 3:38 pm

Yes, those are wide logic functions, but it's not necessarily a problem. When the CPU reads, you can (must) add a register in the path, so there's only a little bit of logic from address bus -> select decoding -> reset input. Usually that's a shorter path than memory->ALU.
When the CPU writes, usually the data goes into a register, so it's fairly short too.

enso · Post by **enso** » Sat Sep 07, 2013 3:45 pm

Arlet wrote:

I mean using the verilog 'wor' keyword. It's only a wired-or in the source code, but it allows you to add multiple sources to the same bus without having to manually add an or-gate. In modern FPGAs it's all synthesized to a regular OR, of course.

Pardon my ignorance, but I've never used 'wor'. How would my code look with it? The examples online are confusing. I must be missing something. For instance,

Code: Select all

wire a, b, c;
wor x;
assign x = a;
assign x = b;
assign x = c;

seems much less readable than

Code: Select all

wire a, b, c, x;
assign x = a|b|c;

enso · Post by **enso** » Sat Sep 07, 2013 3:48 pm

Arlet wrote:

Yes, those are wide logic functions, but it's not necessarily a problem. When the CPU reads, you can (must) add a register in the path, so there's only a little bit of logic from address bus -> select decoding -> reset input. Usually that's a shorter path than memory->ALU.
When the CPU writes, usually the data goes into a register, so it's fairly short too.

I guess it doesn't matter too much since the SRAM is always selected in my system, so it is only the input mux that incurs any delay. The SRAM write signal may be more of an issue... No, I just write the SRAM for every CPU write - data written to IO is also written to the SRAM but there is no harm done as it's ignored on read anyway.

Interesting - this creates shadow registers for IO writes. All I have to do is disable output ports on read so that the SRAM is read to get the current values of output ports (solving the LED output status read for one)
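A sketch of how the shadow-register idea might look in the select logic. `OUT_SEL`, `SRAM_RD_SEL`, and the module itself are hypothetical; the other names follow the decode code earlier in the thread. It assumes the SRAM is written on every CPU write, as described above, and that the output ports drive zero onto the data-in bus:

```verilog
// Sketch only: OUT_SEL, SRAM_RD_SEL and the module are hypothetical.
module shadow_select (
    input  wire IO_SEL, BRAM_SEL,
    input  wire LED_SEL, PORTA_SEL,    // write-only output ports
    output wire SRAM_RD_SEL            // SRAM drives the CPU data-in bus
);
    // Output ports have nothing to return on a read, so let the SRAM answer
    // for their addresses; it holds the byte last written there.
    wire OUT_SEL = LED_SEL | PORTA_SEL;
    assign SRAM_RD_SEL = ~(IO_SEL | BRAM_SEL) | OUT_SEL;
endmodule
```

Input ports and the UART still answer for their own addresses, since `OUT_SEL` only covers the ports that are written and never read back.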

Arlet · Post by **Arlet** » Sat Sep 07, 2013 3:51 pm

But something like this is pretty readable:

Code: Select all

wor [7:0] data_in;

ram ram1( ... .data_in(data_in) ... );
ram ram2( ... .data_in(data_in) ... );
rom rom1( ... .data_in(data_in) ... );

enso · Post by **enso** » Sat Sep 07, 2013 3:56 pm

Arlet wrote:

But something like this is pretty readable:

Code: Select all

wor [7:0] data_in;
ram ram1( ... .data_in(data_in) ... );
ram ram2( ... .data_in(data_in) ... );
rom rom1( ... .data_in(data_in) ... );

Would this work?

Code: Select all

wor [7:0] CPU_DI;  //assigned to the core's data in bus elsewhere...
ram ram1( ... .data_out(CPU_DI) ... );
ram ram2( ... .data_out(CPU_DI) ... );
rom rom1( ... .data_out(CPU_DI) ... );

Arlet · Post by **Arlet** » Sat Sep 07, 2013 3:57 pm

Yes. That's the idea.

And you can add more devices without having to create additional temporary signals or extend the wide OR.

enso · Post by **enso** » Sat Sep 07, 2013 4:00 pm

Arlet wrote:

Yes. That's the idea. ....

Well, this has been most productive. Learned a new trick, and figured out how to read the current value of the output ports...
Thanks, Arlet!
EDIT: Not having to remember to place new IO units into the mux is a great abstraction (on the other hand, it's harder to tell how many signals are being or'ed...)
