6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Sun Nov 24, 2024 6:10 pm

All times are UTC




Post new topic Reply to topic  [ 104 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6, 7  Next
Author Message
PostPosted: Fri Sep 06, 2013 9:43 pm 
Offline
User avatar

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
The stock CHOCHI board has the serial port here:
Code:
;*** I/O Locations *******************************
; Picoblaze-style UART on 6502 Playground
UART_BASE       equ   $C008             
UART_STATUS     equ   UART_BASE       
UART_DATA       equ   UART_BASE +1     


To output a character:

Code:
ACIA1_Output    pha                     ;save registers
ACIA1_Out1      lda   UART_STATUS       ;serial port status
                and   #$02              ;is tx buffer empty
                bne   ACIA1_Out1        ;no
                pla                     ;get chr
                sta   UART_DATA         ;put character to Port
                rts                     ;done


To input a character,
Code:
ACIA1_Input     lda   UART_STATUS       ;Serial port status
                and   #$40              ;mask data present bit
                beq   ACIA1_Input       ;no char to get
                lda   UART_DATA         ;get chr
                rts                     ;

_________________
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut


Top
 Profile  
Reply with quote  
PostPosted: Fri Sep 06, 2013 9:46 pm 
Offline
User avatar

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
The first iteration of CHOCHI hardware (pre-update) had a toggle-style LED. Reading it toggled the state, so something like
Code:
        LDA $C000   ;toggle LED


The update changed the $C000 port so that the low bit output sets the LED. Now you have to store a $00 or a $01 to $C000 to set or reset the LED. Reading the port does not return the status of the LED (I did not think it's worth adding another mux and slowing down the system).

_________________
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut


Top
 Profile  
Reply with quote  
PostPosted: Sat Sep 07, 2013 4:45 am 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
enso wrote:
(I did not think it's worth adding another mux and slowing down the system).

Because the 6502 core allows 1 cycle for a read, you can avoid muxes in critical path. Suppose you have 3 peripherals (PA, PB,PC), and RAM and ROM. Instead of doing DI = MUX( PA, PB, PC, RAM, ROM) you can do DI = MUX( P, RAM, ROM ), where P is a common peripheral register, and you set P <= MUX( PA, PB, PC ).

And of course, you can use wired or instead of a mux, and make sure all unselected inputs are 0.


Top
 Profile  
Reply with quote  
PostPosted: Sat Sep 07, 2013 2:39 pm 
Offline
User avatar

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
You are right, it should be fine. I've had a hard time bringing the system clock up, so now I am afraid to mess with it too much. I have to rethink the IO anyway.

_________________
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut


Top
 Profile  
Reply with quote  
PostPosted: Sat Sep 07, 2013 2:46 pm 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Advantage of the wired-or method is that you can often use reset inputs. Both BRAMs and regular FFs have special reset inputs that you can use to produce zero output value without using additional LUTs. See www.xilinx.com/support/documentation/wh ... /wp275.pdf


Top
 Profile  
Reply with quote  
PostPosted: Sat Sep 07, 2013 3:15 pm 
Offline
User avatar

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
What do you mean by wired OR? I didn't think Spartan3 supports true wired OR. Do you mean just a 4-way OR gate connecting 4 different sources and making sure that the unused ones are set to 0? This is the way I do it now - the data in is a 4-way or gate connecting SRAM, input port, UART and BRAM. I did not want to add another layer of logic, but you are right, there is plenty of time available. And it's not really two layers of logic as both layers are triggered simultaneously. I am just a wussy.

Code:
assign CPU_DI[7:0] = BRAM_DO | UART_DO | SRAM_DO | PORTB_DO;

The only truly 'wired' way I can think of is routing the signals out of the chip, connecting them together and bringing them back in. This board does not have enough IO pins to do that. That reminds me of an amazing demo I saw on the Propeller board. The guy whose name escapes me used the IO pins as a data bus to connect several cores together for a really fast interconnect (the normal way to communicate between cores is to write data in a common RAM and have the other core read it which takes tens of cycles)

_________________
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut


Top
 Profile  
Reply with quote  
PostPosted: Sat Sep 07, 2013 3:20 pm 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
I mean using the verilog 'wor' keyword. It's only a wired-or in the source code, but it allows you to add multiple sources to the same bus without having to manually add an or-gate. In modern FPGAs it's all synthesized to a regular OR, of course.


Top
 Profile  
Reply with quote  
PostPosted: Sat Sep 07, 2013 3:27 pm 
Offline
User avatar

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
Actually, I am even more concerned about the address decoding logic, which involves wider gates. Right now, I have:
Code:
assign IO_SEL = (CPU_AB[15:8]==8'hC0); // C0xx
assign LED_SEL =      IO_SEL & (CPU_AB[7:3]==5'h00000); // LED         is at C000
assign UART_SEL  =    IO_SEL & (CPU_AB[7:3]==5'b00001); // UART        is at C008
assign PORTA_SEL =    IO_SEL & (CPU_AB[7:3]==5'b00010); // output port is at C010
assign PORTB_SEL =    IO_SEL & (CPU_AB[7:3]==5'b00011); // input  port is at C018

So there is an 8-input LUT to match C0xx, followed by 6-input LUTs for individual selects, before the IO units get their select signal. That seems excessive, but I haven't been able to think of a better way.

Worse yet, I select the BRAM like this:
Code:
assign BRAM_SEL = (CPU_AB[15:11]==5'b11111)|(CPU_AB[15:9]==7'b0000000);

And finally the SRAM for everything else:
Code:
assign SRAM_SEL = ~(IO_SEL | BRAM_SEL);

That does not make me happy. I suppose it would be wiser to dedicate logic to SRAM selection - that may be why 45MHz is the top speed for now.

_________________
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut


Top
 Profile  
Reply with quote  
PostPosted: Sat Sep 07, 2013 3:38 pm 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Yes, those are wide logic functions, but it's not necessarily a problem. When the CPU reads, you can (must) add a register in the path, so there's only a little bit of logic from address bus -> select decoding -> reset input. Usually that's a shorter path than memory->ALU.
When the CPU writes, usually the data goes into a register, so it's fairly short too.


Top
 Profile  
Reply with quote  
PostPosted: Sat Sep 07, 2013 3:45 pm 
Offline
User avatar

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
Arlet wrote:
I mean using the verilog 'wor' keyword. It's only a wired-or in the source code, but it allows you to add multiple sources to the same bus without having to manually add an or-gate. In modern FPGAs it's all synthesized to a regular OR, of course.

Pardon my ignorance, but I've never used 'wor'. How would my code look with it? The examples online are confusing. I must be missing something. For instance,
Code:
wire a, b, c;
wor x;
assign x = a;
assign x = b;
assign x = c;

seems much less readable then
Code:
wire a, b, c, x;
assign x = a|b|c;

_________________
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut


Top
 Profile  
Reply with quote  
PostPosted: Sat Sep 07, 2013 3:48 pm 
Offline
User avatar

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
Arlet wrote:
Yes, those are wide logic functions, but it's not necessarily a problem. When the CPU reads, you can (must) add a register in the path, so there's only a little bit of logic from address bus -> select decoding -> reset input. Usually that's a shorter path than memory->ALU.
When the CPU writes, usually the data goes into a register, so it's fairly short too.

I guess it doesn't matter too much since the SRAM is always selected in my system, so it is only the input mux that incurs any delay. The SRAM write signal may be more of an issue... No, I just write the SRAM for every CPU write - data written to IO is also written to the SRAM but there is no harm done as it's ignored on read anyway.

Interesting - this creates shadow registers for IO writes. All I have to do is disable output ports on read so that the SRAM is read to get the current values of output ports (solving the LED output status read for one)

_________________
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut


Last edited by enso on Sat Sep 07, 2013 3:52 pm, edited 1 time in total.

Top
 Profile  
Reply with quote  
PostPosted: Sat Sep 07, 2013 3:51 pm 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
But something like this is pretty readable:
Code:
wor [7:0] data_in;

ram ram1( ... .data_in(data_in) ... );
ram ram2( ... .data_in(data_in) ... );
rom rom1( ... .data_in(data_in) ... );


Top
 Profile  
Reply with quote  
PostPosted: Sat Sep 07, 2013 3:56 pm 
Offline
User avatar

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
Arlet wrote:
But something like this is pretty readable:
Code:
wor [7:0] data_in;
ram ram1( ... .data_in(data_in) ... );
ram ram2( ... .data_in(data_in) ... );
rom rom1( ... .data_in(data_in) ... );

Would this work?
Code:
wor [7:0] CPU_DI;  //assigned to the core's data in bus elsewhere...
ram ram1( ... .data_out(CPU_DI) ... );
ram ram2( ... .data_out(CPU_DI) ... );
rom rom1( ... .data_out(CPU_DI) ... );

_________________
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut


Top
 Profile  
Reply with quote  
PostPosted: Sat Sep 07, 2013 3:57 pm 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Yes. That's the idea.

And you can add more devices without having to create additional temporary signals, and extending the wide OR.


Top
 Profile  
Reply with quote  
PostPosted: Sat Sep 07, 2013 4:00 pm 
Offline
User avatar

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
Arlet wrote:
Yes. That's the idea. ....

Well, this has been most productive. Learned a new trick, and figured out how to read the current value of the output ports...
Thanks, Arlet!
EDIT: Not having to remember to place new IO units into the mux is a great abstraction (on the other hand, it's harder to tell how many signals are being or'ed...)

_________________
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 104 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6, 7  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 19 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: