
All times are UTC




Posted: Fri Jun 28, 2013 10:14 pm

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
I've been making progress with my 6502 Playground board, conceived in topic http://forum.6502.org/viewtopic.php?f=4&t=2322. It's a board with an XC3S700AN, a 128K SRAM, a 65C02 or '16, a 60MHz crystal oscillator and voltage regulators, plus lots of IO.
Attachment: 6502play.c.jpg

While similar projects have been documented here, I wanted to configure the board with Arlet's core to get the kinks out.

Getting the core to work with a BRAM is pretty easy. The SRAM is a little more work. There has been some discussion about using async SRAMs. The problem is that the back-to-back writes used to push data onto the stack leave the Write Enable line asserted for 2 consecutive cycles with no WE edge between them, so both writes may or may not actually land in memory.

To avoid problems, I am decoding addresses so that page 1 is using a BRAM. While debugging, I use the BRAM for page 0 and 2. Page $FF contains vectors, so into the BRAM it goes.

Arlet was kind enough to point out the obvious fact (that I wasn't paying attention to): BRAMs are synchronous but SRAMs are not, so the SRAM output must be pipelined, and care must be taken with muxing the results (since the mux depends on the previous cycle's selects).

I created a simple core for the SRAM, and after some hammering got it to work - sort of. I wrote a simple LED blinker:
Code:
CTR0 equ $200
CTR1 equ CTR0+1
CTR2 equ CTR0+2
  org $237
start:
;
; blink using a 3-byte counter in memory
;
bl:
  lda   #0
  sta   CTR0
  sta   CTR1
  lda   #40
  sta   CTR2
bl1:
  dec   CTR0
  bne   bl1
  dec   CTR1
  bne   bl1
  dec   CTR2
  bne   bl1
  lda   $C000   ;any access to $C000 toggles the LED
  jmp   bl

  org $7FC
  dw start


At 30MHz this blinks the LED nicely ($C000 is decoded to toggle the LED). But, it blinks twice as fast with the variables in SRAM (CTR0 set to $2000 or anything else outside BRAM pages).

I am somewhat perplexed.

At 60MHz the system will not blink when the variables are in the SRAM. It should really go to 100MHz, so I must have done something wrong...

_________________
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut


Posted: Sat Jun 29, 2013 3:45 am

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
Connected a UART. It communicates via a 3.3V FTDI USB-serial port with my Linux box.

I am using the UART from Xilinx, the one that comes with Picoblaze. I wrapped it into a module for easy interfacing (I will keep updating here):
Code:
/******************************************************************************
 UART  - wrapper around the Xilinx Picoblaze uart.
 
 Files required:
   bbfifo_16x8.v
   kcuart_rx.v
   kcuart_tx.v
   uart_rx.v
   uart_tx.v

To connect to Arlet's 6502 core, I decode an 8-byte memory range (with SEL).
Base address is the status register when read.  Base+1 is the data register,
which can be read or written.  In the future I may add other registers.
     
******************************************************************************/
module mUART(
  input CLK
, input [2:0] addr
, input SEL
, input WE
, input [7:0] DI
, output reg [7:0] DO
// interface to hardware
, output TX
, input RX
);
//-----------------------------------------------------------------------------
// The UART requires pulses at the rate of BAUD * 16.  At 30MHz, use a 16-bit
// shift register with a single bit on: 30MHz/16 = 1.875MHz, which is close to
// 115200 * 16 = 1.8432MHz.
wire baud_x16;
SRL16 #(.INIT(1)) baudshifter1 (baud_x16,1,1,1,1,CLK,baud_x16);
//
wire tx_full;
wire tx_half_full;
wire rx_full;
wire rx_half_full;
wire rx_data_present;
wire [7:0] rx_data;
wire do_tx;
wire do_rx;
//-----------------------------------------------------------------------------
// uart tx
//-----------------------------------------------------------------------------
uart_tx utx(
    DI,                     //data to send out
    do_tx,
    0,                      //reset buffer
    baud_x16,               // baud rate x 16 pulses
    TX,                     // output pin
    tx_full,
    tx_half_full,
    CLK);
//-----------------------------------------------------------------------------
// uart rx
//-----------------------------------------------------------------------------
uart_rx urx(
    RX,                 //input pin
    rx_data[7:0],       //input data
    do_rx,              //when strobed, consider read done
    0,                  //rx reset buffer
    baud_x16,           //baud
    rx_data_present,
    rx_full,
    rx_half_full,
    CLK); 
//-----------------------------------------------------------------------------
// mux for the output value: even=status odd=data
// status bits: 6=rx_data_present 5=rx_full 4=rx_half_full 1=tx_full 0=tx_half_full
always @ (posedge CLK)
  if(SEL & ~WE)
    if(addr[0])
      DO <= rx_data[7:0];
    else
      DO <= {1'b0, rx_data_present, rx_full, rx_half_full, 2'b00, tx_full, tx_half_full};
  else
    DO <= 8'h00;
 
assign do_tx = WE & SEL;
assign do_rx = ~WE & SEL & addr[0];  //any odd address acks read

endmodule
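
As a rough check of the "close to" in the baud comment above, here is a minimal Python sketch (not part of the design; the function names are made up for illustration). It assumes a 30 MHz CLK and one baud_x16 pulse every 16 clocks, as the SRL16 with a single bit set would produce.

```python
# Sketch: how close is the SRL16-derived tick to the ideal 16x baud rate?
# Assumes one enable pulse every `shift_len` clocks and 16x oversampling.

def actual_baud(clk_hz, shift_len=16, oversample=16):
    """Baud rate produced when one enable pulse occurs every shift_len clocks."""
    return clk_hz / shift_len / oversample


def baud_error_percent(clk_hz, target_baud, shift_len=16, oversample=16):
    """Signed error of the generated baud rate relative to the target, in percent."""
    got = actual_baud(clk_hz, shift_len, oversample)
    return (got - target_baud) / target_baud * 100.0


if __name__ == "__main__":
    print(actual_baud(30_000_000))                 # baud rate actually generated
    print(baud_error_percent(30_000_000, 115200))  # deviation from 115200 in percent
```

With these assumptions the link runs at 117187.5 baud, roughly 1.7% fast, which a typical UART receiver will usually tolerate.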


And here is (not the most elegant) 6502 code to send and receive characters, blocking:
Code:
UART_BASE equ $C008           ;C008-C00F dedicated to the UART
UART_STATUS equ UART_BASE
UART_DATA   equ UART_BASE+1

emit:
  tax                 ;move character to x for now
emit1:
  lda  UART_STATUS
  and  #$02           ;tx buffer full?
  bne  emit1          ;yes, keep trying (looping to emit would redo the tax and clobber x)
  stx  UART_DATA     
  txa
  rts

key:
  lda   UART_STATUS
  and   #$40          ;rx data present?
  beq   key           ;no, keep trying
  lda   UART_DATA
  rts


Attachment: 6502play.c.term.jpg

With good looking fonts, too! :lol:

I still have to track down the SRAM/BRAM blink rate problem, but now I can write a debugger...

Posted: Sat Jun 29, 2013 4:05 am

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Pretty cool. Keep posting on your progress.

_________________
Michael A.


Posted: Sat Jun 29, 2013 4:25 am

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
I recommend making a testbench with an SRAM model, and running a simulation.


Posted: Sat Jun 29, 2013 4:33 am

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Quote:
At 60MHz the system will not blink when the variables are in the SRAM. It should really go to 100MHz, so I must have done something wrong...

You base this 100 MHz on the fact that you're using 10 ns SRAM ? You're forgetting that the FPGA also has delays to get the internal data on its external pins, and the other way around. Somewhere buried in the FPGA datasheet you should be able to find them, but expect them to be several ns. In addition you have delays on the board.


Posted: Sat Jun 29, 2013 4:43 am

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
I am afraid I will wind up doing just that - simulate.

The core is connected to the BRAM with just a few lines of verilog, so I am pretty sure it's not there. That leaves the SRAM - it must somehow skip something - although I can't imagine what can happen to make the loop run twice as fast. It's very consistent at different clock rates, and running it slowly does not reveal anything obvious (the decrements are working as expected).

How does one go about creating a testbench?

This is a -5 part, so it should be pretty quick, and it seems almost 20ns should be enough time to get data in and out. The board propagation delays should not be too bad, with less than 25mm maximum trace length.

Posted: Sat Jun 29, 2013 4:46 am

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
On the positive side, the entire system is very small:
Code:
Logic Distribution:
  Number of occupied Slices:            371 out of   5,888    6%

That's with the UART, a SRAM controller and some minor IO decoding.

Posted: Sat Jun 29, 2013 5:00 am

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Can you post your SRAM code ?
Quote:
This is a -5 part, so it should be pretty quick, and it seems almost 20ns should be enough time to get data in and out
Running at 60 MHz, you only have about 16.7 ns, of which 10 ns are needed by the SRAM. That leaves less than 7 ns for the FPGA input, output, and external delays.
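
The budget can be tabulated for a few clock rates with a small Python sketch. It is only an illustration: it assumes the 10 ns SRAM access time is the sole fixed cost, and that whatever remains of the cycle has to cover FPGA output delay, FPGA input setup and board traces. The function names are hypothetical.

```python
# Sketch of the read-timing budget for an async SRAM behind an FPGA.
# Assumption: SRAM access time is the only fixed cost in the cycle.

def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def slack_ns(freq_mhz, sram_access_ns=10.0):
    """Time left in one cycle for FPGA output, FPGA input and trace delays."""
    return period_ns(freq_mhz) - sram_access_ns


if __name__ == "__main__":
    for f in (30, 60, 100):
        print(f"{f:>3} MHz: period {period_ns(f):6.2f} ns, slack {slack_ns(f):6.2f} ns")
```

At 100 MHz the slack is zero under these assumptions, so a 10 ns part cannot be read in a single cycle once any FPGA pad delay is included.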

You could run a simple test where you wire two pins from the FPGA together. You write to one pin, and read on another. Now, toggle the output pin with each clock, and compare with input. Increase the clock. For some clock speed, you'll see that the input will no longer have the same value as the output. From that speed, you can deduce the delays.


Posted: Sat Jun 29, 2013 5:13 am

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
The SRAM controller:
Code:
/******************************************************************************
 a controller for a 128K async SRAM.
 Makes it work like a BRAM
******************************************************************************/
module mSRAM128K(
  input sclk
, input en              //1 means do enable;
, input we
, input  [16: 0] ab
, input  [ 7: 0] di
, output reg [ 7: 0] do
// interface to the real chip
, output [16: 0] xab
, inout  [ 7: 0] xdb
, output  xcs_
, output  xoe_
, output  xwe_
);
  assign xcs_ = ~en;    //chip select is inverted
  assign xwe_ = ~we;    //we is inverted
  assign xoe_ = 1'b0;     //see SRAM timing diagrams
  assign xab  = ab;     //address bus permanently connected
  //
  // Implement a tri-state circuit for the inout xdb
  // Reading a deselected chip should return 0 if pulldowns are enabled.
  wire [7:0] xin = we ? 8'bZ : xdb ; //on read, let data in   
  assign xdb     = we ? di   : 8'bZ; //on write, let data out 
  //
  // Return read result next cycle
  always @ (posedge sclk)
    if(en & ~we)  //on write, output 0 - for some reason it's FF
      do <= xin;
    else
      do <= 8'h00;
         

endmodule



Last edited by enso on Sat Jun 29, 2013 4:34 pm, edited 2 times in total.

Posted: Sat Jun 29, 2013 5:16 am

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
All modules (SRAM, BRAM and IO) output 0 when deselected, so I can mux by ORing.
Code:
// Memory address selects
assign IO_SEL = (CPU_AB[15:8]==8'hC0); // C0xx
assign BRAM_SEL = (CPU_AB[15:8]==8'hFF)|(CPU_AB[15:9]==7'h00)|(CPU_AB[15:8]==8'h02);//page 0,1,2 or FF use BRAM...
assign IO_C000_SEL = IO_SEL & (CPU_AB[7:0]==8'h0);
assign UART_SEL =    IO_SEL & (CPU_AB[7:3]==5'h1);
assign SRAM_SEL = ~(IO_SEL | BRAM_SEL);


//DI mux.  Since we are careful about setting unselected values to 0, we can just or all of them in
assign CPU_DI[7:0] = BRAM_DO | SRAM_DO | UART_DO;


I guess I could tighten up the SRAM select code...
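
For checking which memory a given variable lands in, the select equations above can be mirrored in software. This is a minimal Python sketch; the function name and the use of a set of strings are illustrative choices, not something from the design.

```python
# Software mirror of the Verilog select equations, for sanity-checking
# which device responds to a 16-bit CPU address.

def decode(addr):
    """Return the set of select signals asserted for a CPU address."""
    addr &= 0xFFFF
    page = addr >> 8
    io_sel = page == 0xC0                                         # C0xx
    bram_sel = page == 0xFF or (addr >> 9) == 0 or page == 0x02   # pages 0,1,2,FF
    sels = set()
    if io_sel:
        sels.add("IO")
        if (addr & 0xFF) == 0x00:
            sels.add("IO_C000")                                   # LED toggle
        if ((addr >> 3) & 0x1F) == 0x01:
            sels.add("UART")                                      # C008-C00F
    if bram_sel:
        sels.add("BRAM")
    if not (io_sel or bram_sel):
        sels.add("SRAM")                                          # everything else
    return sels


if __name__ == "__main__":
    for a in (0x0200, 0x2000, 0x0500, 0xC000, 0xC008, 0xFFFC):
        print(f"{a:04X}: {sorted(decode(a))}")
```

This confirms that $0200 is served by the BRAM while $2000 and $0500 go to the SRAM.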

Posted: Sat Jun 29, 2013 6:27 am

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Hmm.. I don't see anything suspicious in the code.


Posted: Sat Jun 29, 2013 3:52 pm

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
It's a pretty good puzzle. The setup - Arlet's core and BRAM and SRAM. Stack in page 1, code also in a BRAM.
Code:
           Variables in:
          BRAM       SRAM
60MHz      OK        FAIL
30MHz      OK        2x blink
15MHz      OK        2x blink
1MHz       OK        2x blink


At 1Hz, observing the address bus and the DIN bus with a 4-digit LED display, things look convincingly similar with SRAM and BRAM... Wait, I didn't think much of it, but the inner loop took 9 cycles which is too long. [EDIT] No, 9 cycles is right:
Code:
bl1:
  dec   CTR0    ;6;
  bne   bl1     ;3;  branch taken


I should really count up the cycles to see which blink rate makes sense at a given frequency...
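
A minimal Python sketch of that count for the blinker listed earlier follows. It assumes standard 6502 cycle counts (DEC abs 6, BNE 3 taken / 2 not taken, LDA imm 2, LDA abs 4, STA abs 4, JMP abs 3), that the core matches them, and that no branch crosses a page. The function names are hypothetical.

```python
# Closed-form cycle count for one pass of the 3-byte delay loop
# (CTR0 = 0, CTR1 = 0, CTR2 = 40), i.e. the time between LED toggles.

DEC_ABS, BNE_TAKEN, BNE_NOT_TAKEN = 6, 3, 2
LDA_IMM, LDA_ABS, STA_ABS, JMP_ABS = 2, 4, 4, 3


def level(n_dec, n_fallthrough):
    """Cycles for one dec/bne pair run n_dec times, falling through n_fallthrough times."""
    taken = n_dec - n_fallthrough
    return n_dec * DEC_ABS + taken * BNE_TAKEN + n_fallthrough * BNE_NOT_TAKEN


def cycles_per_toggle(ctr2_init=40):
    """Total cycles from label bl back to bl; ctr2_init must be 1..255."""
    n2 = ctr2_init      # executions of dec CTR2
    n1 = 256 * n2       # executions of dec CTR1 (starts at 0, so 256 per pass)
    n0 = 256 * n1       # executions of dec CTR0
    setup = LDA_IMM + STA_ABS + STA_ABS + LDA_IMM + STA_ABS
    loops = level(n0, n1) + level(n1, n2) + level(n2, 1)
    tail = LDA_ABS + JMP_ABS
    return setup + loops + tail


def toggle_seconds(clk_hz, ctr2_init=40):
    """Seconds between LED toggles at the given CPU clock."""
    return cycles_per_toggle(ctr2_init) / clk_hz


if __name__ == "__main__":
    print(cycles_per_toggle())
    print(toggle_seconds(30_000_000))
```

Under these assumptions one pass takes 23,675,222 cycles, so at 30MHz the LED should toggle about every 0.79 seconds.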

Posted: Sat Jun 29, 2013 4:33 pm

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
Ok, the blink rate is really about twice what it should be when counters are in the SRAM. So, one working theory is that D7 is reading or writing low, causing the counters to run out faster. Although that would make it more than 2x, and it seems I am a little under 2x.

I guess the thing to do (other than simulation) is to write a simple memory test.

Posted: Sat Jun 29, 2013 7:41 pm

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
I wrote a simple memory test that fills a page with sequential bytes. Sure enough, the BRAMs look good, but the SRAM appears to be missing the top 2(!) bits. In effect I have a 6-bit memory connected. I am checking the schematics and the connections now.
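
The idea of such a test can be sketched in Python against a memory model. This is an illustration only: `fill_and_check` and `SixBitMemory` are hypothetical names, and the model simply assumes bits 7 and 6 are lost on every write.

```python
# Fill a page with sequential bytes, read it back, and report which
# data bits ever differed. The faulty memory below is a made-up model.

def fill_and_check(write, read, base, length=256):
    """Return a mask with a 1 for every bit that read back wrong at least once."""
    for i in range(length):
        write(base + i, i & 0xFF)
    bad = 0
    for i in range(length):
        bad |= read(base + i) ^ (i & 0xFF)
    return bad


class SixBitMemory:
    """Hypothetical memory that drops bits 7 and 6 of every byte written."""

    def __init__(self):
        self.cells = {}

    def write(self, addr, value):
        self.cells[addr] = value & 0x3F

    def read(self, addr):
        return self.cells.get(addr, 0)


if __name__ == "__main__":
    mem = SixBitMemory()
    print(f"bad bits: {fill_and_check(mem.write, mem.read, 0x0500):02X}")
```

Reporting a mask rather than a pass/fail flag points straight at the affected data lines.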

Posted: Sat Jun 29, 2013 8:03 pm

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
It is not a wiring problem either. My first assumption was that it's another BGA problem. But no!

I configured the FPGA to set these 2 pins high, then low. I can read the voltages off the J leads of the SRAM, and they do go high and low. So the physical connection between the FPGA and the pads is ok, at least as far as DC goes.

Somehow the verilog code is incorrect, but just for the 2 high bits...

EDIT:
I modified the SRAM core to force the high 2 bits on write, and that works: when I OR the high 2 bits of the CPU's DO high (where it enters the SRAM instance), memory is written with the 2 high bits set. Otherwise, the 2 high bits are always low, for the SRAM only. That suggests I am only getting 6 bits out of Arlet's core.

So, the SRAM core is not the problem - feeding high bits into it works.

Switching bits 6/7 with 0/1 in the ucf file... Will it write the top 6 bits and leave the low 2 bits 0? It just got worse:

0500: 00 00 00 03 04 04 04 07 08 08 08 0B 0C 0C 0C 0F

It wrote 3, but not 1 or 2! Magic. It only writes bits 0 and 1 when both are set! It implies that earlier, had I looked at 05C0, the high bits would have been on - but I didn't bother looking up there.
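
One way to read the dump is that the two affected data lines only stick when both are 1. That is a hypothesis, not a diagnosis, and the Python sketch below (with made-up function names) just checks that it reproduces the line at 0500.

```python
# Model: the bits in pair_mask are stored only when both are 1,
# otherwise both read back as 0. All other bits are stored normally.

def stored(value, pair_mask=0x03):
    """Byte that ends up in memory for a written value under the hypothesis."""
    if (value & pair_mask) == pair_mask:
        return value & 0xFF
    return value & ~pair_mask & 0xFF


def dump_line(base, count=16, pair_mask=0x03):
    """Format a hex dump line for sequential bytes 0..count-1 written at base."""
    data = " ".join(f"{stored(i, pair_mask):02X}" for i in range(count))
    return f"{base:04X}: {data}"


if __name__ == "__main__":
    print(dump_line(0x0500))
```

With `pair_mask=0xC0` the same model stores $C0-$FF intact and clears the top bits of everything below, which fits the remark about 05C0.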

Arlet - any thought? I will continue looking.
