All times are UTC

Posted: Fri Aug 07, 2015 10:23 pm
Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
An updated CHOCHI design is complete.
I've removed the uSD port (as there is no FPGA or 6502 support for it yet), replacing it with another PMOD port.
The 45 MHz 6502 core is stable and runs FIGFORTH and EhBASIC.
Attachment: CHOCHI_J.jpg

I have a few boards if anyone is interested.
For more information see the CHOCHI site http://apple2.x10.mx/CHOCHI/index.html

_________________
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut


Posted: Mon Aug 10, 2015 1:35 am
Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
I added proper parallel ports to the CHOCHI J bitstream. The board now has bidirectional ports: two 8-bit ports, one 4-bit port and one 3-bit port (to be fixed soon). Each port has a direction register and a data register. There is more information at http://apple2.x10.mx/CHOCHI/index.html
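To make the direction-register/data-register arrangement concrete, here is a small behavioural model in Python. It assumes 6522-style semantics (a direction bit of 1 makes the pin an output; reading the data register returns the latched value for output bits and the pin level for input bits) and that all pins are inputs after reset. The class and method names are hypothetical; this is not the CHOCHI bitstream's actual logic.

```python
class BidirPort:
    """Hypothetical model of one bidirectional port (assumed: DDR bit 1 = output)."""

    def __init__(self, width: int) -> None:
        self.mask = (1 << width) - 1  # 8, 4 or 3 bits wide
        self.ddr = 0                  # direction register: all inputs after reset (assumed)
        self.data = 0                 # data register (output latch)

    def write_ddr(self, value: int) -> None:
        self.ddr = value & self.mask

    def write_data(self, value: int) -> None:
        self.data = value & self.mask

    def pins(self, external: int) -> int:
        """Level on the pins: output bits come from the latch, input bits from outside."""
        return ((self.data & self.ddr) | (external & ~self.ddr)) & self.mask

    def read_data(self, external: int) -> int:
        """What the 6502 would see when it reads the data register."""
        return self.pins(external)


port = BidirPort(8)
port.write_ddr(0x0F)              # low nibble output, high nibble input
port.write_data(0xA5)
print(hex(port.read_data(0x3C)))  # 0x35
```

The mask keeps writes to the narrower 4-bit and 3-bit ports from setting bits that do not exist.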

Posted: Mon Aug 10, 2015 5:46 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Hi enso
Nice to see a new revision! I wonder if, for at least one port and maybe two, you could include the 390 ohm current-limiting resistors for 5V compatibility?
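As a rough illustration of how a series resistor lets a 3.3V input tolerate a 5V driver, the current pushed into the input's clamp diode can be estimated with Ohm's law. The 3.3 V VCCO and the 0.7 V diode drop below are assumptions for the example, not figures from the CHOCHI board or a Xilinx datasheet.

```python
def clamp_current_ma(v_in: float, vcco: float, v_diode: float, r_series: float) -> float:
    """Approximate current (mA) through a series resistor into an input clamp diode.

    The clamp only conducts once the driven level exceeds VCCO plus the diode's
    forward drop; below that, this simple model gives zero current.
    """
    overdrive = v_in - (vcco + v_diode)
    return max(overdrive, 0.0) / r_series * 1000.0


print(round(clamp_current_ma(5.0, 3.3, 0.7, 390), 2))  # 2.56 (5V driver)
print(clamp_current_ma(3.3, 3.3, 0.7, 390))             # 0.0  (3.3V driver)
```

Under these assumed values a 390-ohm resistor holds the clamp current to roughly 2.6 mA, while a 3.3V driver never forward-biases the clamp, so for 3.3V users the resistor only adds series resistance.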

Maybe, as an idea, bring out D0 to D7, A0 to A3, RnW, a ChipSelect and a clock? Then there would be a 5V-compliant expansion bus for a few addresses. Oh, perhaps RDY would also be needed, or the clock should slow down for such an external access.

Cheers
Ed


Posted: Mon Aug 10, 2015 9:37 pm
Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
Those are good ideas. I do wonder what 390-ohm series resistors will do for those who want to work with 3.3V logic, though. Luckily, the Spartan-3 has its clamp diodes enabled, so anyone can just put the resistors outside the board.

I've been thinking of adding a 6502-visible IO switch that would reconfigure the ports either as parallel ports or as raw '6502 pins', which is similar to what you are suggesting. The IO muxing is already giving me problems, though: I had to slow the chip down to 40MHz to mux the 4 IO ports, BRAM, SRAM and serial port within the timing constraints. I have an idea of how to make it faster, but haven't had a chance to try it out, as it requires moving things around significantly.

Posted: Wed Aug 12, 2015 12:19 am
Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
I had a theory that my maximum clock speed had to do with SRAM timing uncertainties. That seems plausible, considering that I can run Arlet's core at around 90MHz or more with BRAM only, but as soon as SRAM is added, 45MHz seems to be the top.

My suspicion was that the input mux on the CPU's data-in wires was causing at least some of the delay. As you know, outputting is simple; input data needs to be muxed in.

I use an OR mux. Every input peripheral drives its data only when selected and outputs 0 when not selected, so the outputs can simply be ORed together instead of building a proper select-based mux. That seems to work well.
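The OR-mux idea can be modelled in a few lines of Python. This is a behavioural sketch, not the CHOCHI Verilog: the peripheral interface, the address ranges and the returned values are invented for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a peripheral is (is_selected(addr), read(addr)).
Peripheral = Tuple[Callable[[int], bool], Callable[[int], int]]


def or_mux(addr: int, peripherals: List[Peripheral]) -> int:
    """Combine read data OR-mux style: unselected peripherals contribute 0."""
    bus = 0
    for is_selected, read in peripherals:
        bus |= (read(addr) & 0xFF) if is_selected(addr) else 0
    return bus


# Made-up decode: a block of RAM at $0000-$07FF and one I/O register at $C000.
devices: List[Peripheral] = [
    (lambda a: a < 0x0800, lambda a: 0x11),
    (lambda a: a == 0xC000, lambda a: 0x42),
]

print(hex(or_mux(0x0123, devices)))  # 0x11
print(hex(or_mux(0xC000, devices)))  # 0x42
print(hex(or_mux(0x9000, devices)))  # 0x0 (nothing selected)
```

The price of this style is that the select decodes must never overlap: if two peripherals are selected at once, their data is silently ORed together.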

So I thought that feeding the data of all the peripherals I explicitly register into the mux first, and then registering the mux output, might make the circuit faster. This includes the SRAM. Previously I registered each peripheral's data and muxed it on the next cycle. Since the DI lines inside the CPU do all kinds of things (that is, they don't go directly to some register), my thinking was that removing the mux overhead should improve the timing.

Well, I was wrong. Moving the mux in front of the registers helped eliminate a bunch of registers, but the timing has not improved. I am unable to budge CHOCHI above the 40MHz it currently runs at.

I wonder if I could improve SRAM timing at the expense of the rest of the IO by prioritizing SRAM data, and holding up the CPU by a cycle for all the rest...

Posted: Wed Aug 12, 2015 7:42 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
What's the speed of the SRAM part, enso? Adding a bit for getting on and off the FPGA, what kind of cycle time would that ideally give you?

You have 9k of block RAM; I suppose much of that would be ROM. But you could place 1k at the bottom of memory and have a fast zero page and stack. Then you could add a wait state for external SRAM without too much performance hit. Or, to put it another way, no performance hit, because two cycles at double speed give the same performance, but now you gain by having a fast 8k ROM, fast zp and stack, and a little more fast RAM for small programs.

Edit: oops, the 72kbit of block RAM will be 8k x 9 bits or related combinations. It's not obvious how to make 9k x 8 bits, which is what I was thinking of, especially with only 4 blocks... maybe it can't be done.
Ref: http://www.xilinx.com/support/documenta ... app463.pdf
Or maybe the 12kbits of distributed RAM could be used? 2kbit (256 bytes) for zp, 2kbit for stack.
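The arithmetic behind the edit can be checked directly. Each Spartan-3 block RAM holds 18 kbit, made up of 16 kbit of data plus 2 kbit of parity (hence the 2K x 9 organisation); the block count and the 12 kbit of distributed RAM are the figures from the post above.

```python
# Spartan-3 block RAM: 18 kbit per block = 16 kbit data + 2 kbit parity (e.g. 2K x 9).
BLOCKS = 4
DATA_BITS = 16 * 1024
PARITY_BITS = 2 * 1024

total_bits = BLOCKS * (DATA_BITS + PARITY_BITS)  # 73728 bits = 72 kbit
data_bytes = BLOCKS * DATA_BITS // 8             # 8192 bytes as 8-bit words
wanted_bits = 9 * 1024 * 8                       # a 9k x 8 memory

print(total_bits, data_bytes, wanted_bits)       # 73728 8192 73728

# Distributed RAM option: 12 kbit total, 256-byte zero page + 256-byte stack.
DIST_BITS = 12 * 1024
zp_and_stack_bits = (256 + 256) * 8              # 4096 bits = 4 kbit
print(DIST_BITS - zp_and_stack_bits)             # 8192 bits left over
```

A 9k x 8 memory would need exactly the full 72 kbit, parity bits included, which the 2K x 9 ports do not hand out as ordinary bytes; that is why only 8k x 8 falls out naturally.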


Posted: Wed Aug 12, 2015 8:05 pm
Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
BigEd, the current design uses a 10ns SRAM. Internally, I use a single 18Kbit BRAM configured to cover the top 4 pages of the memory map, to catch the vectors, and the bottom 4 pages, to include the zero page and the stack page (and avoid double writes to SRAM). The chip wakes up with a minimal serial loader at the top of memory (I have an experimental monitor that allows inspection and modification, but it's clunky). I have 3 unused BRAMs, as I was trying to stay minimal and leave room for other peripherals (video comes to mind).
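One way to fold the bottom 4 pages and the top 4 pages into a single 2 KB BRAM is sketched below. The function name and the exact decoding are assumptions, since the post does not show the actual logic. Conveniently, address bits A10..A0 already map $0000-$03FF to BRAM offsets $000-$3FF and $FC00-$FFFF to $400-$7FF.

```python
def decode(addr: int) -> tuple:
    """Hypothetical decode for one 2 KB BRAM covering $0000-$03FF and $FC00-$FFFF.

    Returns ("bram", offset) or ("sram", addr). The two targets are exclusive, so
    zero page, stack and vector accesses never also write the SRAM.
    """
    page_group = (addr >> 10) & 0x3F  # A15..A10
    if page_group in (0x00, 0x3F):
        # A10 is 0 in the low region and 1 in the high region, so A10..A0 is the offset
        return ("bram", addr & 0x7FF)
    return ("sram", addr)


print(decode(0x01FF))  # ('bram', 511)   stack page
print(decode(0xFFFC))  # ('bram', 2044)  reset vector
print(decode(0x0400))  # ('sram', 1024)
```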

One thing I could do is register address and data and clock the SRAM at 100MHz. I should be able to run the CPU at 50MHz then.
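The question of what cycle time a 10ns SRAM allows, once the delays getting on and off the FPGA are added, can be framed as a simple timing budget. The 4 ns output and input path delays below are placeholders, not Spartan-3 datasheet values; substitute the real numbers from the timing report.

```python
def read_fits(period_ns: float, cycles: int,
              t_out_ns: float, t_sram_ns: float, t_in_ns: float) -> bool:
    """Does a registered SRAM read, sampled `cycles` clock edges after the address
    is launched, meet timing?  t_out: clock-to-pad plus board delay;
    t_in: board delay plus pad-to-register setup."""
    return cycles * period_ns >= t_out_ns + t_sram_ns + t_in_ns


def max_single_cycle_mhz(t_out_ns: float, t_sram_ns: float, t_in_ns: float) -> float:
    """Highest clock at which the whole read path fits in one cycle."""
    return 1000.0 / (t_out_ns + t_sram_ns + t_in_ns)


# Placeholder I/O delays (assumed, NOT datasheet values) around a 10 ns SRAM.
T_OUT, T_SRAM, T_IN = 4.0, 10.0, 4.0
print(read_fits(10.0, 1, T_OUT, T_SRAM, T_IN))  # False: 100 MHz, same-cycle read
print(read_fits(10.0, 2, T_OUT, T_SRAM, T_IN))  # True: 100 MHz, sampled a cycle later
print(read_fits(20.0, 1, T_OUT, T_SRAM, T_IN))  # True: 50 MHz, single cycle
print(round(max_single_cycle_mhz(T_OUT, T_SRAM, T_IN), 1))  # 55.6
```

With these assumed delays a single-cycle read tops out a little above 50 MHz, while a 100 MHz clock works if the data is sampled one cycle later.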

I am still not happy with my understanding of the timing issues. Adding the IO ports to the input mux originally forced me to slow down to 40MHz, so moving the mux in front of the register should have let me increase the speed. I think I will try to constrain the design to a smaller area (if I can remember how to do that), since arbitrary placement may be hurting the timing.

Posted: Wed Aug 12, 2015 8:24 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
There's an older discussion here which might be useful:
viewtopic.php?p=25676#p25676
Arlet got things running at 100MHz with external memory.


Posted: Wed Aug 12, 2015 10:05 pm
Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
Thanks for the link; I lost that thread a while back and was going to look for it.

Posted: Thu Aug 13, 2015 8:30 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Please could you share a link to a datasheet for your RAM?

I suspect you're using SRAM, which is easier to control than SDRAM. Arlet has SDRAM, and a controller, and can go about twice as fast. That might be natural: a 10ns SRAM might top out at 50MHz or so. In that case, wait states might be a good and simple answer. A CPU at 100MHz with one wait state is no worse than one at 50MHz with no wait states, but it has the possibility of faster access to on-chip RAM.

Good to hear that your block RAMs cover both high and low memory!


Posted: Fri Aug 14, 2015 9:36 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
You should be able to run the 10 ns SRAM with a 100 MHz clock, with a new access every cycle. However, you need to compensate for the latency: the path from valid address to valid data at the FPGA will take more than 10 ns. The answer is to sample the incoming data a cycle later by inserting a wait cycle using RDY.

This is still faster than running the system at 50 MHz for various reasons:

- non-read cycles (write or internal) can be done without a wait state
- burst reads can be done at 100 MHz, for example by a DMA controller or display generator.
- access to local BRAM is not affected.
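The first and third points can be checked with a toy cycle count. The sketch assumes, as the list says, that only reads from external SRAM need the extra RDY cycle, while writes, internal cycles and BRAM reads complete in one 100 MHz clock. The traces are invented for illustration, not measured from a real program, and burst reads are not modelled.

```python
from typing import Iterable, Tuple

# A bus cycle is (kind, target): kind in {"read", "write", "internal"},
# target in {"bram", "sram"}.
BusCycle = Tuple[str, str]


def run_time_ns(trace: Iterable[BusCycle], clock_mhz: float, sram_read_waits: int) -> float:
    """Total time for a trace when only SRAM reads are stretched by RDY wait states."""
    period_ns = 1000.0 / clock_mhz
    clocks = 0
    for kind, target in trace:
        clocks += 1
        if kind == "read" and target == "sram":
            clocks += sram_read_waits  # RDY held low while the late data arrives
    return clocks * period_ns


only_sram_reads = [("read", "sram")] * 10
mixed = [("read", "sram"), ("read", "bram"), ("write", "sram"), ("internal", "sram")]

print(run_time_ns(only_sram_reads, 50, 0), run_time_ns(only_sram_reads, 100, 1))  # 200.0 200.0
print(run_time_ns(mixed, 50, 0), run_time_ns(mixed, 100, 1))                      # 80.0 50.0
```

In the worst case (nothing but SRAM reads) the 100 MHz system with one wait state merely ties with 50 MHz; every write, internal cycle or BRAM access makes it pull ahead.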


Posted: Sat Aug 15, 2015 2:04 pm
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Thanks for chiming in, Arlet. Nice to see your name appear on this forum again! :)

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html

