
All times are UTC




PostPosted: Thu Feb 15, 2024 3:20 pm 

Joined: Tue Dec 12, 2023 7:00 pm
Posts: 25
Location: London, UK
It looks like all commercially available 6502/65816 SBCs, such as the Commander X16, the Foenix, or even those offered directly by WDC (W65C816SXB), are clocked at 8MHz or less, as if there were some hard limit. On the other hand, there are reports on this forum of SBCs running at 14MHz or even more, without any help from an FPGA. I would like to start a conversation about the 8MHz limit and methods to overcome it, so as to use the full official speed of the W65C02 and W65C816 (14MHz). I'm aware that this knowledge is scattered across the forum in various threads, including the pinned one, Techniques for reliable high-speed digital circuits, but I would like to focus specifically on that problem: crossing the 8MHz barrier. That may be helpful to those (like me) who have built their own 1-4MHz computers and are looking for new challenges.

To start, let's assume the simplest possible computer, used for computation only (no I/O), consisting of just CPU, RAM and ROM. If it is necessary to rule out the ROM (I'm not aware of any that can run at 14MHz), that's fine too, but then I'd be curious about your techniques for prepopulating RAM and initializing such a system. And, as a bonus, it would be great to know what techniques you use to make your system communicate with other devices.

In my, most likely very naive way of thinking, I see two possible solutions:
- a dual-processor system with one CPU responsible for I/O, initialization, IRQs, etc. and the other just paired with RAM, running at full speed. I wouldn't call it a properly parallel system because, due to RAM sharing, in most cases only one CPU would be active.
- variable-speed (dual clock) solution, which is perhaps a cheaper/simpler variant of the one above (full-speed for RAM access, limited speed for anything else).

I am sure there must be some cleverer solutions than my straightforward ideas.
Thanks,
David


PostPosted: Thu Feb 15, 2024 3:33 pm 

Joined: Fri Jan 25, 2019 2:29 pm
Posts: 191
Location: Madrid, Spain
Hi!

My SBC6526 runs reliably up to 14-15 MHz.

It has a WDC 65c02, a WDC 65c22, and two MOS 6526s (which, of course, won't handle that speed, so they need to be removed before trying!)

Besides this, a 128KB SRAM for RAM.
74AC14 and 74AC74 for clock generation.
74AC139 and 74AC138 for address decoding.

And that's pretty much everything.

The 65c22 handles a 40x4 lcd screen, plus 4 buttons that serve as input. It's a very basic device with a very specific function, so it didn't need much more, in terms of IO.

An Arduino, with the help of two 74HC595s, is able to halt the CPU, eject it from the bus (with the BE pin), write the whole RAM (only 64KB of the 128KB is available) and then reset the computer. The Arduino has the boot image (2KB) stored in its flash, so on power-up, if no PC is connected, this is written to RAM. I can push a full 64KB RAM image via serial if needed.

I did nothing too fancy with this board. It's a 4 layer, with a full uninterrupted GND plane, and of course, each IC has its own bypass cap.

I'm no expert by any means, and it came out very nice.

It wasn't meant to communicate with anything else, but, if needed, and if you can get hold of a 14MHz 6526 (P.S.: I am trying to), you could use the 6526s' four 8-bit parallel ports and two serial ports for communication. The 65c22's serial port is also free, and its parallel ports are very underutilized. The design could be adapted without too much effort to have three 65c22s. You can bit-bang SD cards, PS/2 keyboards, I2C and SPI with no problems at these speeds, giving you full access to pretty much anything you can think of. USB, Bluetooth, Ethernet maybe?

As you say, ROMs are not capable of 14MHz. One approach I've seen before is to start up slow (1MHz), boot from ROM and have the 65c02 copy the ROM contents to RAM, then deselect the ROM for good and run only from RAM, now at full speed.
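The copy-then-switch-out approach can be illustrated with a toy memory model. This is only a sketch under assumptions that are not from the post: the class and function names are made up, the ROM is an 8 KB window at $E000, and writes are assumed to always land in the RAM underneath the ROM window. The clock-speed change itself is not modelled.

```python
class ShadowedMemory:
    """Toy 64 KB map where a ROM window overlays RAM until it is disabled."""

    def __init__(self, rom_image, rom_base):
        self.ram = bytearray(0x10000)
        self.rom = bytes(rom_image)
        self.rom_base = rom_base
        self.rom_enabled = True

    def _in_rom(self, addr):
        in_window = self.rom_base <= addr < self.rom_base + len(self.rom)
        return self.rom_enabled and in_window

    def read(self, addr):
        # Reads come from ROM while it is enabled, otherwise from RAM.
        if self._in_rom(addr):
            return self.rom[addr - self.rom_base]
        return self.ram[addr]

    def write(self, addr, value):
        # Assumption: writes always reach RAM, even under the ROM window.
        self.ram[addr] = value & 0xFF


def shadow_rom(mem):
    """Copy the ROM window into the RAM beneath it, then switch the ROM out."""
    for addr in range(mem.rom_base, mem.rom_base + len(mem.rom)):
        mem.write(addr, mem.read(addr))
    mem.rom_enabled = False  # from here on, only RAM answers


rom_image = bytes((i * 7) & 0xFF for i in range(0x2000))  # made-up 8 KB image
mem = ShadowedMemory(rom_image, rom_base=0xE000)
shadow_rom(mem)
print(all(mem.read(0xE000 + i) == rom_image[i] for i in range(len(rom_image))))
```

On real hardware the copy loop would run on the 65c02 itself at the slow clock, and the equivalent of `rom_enabled = False` would be a write to whatever latch gates the ROM's chip select.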


PostPosted: Thu Feb 15, 2024 4:01 pm 

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1399
Location: Scotland
What's a "standard IC" these days ...

I was under the impression that the Foenix ran at 14MHz - however it uses FPGAs ...

My own Ruby '816 board runs at 16MHz (512KB of RAM), but I use a GAL for address decoding and another for the latch to generate more address bits. I used a double-sided PCB. GALs were "standard" ICs when the '816 was released in the mid 80s.

My 65C02 predecessor ran on stripboard, also at 16MHz. Single GAL.

I think BDD here has an 816 going at 20MHz with carefully selected and placed 74 series ICs ...

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


PostPosted: Thu Feb 15, 2024 4:44 pm 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8147
Location: Midwestern USA
drogon wrote:
I think BDD here has an 816 going at 20MHz with carefully selected and placed 74 series ICs ...

My POC V1.2 unit is stable at 20 MHz using only discrete (74AC) logic.  ROM and I/O are wait-stated to cope with the clock frequency.  Only thing is V1.2 doesn’t have bank latching hardware.  Logic analysis seems to indicate the unit would be able to run at 24 MHz.

POC V1.3 runs at 16 MHz, is also discrete logic, has bank-latching and is wait-stated on ROM and I/O accesses.  Schematic attached.

Attachment:
File comment: POC V1.3
pocv130.pdf [344.54 KiB]

Both V1.2 and V1.3 have real-time clocks, four TIA-232 ports and SCSI mass storage.

Others here, notably plasmo, have gotten the 65C02 up into the 30 MHz range.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Thu Feb 15, 2024 4:47 pm 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8147
Location: Midwestern USA
ytropek wrote:
At first, let's assume a simplest possible computer, used for computation only (no I/O), consisting of CPU, RAM and ROM only.

“A computer without I/O is as useless as a sun roof in a submarine.”  —Unknown wiseass.  :D

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Thu Feb 15, 2024 5:24 pm 

Joined: Tue Feb 28, 2023 11:39 pm
Posts: 133
Location: Texas
I recall seeing that the Commander X16's speed is capped at 8MHz and was wondering why that might be. I haven't seen their circuits, but I'm comfortably running at ~6MHz on a breadboard, and I suspect my setup would be fine at 8MHz, which I'm going to try once I get some more oscillators. The breadboard is far from an ideal environment, and the datasheets I have state the WDC65C02 should be fine up to 14MHz; various bits of feedback I've gotten here suggest it can go much higher than that, though I could understand why they wouldn't want to run it past the spec on the X16 or SXB.


PostPosted: Thu Feb 15, 2024 5:30 pm 

Joined: Fri Aug 03, 2018 8:52 am
Posts: 745
Location: Germany
then again, the SuperCPU for the C64 ran its 65816 at 20MHz, way out of spec but they still made it into a product.
so really there is little reason for the X16 to limit its own performance like that.
one thing i could think of is that faster logic might've made the entire board more expensive, so it was done as a cost cutting measure... then again i remember 8-bitGuy's first few videos where he said that he couldn't go above 8MHz because of the sound chips.
but i doubt that's the real reason as slowing down the whole system because of 1 or 2 chips makes little to no sense to me when wait states/clock stretching are a thing and pretty easy to implement.


PostPosted: Fri Feb 16, 2024 3:59 pm 

Joined: Fri Mar 18, 2022 6:33 pm
Posts: 432
ytropek wrote:
- crossing the 8MHz barrier.


8MHz isn't much of a barrier for the 6502 itself. A stock through-hole WDC65C02 can manage 20MHz+ with no trouble as demonstrated by Bill (plasmo).

I don't know specifically why WDC's boards are clocked at 8MHz. However, at 1MHz, each phase of the 6502's two-phase clock is 500ns. In practice, the 6502 needs about 10ns from each phase unto itself for address hold and data setup, leaving 490ns on either side of the rising edge of Ø2. At 1MHz, modern 6502s spend a lot of time sitting around twiddling their thumbs waiting for things to happen.

At 10MHz, each phase of the 6502's two-phase clock is 50ns. When timing margins start to get tight, you also have to consider that it can take up to about 5ns for an address or data line to switch, so in practice you have to subtract 30ns from your total cycle time. At 10MHz, that leaves 35ns on either side of the rising edge of Ø2 to get things done.
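The arithmetic above can be written as a small helper. This is only a sketch of the rule of thumb in this post (about 10ns per phase for hold/setup and about 5ns per edge for switching, 30ns in total per cycle); the function name and parameter names are made up for illustration, and the figures are estimates rather than datasheet values.

```python
def phase_budget_ns(clock_mhz, hold_setup_ns=20.0, switching_ns=10.0):
    """Usable time on each side of the rising edge of phi2, in nanoseconds.

    hold_setup_ns: ~10 ns taken by the CPU in each of the two phases.
    switching_ns:  ~5 ns for address lines plus ~5 ns for data lines to switch.
    """
    cycle_ns = 1000.0 / clock_mhz
    usable_ns = cycle_ns - hold_setup_ns - switching_ns
    return usable_ns / 2.0


for mhz in (10, 14, 16, 20):
    print(f"{mhz:>2} MHz: {phase_budget_ns(mhz):5.2f} ns per side")
```

With these defaults the helper gives 35ns at 10MHz, roughly 16ns at 16MHz and 10ns at 20MHz, which are the figures used in this post.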

I think you'll find that when it comes to going fast there are two theoretical bottlenecks, and one practical challenge.

The first bottleneck is ROM speed. If you're making a relatively traditional design with some portion of the memory map devoted to ROM, which contains boot code, you will find that most ROMs have an access time in the 100 - 150ns range. The fastest EEPROMs and some EPROMs have 70ns access times. There are some OTP ROMs that are as fast as 40ns. (In practice, ROMs are often faster than their specs let on. I have a nominally 70ns Atmel EEPROM that runs at 12.5MHz with no trouble.)

Some ways forum members deal with this:

1. Wait-states
2. Clock stretching (using Jeff's amazing `163 counter trick)
3. ROM-less systems (using an Arduino, PIC, or some other device to load a RAM image for the 6502 to boot from)
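For the first option, a rough way to estimate how many whole wait states a slow device needs is sketched below. The model is an assumption, not a measured result: it treats the required access window as the ~30ns of per-cycle overhead mentioned earlier plus decode delay plus device access time, and rounds up to whole CPU cycles. The function name and the 10ns decode figure in the examples are hypothetical; a clock-stretching circuit may add time in finer steps than a full cycle.

```python
import math


def wait_states_needed(clock_mhz, access_ns, decode_ns, overhead_ns=30.0):
    """Whole extra CPU cycles needed for one access to a slow device.

    overhead_ns is the assumed hold/setup/switching allowance per access.
    """
    cycle_ns = 1000.0 / clock_mhz
    required_ns = overhead_ns + decode_ns + access_ns
    cycles = math.ceil(required_ns / cycle_ns)
    return max(cycles, 1) - 1


# Hypothetical examples: 10 ns of address decoding in front of the ROM.
for mhz, rom_ns in ((8, 70), (10, 70), (20, 55), (20, 70)):
    ws = wait_states_needed(mhz, access_ns=rom_ns, decode_ns=10)
    print(f"{rom_ns} ns ROM at {mhz} MHz: {ws} wait state(s)")
```

Under these assumptions a 70ns ROM needs no wait states at 8MHz and one at 10MHz, while a 55ns ROM at 20MHz gets by with a single wait state.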

The second bottleneck is address decoding. For example, suppose I want to run my 70ns Atmel EEPROM "by the book," rather than overclocking it. My cycle time will look something like this:

10ns address hold + 5ns switching time + Xns address decoding + 70ns access time + 5ns switching time + 10ns data hold

We're already at X+100ns, which means that if we run at 10MHz we're overclocking something. Each layer of complexity added to address decoding (X) will put more reliance on something being over-specced and make the design more likely to fail. Disregarding the slow speed of ROM, address decoding is still a bottleneck if you want to use any WDC "family parts," such as the WDC65C22. These parts have to already be selected when Ø2 goes high, which means that at 10MHz your I/O address decoding has to fit into about 35ns. Even with a fairly fast logic family such as AHC this will put a hard limit on complexity of about 4 - 7 ICs deep. I ran into this with Blue August, where I failed to hit my target of 16MHz. My RAM and clock-stretched ROM worked fine, but when it came to I/O I just couldn't get everything done in the 16ns I had available.
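To put rough numbers on "ICs deep", the decode budget can be divided by a per-stage propagation delay. This is a sketch only: the function name is made up, and the 5ns and 8ns stage delays are illustrative assumptions rather than figures from any datasheet. Real parts need worst-case numbers at the actual load and supply voltage.

```python
def max_decode_depth(budget_ns, gate_delay_ns):
    """Largest number of series logic stages that fits in the decode budget."""
    return int(budget_ns // gate_delay_ns)


# Budgets from the post: ~35 ns before phi2 rises at 10 MHz, ~16 ns at 16 MHz.
for budget in (35, 16):
    for tpd in (5, 8):  # assumed worst-case delay per stage, in ns
        depth = max_decode_depth(budget, tpd)
        print(f"{budget} ns budget, {tpd} ns per stage: {depth} stage(s)")
```

With these assumed delays the 35ns budget allows 4 to 7 stages, while the 16ns budget allows only 2 or 3, which is consistent with how quickly the margin disappears above 10MHz.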

Going at 20MHz vs. 10MHz cuts your cycle time in half, which gives you about 10ns before the rise of Ø2 for your address decoding. Good luck! This is why people start using FPGAs.

Finally, the practical challenge is that, even if your math works out on paper, getting a design to work *at all* is more challenging at higher speeds. The most important consideration is to have a robust ground return network. (See the High Speed Digital Circuits thread, and Garth's Primer page about AC performance problems).

P.S. If you think of a WDC65C02 + 12ns SRAM as a kind of philosophical reference system (no I/O, no ROM, no address decoding) you get a kind of theoretical maximum speed of around 37MHz. We saw this in Bill's designs, where he was able to get close to 40MHz before he had to start using crazy tricks like over-voltage to go faster. This thought experiment system isn't very useful, but it does tell us that, for hobby systems, the 6502 and the RAM are basically "free." They will never cause us problems because ROM, I/O, and address decoding are so much slower. So, practically speaking, if you use the fastest ROM you can, wait-state or clock stretch, and keep your address decoding simple, you should be able to hit the WDC65C02's 14MHz speed without too much trouble. Between 14MHz and 20MHz is pretty tricky without programmable logic, but can be done if you're precise.

_________________
"The key is not to let the hardware sense any fear." - Radical Brad


PostPosted: Fri Feb 16, 2024 7:21 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
One way to look at this is that counting up nanoseconds is simpler and more direct than dealing in MHz. You only need to convert when considering the crystal or clock source.


PostPosted: Fri Feb 16, 2024 8:53 pm 

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 660
Location: Potsdam, DE
Heh, I just bounced off a minor issue: one ram read, one eeprom read, and some memory decoding need to happen in 79ns... shame the ram takes 45, the decoding probably 10 or 15, and the eeprom 70 at best but more likely 150 (because that's what I've got in the bits box).

Neil


PostPosted: Fri Feb 16, 2024 9:42 pm 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8147
Location: Midwestern USA
Paganini wrote:
Between 14MHz and 20MHz is pretty tricky without programmable logic, but can be done if you're precise.

Not really.

My POC V1.2 unit easily reached 20 MHz using only 74AC logic, and appears to be capable of 24 MHz.  I used a single wait-state via clock stretching (Jeff’s AC163 circuit) for ROM and I/O accesses.  The ROM is an AMD 27C256-55, which is rated at 55ns.  My older POC V1.1 unit will boot at 14 MHz with no wait-stating, using the same ROM.  POC V1.3 runs at 16 MHz and has a calculated clock ceiling of ~18 MHz. Ironically, the bottleneck in V1.3 is in the AC163 clock-stretching circuit, for reasons I describe in my “wait-stating with clock trickery” topic.

The key design features that make it possible are use of 74AC/ACT logic, having no more than two gate delays in the chip select logic, and not qualifying any chip selects with Ø2.  There is no magic to any of this.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Fri Feb 16, 2024 11:29 pm 

Joined: Tue Feb 28, 2023 11:39 pm
Posts: 133
Location: Texas
I would think that once you need 3 or 4 ICs to do your address decoding, you might consider using a GAL or a CPLD or something. You'd save some money that way by reducing the chip count and the complexity of your board.

Just eyeballing the datasheet for the 1502s I have, it states they have a 7.5ns pin-to-pin delay with (at most) a 2ns hold time. There's probably a lot more to it than that, but the rest of the datasheet would indicate the part is quite happy to be run all the way up to 125MHz.

The SXB might not really need to be all that quick, as it is designed to be a development board after all, but I'd think the X16 would be keen on reducing chip count and boosting the clock speed. Especially given that it seems the 8-Bit Guy likes to show speed comparisons of the X16 against other 6502 systems.


PostPosted: Sat Feb 17, 2024 3:18 am 

Joined: Wed Aug 21, 2019 6:10 pm
Posts: 217
Proxy wrote:
then again, the SuperCPU for the C64 ran its 65816 at 20MHz, way out of spec but they still made it into a product.

so really there is little reason for the X16 to limit its own performance like that.


IIRC, they wanted everything to be DIP if at all possible, so the 512KB SRAM they are using is not particularly fast. I do know when I go to 512kx8, 5v parallel SRAM at Mouser and then hit the through hole filter, the results drop down to two 55ns ICs, so I am guessing they are using a 55ns HighRAM.

Quote:
one thing i could think of is that faster logic might've made the entire board more expensive, so it was done as a cost cutting measure... then again i remember 8-bitGuy's first few videos where he said that he couldn't go above 8MHz because of the sound chips.


IIRC, that was in the 2x AY-3-8910 + YM2151 phase of the project. As the winner of the FPGA video chip audition included 16 channels of PSG on Vera, the AY-3-8910s were eventually discarded. AFAIU, they have some form of clock slowdown for the portion of the I/O page that hosts the YM2151; at 4MHz for the YM2151, that's halving the clock rate.

Quote:
... but i doubt that's the real reason as slowing down the whole system because of 1 or 2 chips makes little to no sense to me when wait states/clock stretching are a thing and pretty easy to implement.


Since I think the original YM2151 runs at 4MHz, I think they are clock stretching in that part of the I/O page in any event. Perhaps for that to be in sync the system clock should be a multiple of 4MHz, and IIUC, 12MHz is definitely too fast for the SRAM they are using if they are insisting on using through hole glue logic for the address decode.


PostPosted: Sat Feb 17, 2024 10:39 am 

Joined: Fri Aug 03, 2018 8:52 am
Posts: 745
Location: Germany
there are other DIP SRAM chips that are a lot faster, the W241024AK for example is "only" 128kB but also has an access time of 10-15ns.
but you'd need 4 of those, which will overall be more expensive than a single 512kB chip

though ironically, if they didn't stick with their "as much DIP as possible" thing they could've saved some cost and made it run at 16MHz+ by using the IS61C5128AL, which is a 10ns 512kB SRAM chip in a TSOP-44 package that only costs half as much as the AS6C4008 (current mouser prices).

but i guess the overall idea of the X16 is modularity and tinkerability (if that is a word) and not absolute performance. then again because they did use DIP, someone could in theory make an adapter for the faster chips but it would also require some extra work on the rest of the board to make running at a higher clock possible.


PostPosted: Sat Feb 17, 2024 10:46 am 

Joined: Tue Dec 12, 2023 7:00 pm
Posts: 25
Location: London, UK
Thank you for all comments, very good hints there. Below are my comments regarding some of the "official" boards mentioned earlier.

drogon wrote:
I was under the impression that the Foenix ran at 14MHz - however it uses FPGAs ...

As far as I know, the Foenix (256K) runs at 6.29MHz and uses a standard W65C02 or, as an option, a W65C816.

Proxy wrote:
(...) there is little reason for the X16 to limit its own performance like that. (...)i remember 8-bitGuy's first few videos where he said that he couldn't go above 8MHz because of the sound chips.
but i doubt that's the real reason as slowing down the whole system because of 1 or 2 chips makes little to no sense to me when wait states/clock stretching are a thing and pretty easy to implement.

I think the Yamaha sound chip is the main reason. It was also mentioned by Adrian in his X16 review on his "Digital Basement" channel. I also suspect that, as the X16 board is relatively big, the track lengths may play some role here. Plus, naturally, there is ROM.

I'm mostly surprised that the WDC boards are running at 8MHz too, and that they didn't make these boards a showcase of what the WDC processors are capable of. I assume that it's again related to ROM, as I can't think of any other reason.

