W65C816 overclock experiment
Recent discussions about the W65C816 have motivated me to see whether it can run reliably at the VGA pixel clock of 25.275MHz. If so, a 65816 that accesses a large graphics memory at 25.275MHz could serve as a VGA graphics controller driving 640x480 color graphics. I've placed an order with Mouser for several W65C816s, but I want to check out my overclock test bed before plugging in a part I have no experience with. I noticed the W65C816 pin assignments are very similar to the W65C02's; presumably that allows a W65C816 to plug into existing W65C02 hardware? So if I check out the test bed with a W65C02 first, I'll be in a much better position to check out the W65C816?
The test bed is a prototype board with an oscillator and a 128-macrocell CPLD, an EPM7128S, which is similar to the ATF1508. https://www.retrobrewcomputers.org/doku ... mo:protorc I plan to wire in a 40-pin socket in the prototype area for 6502/65816. The test bed will consist of the 65xx, the CPLD, and a socketed oscillator, with the CPLD serving as a small but fast ROM plus a serial port to talk to the console.
Once I manage to get it to boot, I'm sure I'll have questions about how best to check out the 65816 with 64-128 bytes of ROM-only code.
Bill
Re: W65C816 overclock experiment
plasmo wrote:
Recent discussions about the W65C816 have motivated me to see whether it can run reliably at the VGA pixel clock of 25.275MHz. If so, a 65816 that accesses a large graphics memory at 25.275MHz could serve as a VGA graphics controller driving 640x480 color graphics. I've placed an order with Mouser for several W65C816s, but I want to check out my overclock test bed before plugging in a part I have no experience with. I noticed the W65C816 pin assignments are very similar to the W65C02's; presumably that allows a W65C816 to plug into existing W65C02 hardware? So if I check out the test bed with a W65C02 first, I'll be in a much better position to check out the W65C816?
This doesn't have the full details, but: https://projects.drogon.net/ruby-6502-b ... uby-65816/ has the essence.
Hope it does work though, but my own concern is the sheer quantity of RAM that needs to be moved in a VGA system - if you go 1 bit per pixel then it's just under 40KB of RAM - with the overhead of bit masking, or at 8bpp you're up to 5 banks of 64K of RAM with the overhead of 24-bit addressing...
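For anyone who wants to check those figures, here is a quick back-of-the-envelope sketch in Python. It assumes a plain linear 640x480 frame buffer with no padding between lines; the function names are just made up for illustration.

```python
# Frame-buffer size for 640x480 at a given colour depth.
WIDTH, HEIGHT = 640, 480
BANK_SIZE = 64 * 1024  # one 65816 bank is 64 KB


def framebuffer_bytes(bits_per_pixel: int) -> int:
    """Bytes needed to hold one full frame (assumes no per-line padding)."""
    return WIDTH * HEIGHT * bits_per_pixel // 8


def banks_needed(bits_per_pixel: int) -> int:
    """Number of 64 KB banks the frame spans, rounded up."""
    return -(-framebuffer_bytes(bits_per_pixel) // BANK_SIZE)


for bpp in (1, 8):
    print(f"{bpp} bpp: {framebuffer_bytes(bpp)} bytes, {banks_needed(bpp)} bank(s)")
```

At 1bpp that is 38,400 bytes, which fits in a single bank; at 8bpp it is 307,200 bytes, which spills into a fifth bank.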
-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
- BigDumbDinosaur
Re: W65C816 overclock experiment
plasmo wrote:
Recent discussions about the W65C816 have motivated me to see whether it can run reliably at the VGA pixel clock of 25.275MHz. If so, a 65816 that accesses a large graphics memory at 25.275MHz could serve as a VGA graphics controller driving 640x480 color graphics. I've placed an order with Mouser for several W65C816s, but I want to check out my overclock test bed before plugging in a part I have no experience with.
When you get your 816s from Mouser be sure to verify that the part number is W65C816S6 or W65C816S6T, both of which are 0.6µ geometry. The older S8 part was in Mouser's inventory not too long ago—it would be 0.8µ geometry. I can tell you from my experiments last year with POC V1.2 that the S8 version was unstable at 15 MHz. I was able to run V1.2 at 20 MHz with an S6 part. 20 was the highest speed at which I tested.
Quote:
I noticed the W65C816 pin assignments are very similar to the W65C02's; presumably that allows a W65C816 to plug into existing W65C02 hardware?
Not quite. The 816 is not pin-compatible with the 65C02, due to the former replacing some of the signals in the latter with different ones. Reading the data sheet will fill you in on what is different.
Something that you must consider is that the 816 drives the data bus with the bank bits during Ø2 low. You cannot allow anything else to drive the data bus during that time, else severe contention will occur.
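To make the bank multiplexing concrete, here is a minimal behavioural sketch in Python (not HDL). It assumes the usual arrangement of a transparent latch that follows the data bus while Ø2 is low and holds while Ø2 is high; the class and function names are hypothetical, and real hardware also has to meet the setup and hold times in the data sheet.

```python
class BankLatch:
    """Transparent latch: follows the data bus while PHI2 is low, holds while high."""

    def __init__(self) -> None:
        self.bank = 0

    def update(self, phi2: int, data_bus: int) -> int:
        # While PHI2 is low the CPU is presenting A16-A23 on D0-D7.
        if phi2 == 0:
            self.bank = data_bus & 0xFF
        return self.bank


def full_address(bank: int, addr16: int) -> int:
    """Combine the latched bank byte with A0-A15 into a 24-bit address."""
    return ((bank & 0xFF) << 16) | (addr16 & 0xFFFF)


latch = BankLatch()
latch.update(phi2=0, data_bus=0x03)          # PHI2 low: bus carries bank $03
bank = latch.update(phi2=1, data_bus=0xEA)   # PHI2 high: bus carries data, latch holds
print(hex(full_address(bank, 0x1234)))
```

The second call shows why nothing else may drive the bus during Ø2 low: whatever is on the bus in that half-cycle is what ends up in the latch.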
Also, a caveat: the 816's handling of the address bus is the same as that of the NMOS 6502, plus the 816 can cause dummy accesses during some parts of some instruction cycles. The 816 has the VDA and VPA outputs to tell you the state of the address bus. WDC states that all memory and I/O accesses should be qualified by those signals. Ignore them at your peril.
Quote:
So if I check out the test bed with a W65C02 first, I'll be in a much better position to check out the W65C816?
Why bother? Wire it up for the 816 and start testing. It will either work or not work. Try booting at a slow clock rate. If the machine computes then you can step up the clock speed.
Quote:
I plan to wire in a 40-pin socket in the prototype area for 6502/65816.
I recommend you use the PLCC44 version, which has more Vcc and ground pins. At the high speeds you are hoping to run, you need a very solid Vcc and ground, which is why there are multiple pins.
Quote:
The test bed will consist of the 65xx, the CPLD, and a socketed oscillator, with the CPLD serving as a small but fast ROM plus a serial port to talk to the console.
Once I manage to get it to boot, I'm sure I'll have questions about how best to check out the 65816 with 64-128 bytes of ROM-only code.
What are you going to use to run the serial port? Without RAM it may get interesting (no stack, for example).
x86? We ain't got no x86. We don't NEED no stinking x86!
Re: W65C816 overclock experiment
BigDumbDinosaur wrote:
I recommend you use the PLCC44 version, which has more Vcc and ground pins. At the high speeds you are hoping to run, you need a very solid Vcc and ground, which is why there are multiple pins.
-- Jeff
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
Re: W65C816 overclock experiment
BigDumbDinosaur wrote:
the 816 can cause dummy accesses during some parts of some instruction cycles.
- sometimes a 6502 or '816 will read from an address other than the one specified. Then on the following cycle another read occurs, this time from the correct address.
- sometimes a 6502 or '816 will write to the correct address; however, the value written isn't correct. Then on the following cycle another write occurs to the same address, and this time the value is correct.
BigDumbDinosaur wrote:
WDC states that all memory and I/O accesses should be qualified by [VDA and VPA] signals.
But! As for protecting memory from dummy cycles, one has to ask:
- Will a fully formed but unexpected read or write as described above cause trouble with the memory I'm using?
If you trust yourself enough to answer no then you'll be comfortable ignoring WDC's inexplicable inclusion of the word memory in that sentence.
-- Jeff
Re: W65C816 overclock experiment
Thank you all for the feedback. I'm expecting the W65C816s from Mouser midweek, as well as a batch of PC boards from JLCPCB. The new batch of boards has higher priority than the 65816 overclock experiment, but I have time between now and midweek to experiment with the overclocking board, which is why I want to check it out with a 6502 first.
PLCC44 is a great suggestion, but I've already ordered the 65816 in a 40-pin DIP package to match the existing 40-pin DIP 6502. If I manage to overclock the 40-pin DIP 65816 to 25MHz, the PLCC44 package should have additional margin. I will check the markings of the 65816 to see whether it is the 0.8u or the faster 0.6u technology. The faster 0.6u technology is a mixed blessing because I really need to be careful with the grounding of the prototype board.
I'll definitely watch out for the dummy cycles. Memory does not matter, but dummy cycles will definitely affect the I/O operations.
I'm not exactly sure what I can do with a serial port and only 128 bytes of ROM, either. It can spew out characters, echo back what it receives, and execute different routines based on input. That may be sufficient to establish some confidence that the CPU is operating correctly.
I've soldered down the 100-pin QFP CPLD, powered it up, and am able to program it. I'm in the middle of wiring up the 40-pin DIP socket.
Bill
Re: W65C816 overclock experiment
plasmo wrote:
Memory does not matter
Referring to the diagram and Truth Table below, it's easy to see that the I/O device is enabled during data accesses but disabled during dead cycles, and that's all we require. Dunno what sort of I/O you're planning -- maybe it's not a 6522, for example. But if you have or can contrive an extra active-high Enable input then you're home and dry! Notice there's no need to inject VPA anywhere, neither in the glue for memory nor the glue for I/O.
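For readers without the images, here is a small Python sketch of the idea. It is my own reconstruction rather than the posted Truth Table: it assumes the I/O device has a spare active-high enable fed directly from VDA, and the function name is hypothetical. The cycle meanings for each VDA/VPA combination are as given in the WDC data sheet.

```python
# Cycle types signalled by the 65816's VDA/VPA outputs; keys are (VDA, VPA).
CYCLE_TYPES = {
    (0, 0): "internal operation (address bus may be invalid)",
    (0, 1): "valid program address",
    (1, 0): "valid data address",
    (1, 1): "opcode fetch",
}


def io_enable(io_address_decoded: bool, vda: int) -> bool:
    """Hypothetical active-high enable: address decode qualified by VDA only."""
    return io_address_decoded and vda == 1


for (vda, vpa), meaning in CYCLE_TYPES.items():
    print(f"VDA={vda} VPA={vpa} io_enable={io_enable(True, vda)}  {meaning}")
```

VPA never appears in the enable term; the device is selected on data accesses and stays deselected on dead cycles even when the address happens to fall in the I/O range.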
-- Jeff
- BigDumbDinosaur
Re: W65C816 overclock experiment
Dr Jefyll wrote:
BigDumbDinosaur wrote:
the 816 can cause dummy accesses during some parts of some instruction cycles.
- sometimes a 6502 or '816 will read from an address other than the one specified. Then on the following cycle another read occurs, this time from the correct address.
- sometimes a 6502 or '816 will write to the correct address; however, the value written isn't correct. Then on the following cycle another write occurs to the same address, and this time the value is correct.
BigDumbDinosaur wrote:
WDC states that all memory and I/O accesses should be qualified by [VDA and VPA] signals.
But! As for protecting memory from dummy cycles, one has to ask:
- Will a fully formed but unexpected read or write as described above cause trouble with the memory I'm using?
If you trust yourself enough to answer no then you'll be comfortable ignoring WDC's inexplicable inclusion of the word memory in that sentence.
-- Jeff
Something you may be overlooking is that a spurious ROM or I/O chip select may be inadvertently associated with an unwanted wait-state. During a "dead" cycle, if an address is formed that "points" to either one of those and the wait-state part of the glue logic "watches" for ROM or I/O addresses, a wait-state will occur, slowing down the MPU during a cycle in which there is no need to do so. Qualification with VDA and VPA will eliminate that possibility.
Re: W65C816 overclock experiment
If you can convince me that memory needs protection from dummy cycles then that would be a game changer. Maybe such a memory exists and I just haven't heard about it yet. But certainly the memory you and I use doesn't need protection from dummy cycles, and WDC's over-eager recommendation doesn't apply -- our only concern is I/O. It's worth reviewing what's stated above. Per the Truth Table, the I/O device will be enabled during data accesses but disabled during dead cycles, and that's all the protection from dead cycles we require.
To be clear: if memory doesn't need protection from dead cycles then VPA is useless for that purpose (although VPA does have other purposes).
Oops, yes; I did overlook that. So let's look at using VPA for the purpose of avoiding needless wait states.
If someone uses a Wait State Generator with ROM it'll be to reconcile the use of the (slow) ROM with the goal of higher clock speeds and better overall performance. Of course the fastest solution of all would be to eliminate routine ROM accesses entirely, either by copying ROM to RAM upon powerup or by initializing RAM with a microcontroller or by other means. These topics are discussed elsewhere on the forum. But circumstances and preferences vary, and some will find the ROM + WSG combination appealing.
In the context of the ROM + WSG combination, I/O is easy to deal with because VDA=1 is the only condition that indicates I/O will need a Wait State. But ROM is less easy because unlike I/O it may contain both code and data; thus VDA=1 and VPA=1 are both conditions that indicate ROM needs a WS. In other words, bringing ROM into the Wait State picture is what brings VPA into the picture.
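Expressed as logic, the wait-state request might look like the Python sketch below. This is only an illustration under assumptions: the signal names and the `use_vpa` switch are hypothetical, and it models the decision only, not the timing of an actual Wait State Generator.

```python
def wait_state_needed(rom_selected: bool, io_selected: bool,
                      vda: int, vpa: int, use_vpa: bool = True) -> bool:
    """Decide whether the current cycle should be stretched."""
    # I/O holds data only, so VDA alone says whether the access is real.
    io_ws = io_selected and vda == 1
    # ROM holds code and data, so a real access is VDA=1 or VPA=1.
    # Without VPA the glue cannot tell a fetch from a dead cycle,
    # so every ROM-range address requests a wait state.
    rom_access_is_real = (vda == 1 or vpa == 1) if use_vpa else True
    rom_ws = rom_selected and rom_access_is_real
    return io_ws or rom_ws


# Dead cycle (VDA=0, VPA=0) whose address happens to land in ROM:
print(wait_state_needed(True, False, 0, 0, use_vpa=True))   # skipped
print(wait_state_needed(True, False, 0, 0, use_vpa=False))  # needless wait state
```

The two calls at the end show the tradeoff in miniature: with VPA in the equation the dead cycle runs at full speed, and without it the cycle is stretched for nothing.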
The prop delay from bringing VPA into the WS picture results in a tradeoff. Skipping unneeded wait states is definitely good news for ROM operations, but due to added prop delay the system as a whole (RAM, ROM, I/O) will have a somewhat lower ceiling on clock speed. The question arises, does a lot of your activity depend on ROM? Stack and Z-pg will always be a pretty big slice of the pie.
And how much will the clock ceiling be lowered? Quite a bit, if the glue logic is built with discrete chips. A CPLD will shrink this impact, although I'd hesitate to call it vanishingly small -- we'd need to quantify that by determining the delay resulting from an extra pass internally through the CPLD's logic array. This'll be less than but on the same order as the device's pin-to-pin delay.
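As a rough way to put numbers on that, here is a Python sketch. It rests on a deliberately crude assumption: that the whole added delay lands on the path that limits the cycle, so the minimum period simply grows by that amount. The delay figures are placeholders picked for illustration, not measurements or data sheet values.

```python
def clock_ceiling_mhz(base_period_ns: float, added_delay_ns: float) -> float:
    """New maximum clock if the added delay stretches the minimum period."""
    return 1000.0 / (base_period_ns + added_delay_ns)


BASE_MHZ = 25.275                 # target clock from this thread
base_period = 1000.0 / BASE_MHZ   # minimum period in ns at that clock

# Hypothetical added delays, for illustration only.
for label, delay_ns in (("discrete gate", 10.0), ("extra CPLD pass", 4.0)):
    ceiling = clock_ceiling_mhz(base_period, delay_ns)
    print(f"{label}: +{delay_ns} ns -> about {ceiling:.1f} MHz")
```

Plugging in the real internal-pass delay for a given CPLD speed grade would turn this into the quantification mentioned above.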
Speaking of pins, another cost to be justified is the CPLD pin used to input VPA to the device. I'm not saying it'll never be worth it -- only pointing out that ignoring VPA (and suffering unneeded ROM wait states) may well open the door to a better use for that pin. Or, perhaps the reduced pin requirement will allow the user to downsize from a big CPLD to a small one, or switch to some other type of PLD entirely.
At the risk of becoming (even more) tiresome, I'll set aside the WSG topic and return briefly to the issue of dead cycle protection. The pivotal question is, does memory need protection from dead cycles? If the answer is no then our only concern is I/O. And the Truth Table and diagram I posted plainly show how I/O can be protected using VDA alone.
-- Jeff
Last edited by Dr Jefyll on Mon Jul 19, 2021 4:10 pm, edited 2 times in total.
Re: W65C816 overclock experiment
I'm done for the night. This is where I am: the wiring is finished. I have powered the board up without the 6502 and am able to program the CPLD. I had a working serial port design that I successfully ported to this CPLD. I'm able to get the serial transmitter to spit out data continuously by periodically jamming a byte into the serial transmit buffer. Tomorrow I'll write a 6502 ROM program for the CPLD and try to boot up a real 6502.
Bill
Re: W65C816 overclock experiment
( Just an FYI for anyone following the somewhat OT discussion about VDA/VPA. This morning over coffee I twigged to the nature of an oversight in my previous post. Rather than creating a lengthy followup post I opted for a pretty extensive edit to the original. )
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
- BigDumbDinosaur
Re: W65C816 overclock experiment
Dr Jefyll wrote:
If you can convince me that memory needs protection from dummy cycles then that would be a game changer. Maybe such a memory exists and I just haven't heard about it yet.
Slow memory might have a problem with the effects of a momentarily "bad" address on the bus that causes selection of one cell, followed in very rapid succession by the selection of a different cell (the one that is actually wanted). The timing diagrams for the RAM I've been using don't shed any light on this possibility, but since I fully qualify all chip selects, I wouldn't know if the potential exists for such a problem.
Re: W65C816 overclock experiment
plasmo wrote:
Recent discussions about the W65C816 have motivated me to see whether it can run reliably at the VGA pixel clock of 25.275MHz.
viewtopic.php?p=50721#p50721
So just a hair faster seems possible.
65C816 accelerators for the C64 and Apple IIGS seemed to top out at around 18-20 MHz, but that was likely due to limitations of the CPLDs/GALs used to manage the bus. Both required relatively sophisticated rules to maintain software compatibility, to handle memory mapping, and to work with the much slower internals of the original machines.
- GARTHWILSON
Re: W65C816 overclock experiment
rpiguy2 wrote:
65C816 accelerators for the C64 and Apple IIGS seemed to top out at around 18-20 MHz
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: W65C816 overclock experiment
I think I'd heard that CMD did need to hand-pick the '816s they used.
And this is part of the very common confusion, between specified capability on the one hand, and individual device performance on the other hand. All new 816s will run at 14MHz, at high temperature and 95% (or so) voltage. But some might not! And so a hobbyist can use one set of tactics, while a business needs to use a different set of tactics.
CMD's approach is consistent with them making a low-volume luxury product, where they can afford to overclock.