
All times are UTC




Post new topic Reply to topic  [ 93 posts ]  Go to page Previous  1, 2, 3, 4, 5 ... 7  Next
Author Message
Posted: Fri Jul 03, 2020 5:43 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
A question about clocking, Juergen: does the 65F02 use the host's own clock, when making host-speed accesses? Or does it drive its clock output(s) autonomously? The reason I ask is that some hosts - the BBC Micro included, also Acorn's Electron - do their own clock-stretching, such that the clock coming into the CPU isn't a fixed frequency.

Another thing to deal with, in full generality, is the RDY input, which some systems use.

And finally, a rather unexpected one, again for full generality: some hosts might need to see a steady stream of SYNC cycles, with gaps between each. I think Acorn's second processor is like this, using SYNC to count out the refreshing of DRAM. (Although, in this case, it might not matter that DRAM remains unrefreshed, if it is never needed...)

All that said, I think it's worth thinking of any given project as having a scope. If this project's scope is to act as a relatively general accelerator for relatively straightforward 6502 systems, it doesn't matter so much that there are systems which would need complex accommodations. Another project might aim to replicate and extend a specific host, and do a better job with that host but not have generality. (The Acorn scene has a couple of those projects in progress right now.) The beeb816 project aims to do a decent job of in-socket acceleration with the restriction of using an actual CPU chip.

If the end result is something which accelerates any one of many chess machines, and also any Apple II system, and any simple single-board 6502 system, but not any Acorn or C64 system, that's still a great result.


Last edited by BigEd on Fri Jul 03, 2020 8:15 am, edited 1 time in total.

Posted: Fri Jul 03, 2020 5:46 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Oh, another thing, which you might be aware of: it's possible (with Xilinx parts) to program a bunch of alternative designs into EEPROM, and boot into a chooser design, which reads the DIPs and then reinitialises. This way, you don't need to put all the machinery into a single design.


Posted: Fri Jul 03, 2020 7:18 am

Joined: Wed Jul 01, 2020 6:15 pm
Posts: 79
Location: Germany
Thank you, Ed -- you fast-tracked to several important points which took me weeks to realize...

The 65F02 listens to the external Phi0 clock, echoes it on Phi1 (inverted) and Phi2, and drives the external bus based on that clock. That does imply some jitter, since Phi0 is sampled at the internal 100 MHz clock rate, and also implies the need to sync any input signals to the FPGA via two flip-flop stages, to avoid meta-stable states. But since the 100 MHz period is a small fraction of the host's clock, this has not caused any problems in our tests so far. -- The Apple II also uses an irregular Phi0 clock, by the way, throwing in a longer cycle every 64 ticks or so.
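
As a rough illustration of the two-stage synchroniser and the jitter estimate described above, here is a small behavioural model in Python. It is a sketch only, not the 65F02's VHDL: the 1 MHz host clock is an assumed example value, and the function names are made up for this illustration.

```python
# Behavioural model of a two-flip-flop synchroniser (illustration only).
FPGA_CLK_HZ = 100_000_000   # internal clock rate quoted in the post
HOST_CLK_HZ = 1_000_000     # assumption: a 1 MHz host, purely as an example


def synchronise(samples):
    """Pass raw samples (one per FPGA clock edge) through two flip-flop stages.

    Returns the second stage's output after each clock edge.
    Both stages are assumed to start at 0.
    """
    ff1 = ff2 = 0
    out = []
    for raw in samples:
        # Both flip-flops update on the same edge: ff2 takes the old ff1.
        ff1, ff2 = raw, ff1
        out.append(ff2)
    return out


def jitter_fraction(fpga_hz=FPGA_CLK_HZ, host_hz=HOST_CLK_HZ):
    """Sampling uncertainty of one FPGA period, as a fraction of a host cycle."""
    return host_hz / fpga_hz


if __name__ == "__main__":
    print(synchronise([0, 1, 1, 1, 0, 0, 0]))
    print(f"jitter: up to {jitter_fraction():.1%} of a host cycle")
```

With these assumed numbers, one 10 ns sampling period is 1% of a 1000 ns host cycle, which fits the observation that the jitter has not caused trouble so far.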

The 65F02 does respect the RDY signal, which is in fact used in several chess computers of interest. (The Mephisto Modular series, which has a slowish module/expansion bus.) It also outputs a SYNC signal -- that took a bit of tweaking in the core, to provide the SYNC one clock cycle earlier. Arlet's core sets SYNC during the cycle when it is dealing with the instruction, but to drive the external bus, I already need it when the address of the instruction is presented.

I have not used multiple FPGA configurations in the flash so far, but intend to do so for the upcoming USB programming option. Good point there -- if the address decoder gets too unwieldy, it does become a bottleneck in the system timing, since it needs to respond in every CPU cycle. So if I go to town with, say, the Apple II disk routine exclusions in the memory map, doing that in a separate FPGA configuration might help.
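
To make the address decoder idea concrete, here is a minimal Python sketch of a range table that marks accesses which must run at host speed. The two ranges are placeholders chosen for illustration, not a verified Apple II exclusion map, and `runs_at_host_speed` is a hypothetical name.

```python
# Hypothetical table of address ranges that must go out on the external bus
# at host speed. Both entries are assumptions for illustration only.
SLOW_RANGES = [
    (0xC000, 0xCFFF),  # assumed: memory-mapped I/O and slot ROM space
    (0xB800, 0xBFFF),  # assumed: timing-critical disk routines in RAM
]


def runs_at_host_speed(addr, ranges=SLOW_RANGES):
    """Return True if the access at addr falls into any excluded range."""
    return any(lo <= addr <= hi for lo, hi in ranges)
```

In hardware every entry becomes a comparator on the address path, which is why a long list of exclusions can end up limiting the system timing.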

Finally, I like your point about a defined scope for the project. "Plain", static memory maps with up to 64k of addressable memory and peripherals are the main target here, with the possibility to stretch that playing field slightly to include simple bank-switching schemes. Anything beyond that, e.g. hosts with anything resembling a MMU, should probably remain out of scope. They would require awkward (and highly host-specific) schemes, with a lot of host functionality duplicated inside the CPU just to keep track of things.


Posted: Fri Jul 03, 2020 1:55 pm

Joined: Fri Apr 06, 2018 4:20 pm
Posts: 94
65f02 wrote:
There certainly are limitations to where this accelerator can be used in a meaningful way. There's a reason why I put "universal" in quotes in the original post... :wink:

@rpiguy2: I would probably argue that accelerating the C64 makes no sense anyway, because who wants to play accelerated games? 8)



Considering how well the Ultimate64 and Chameleon64 sell, I think there is definitely a market for it: hobbyists like to get online with their 64, run GEOS, etc. But I do not know that the market needs yet another solution.

I was just saying with the spare space on the die you probably could implement the logic for the C64 PAL or similar.

An even better use of the spare space on the FPGA might be to build in a simple display controller. Extend your breakout board to 44 or 48 pins and add a video out.

One of the more discussed topics on this forum is how to display output (beyond using a serial terminal).

The RAM constraints may make this impractical for a true VGA display, but you could for example, squeeze a 640x200 screen into 16K and then use the FPGA to double each scan line to output something VGA compatible.
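
A quick Python check of that arithmetic, plus the line doubling itself. This is a sketch under one assumption not stated above: the display is monochrome at one bit per pixel.

```python
WIDTH, HEIGHT = 640, 200   # resolution suggested above
BITS_PER_PIXEL = 1         # assumption: monochrome, one bit per pixel


def framebuffer_bytes(width=WIDTH, height=HEIGHT, bpp=BITS_PER_PIXEL):
    """Size of the frame buffer in bytes."""
    return width * height * bpp // 8


def double_scanlines(lines):
    """Emit every source line twice, turning 200 lines into 400."""
    for line in lines:
        yield line
        yield line


if __name__ == "__main__":
    size = framebuffer_bytes()
    print(size, "bytes needed,", 16 * 1024 - size, "bytes spare in 16K")
```

At one bit per pixel the frame needs 16000 bytes, so it fits into 16K (16384 bytes) with 384 bytes to spare. Any colour depth beyond that would not fit at this resolution.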

But even as a drop-in replacement for chess computers, or a Tube add-on to the BBC, this is still an exceptionally cool project. Great work!

If you are interested in mass producing the boards you might want to reach out to the Upduino guys on Tindie.

They sell a similar product using the Lattice ICE FPGA, although its pinout is not plug-in compatible with any processor.


Posted: Fri Jul 03, 2020 6:42 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
BTW Juergen, this is a very attractive form factor! Previously we've seen various custom boards which are always significantly bigger than a 40 pin DIL. We used to use the GODIL commercial board, but that's no longer available in the ideal variant, and only has a Spartan 3 FPGA. More recently there's been a quite handy LX9 board by 'eepizza' on ebay, in a DE0 Nano form factor, but that has vanished since the pandemic. There are also clones of a "Mojo v3 FPGA" board, in a similar but slightly larger form factor (and an incompatible pinout) which I haven't seen used yet in a retrocomputing project.

What Dave's done in the past, with the eepizza board, is make up a level-shifting adapter to a 40 pin header, in one of several flavours so it fits in with z80, 6502, 6809 (I believe) - before that, the GODIL approach is to have passive level shifting on all pins (which worked well in most cases) and jumpers to set the various positions of power pins. Once you have directional level shifters, it starts to matter which pins each one serves.


Posted: Fri Jul 03, 2020 8:19 pm

Joined: Wed Jul 01, 2020 6:15 pm
Posts: 79
Location: Germany
rpiguy2 wrote:
An even better use of the spare space on the FPGA might be to build in a simple display controller. Extend your breakout board to 44 or 48 pins and add a video out.

I am not quite sure how you envision that. Into which socket would the 44 or 48 pin chip go? Wouldn't the new display controller need to be fully compatible with the original VIC to have any software support?

Building a clone of the C64 VIC inside the same FPGA as the CPU, and routing the relevant bus accesses to the on-chip controller instead of the VIC on the main board, could probably work. In which case I would think retaining the 40-pin CPU pinout and having a separate VGA or HDMI output would be preferable?

Quote:
The RAM constraints may make this impractical for a true VGA display, but you could for example, squeeze a 640x200 screen into 16K and then use the FPGA to double each scan line to output something VGA compatible.

Again, for software compatibility, shouldn't the underlying resolution be the same as in the original C64? In which case the conversion to VGA or HDMI would be "just" an upscaler in any case.

You can get a bit more fancy than to just repeat the scan line (interpolate horizontal resolution too, use 2D filter kernels, add scan lines), and the FPGA could still do that on the fly, just accessing the low-res image in internal RAM, and maybe a line buffer or two. That dual port memory must be good for something... So only very little additional memory would be needed, no matter how large the resolution of the output image.
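
A software sketch of that idea in Python, assuming pixels are plain integer intensity values and the scale factor is an integer. It only ever holds one stretched source line at a time, which plays the role of the line buffer; a real FPGA implementation would of course look quite different.

```python
def upscale_line(line, factor):
    """Stretch one line horizontally by an integer factor, interpolating linearly."""
    out = []
    for i, left in enumerate(line):
        # The last pixel has no right-hand neighbour, so it is simply repeated.
        right = line[i + 1] if i + 1 < len(line) else left
        for step in range(factor):
            out.append(left + (right - left) * step // factor)
    return out


def upscale(frame, factor):
    """Yield output lines one by one; only the current source line is buffered."""
    for line in frame:
        wide = upscale_line(line, factor)   # the "line buffer"
        for _ in range(factor):             # vertical scaling: repeat the line
            yield wide
```

The memory needed is one output line, independent of how many output lines are generated, which is the point made above about needing very little additional memory.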

But that's another project... I hope to publish my board design and VHDL code (plus Verilog core) soon on http://www.e-basteln.de. Feel free to build on it; there's certainly a lot of room left in the FPGA!


Posted: Fri Jul 03, 2020 8:32 pm

Joined: Wed Jul 01, 2020 6:15 pm
Posts: 79
Location: Germany
BigEd wrote:
BTW Juergen, this is a very attractive form factor! Previously we've seen various custom boards which are always significantly bigger than a 40 pin DIL. We used to use the GODIL commercial board, but that's no longer available in the ideal variant, and only has a Spartan 3 FPGA.

I am aware of the GODIL board(s), and like the flexibility very much. Hadn't come across the eepizza board and the separate adapters for different pinouts. That is a neat solution, since it keeps the original footprint and just adds some height!

For the 65F02, staying within the original form factor of the CPU (including its height; space is tight in some of those chess computers!) was a design goal from the start. That does obviously limit flexibility; the PCB supports the 6502 pinout only. That may catch up with me, since I have just received the suggestion to do a similar board for the CDP 1802/1806, which of course has a very different pinout...

Speaking of the height of the unit: Does anyone have recommendations for affordable individual solder pins with 0.5 to 0.6 mm diameter? A surprisingly difficult problem to solve. Turned brass pins are somewhat fragile, the choice is very limited if you don't want to buy 25000 of them, and they become three times as expensive as soon as the manufacturer does not mold them into a plastic strip... I have also tried the "Flip-Pins" (http://oshchip.org/products/Flip-Pins_Product), which are pleasant to install, but leave the chip sitting a bit proud of its socket. Any ideas appreciated!


Posted: Fri Jul 03, 2020 11:34 pm

Joined: Fri Apr 06, 2018 4:20 pm
Posts: 94
65f02 wrote:
rpiguy2 wrote:
An even better use of the spare space on the FPGA might be to build in a simple display controller. Extend your breakout board to 44 or 48 pins and add a video out.

I am not quite sure how you envision that. Into which socket would the 44 or 48 pin chip go? Wouldn't the new display controller need to be fully compatible with the original VIC to have any software support?

Building a clone of the C64 VIC inside the same FPGA as the CPU, and routing the relevant bus accesses to the on-chip controller instead of the VIC on the main board, could probably work. In which case I would think retaining the 40-pin CPU pinout and having a separate VGA or HDMI output would be preferable?

Quote:
The RAM constraints may make this impractical for a true VGA display, but you could for example, squeeze a 640x200 screen into 16K and then use the FPGA to double each scan line to output something VGA compatible.

Again, for software compatibility, shouldn't the underlying resolution be the same as in the original C64? In which case the conversion to VGA or HDMI would be "just" an upscaler in any case.

You can get a bit more fancy than to just repeat the scan line (interpolate horizontal resolution too, use 2D filter kernels, add scan lines), and the FPGA could still do that on the fly, just accessing the low-res image in internal RAM, and maybe a line buffer or two. That dual port memory must be good for something... So only very little additional memory would be needed, no matter how large the resolution of the output image.

But that's another project... I hope to publish my board design and VHDL code (plus Verilog core) soon on http://www.e-basteln.de. Feel free to build on it; there's certainly a lot of room left in the FPGA!


I wasn’t implying the 44 or 48 pin version would be C64 compatible. It would just be cool to have a 6502 variant with easy video output :-)


Posted: Fri Jul 03, 2020 11:41 pm

Joined: Fri Aug 30, 2002 9:02 pm
Posts: 1748
Location: Sacramento, CA
65f02 wrote:
Speaking of the height of the unit: Does anyone have recommendations for affordable individual solder pins with 0.5 to 0.6 mm diameter? A surprisingly difficult problem to solve. Turned brass pins are somewhat fragile, the choice is very limited if you don't want to buy 25000 of them, and they become three times as expensive as soon as the manufacturer does not mold them into a plastic strip... I have also tried the "Flip-Pins" (http://oshchip.org/products/Flip-Pins_Product), which are pleasant to install, but leave the chip sitting a bit proud of its socket. Any ideas appreciated!


I found these many years back. Not sure if they are the same as the turned brass ones you mentioned. They are 0.46 mm diameter.

https://www.mouser.com/ProductDetail/Mi ... HJS%2Fo%3D

Daryl

_________________
Please visit my website -> https://sbc.rictor.org/


Posted: Sat Jul 04, 2020 3:39 am

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 730
Location: Tokyo, Japan
65f02 wrote:
For the Apple, switching in the RAM is mostly a one-way street to my knowledge; it never goes back to the ROM after e.g. you boot up UCSD Pascal.

Going back and forth is quite common. For example, Integer BASIC (i.e., the old II-not-plus ROM image) is often loaded into the language card and DOS provides an `INT` command to switch to that image in RAM, and an `FP` command to switch back to the ROM image. IIRC, it will switch automatically to the correct one when you `RUN` a program from diskette, depending on whether it's an Integer BASIC or Applesoft program.

It's also not just a single 16K bank: the bank switched area is the 12K of motherboard ROM, $D000-$FFFF, so they swap between ROM and one bank of RAM for the top 8K $E000-$FFFF, but the bottom 4K $D000-$DFFF can be mapped to ROM or either of two 4K RAM banks, so you can use the full 16K of the language card. I would not be surprised at all if large systems like Apple Pascal used both the 4K RAM banks at $D000-$DFFF, switching between them frequently.
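
The layout just described can be sketched as a small Python lookup. The flags `read_ram` and `bank2` are hypothetical names for state an accelerator would have to track; how the host sets them is deliberately not modelled here.

```python
def lc_region(addr, read_ram, bank2):
    """Name the region that a read of addr in $D000-$FFFF would hit.

    read_ram: True if language card RAM (rather than ROM) is mapped for reads.
    bank2:    True if the second 4K bank is selected at $D000-$DFFF.
    Both flags are assumptions of this sketch.
    """
    if not 0xD000 <= addr <= 0xFFFF:
        raise ValueError("address is outside the bank-switched area")
    if not read_ram:
        return "ROM"
    if addr >= 0xE000:
        return "RAM 8K"                     # single bank for $E000-$FFFF
    return "RAM 4K bank 2" if bank2 else "RAM 4K bank 1"


# The RAM side adds up to the full 16K of the card: 8K + 2 * 4K.
LC_RAM_BYTES = (0x10000 - 0xE000) + 2 * (0xE000 - 0xD000)
```

Note that the 4K bank flag has no effect above $E000, so two bits of state select among three RAM regions plus the ROM.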

The IIc gets even more complex. It started with the usual ROM plus two 64 KB banks of RAM, "main" and "auxiliary." This gives you essentially two language cards, with ROM and two RAM banks at $E000-$FFFF and ROM and four RAM banks at $D000-$DFFF. (The attached diagram from the Apple IIc Technical Reference Manual may help to make this arrangement more clear.)

But then it gets even better. The above version of the ROM was the original "255" version, which was only 16K; the next version, "0", was 32K and also bank-switched the ROM (now at $C100-$FFFF in the IIc) when switching between main and auxiliary banks. This happens very frequently during normal operation.

I present all this just as a data point for the kinds of things you'll need to deal with for certain systems; I hope the information can help you in your design.

Quote:
The Spartan-6 maxes out at 64 kByte RAM on chip (in the 225-pad BGA I am using, the largest I can fit onto the PCB). I have been tempted by its Spartan-7 cousin, which is available in the same package with up to 180 kByte of RAM.

Given that the IIc already has a total of 160 KB of memory without further RAM expansion, and that this sort of design and memory capacity is not unusual for mid-80s 8-bit systems, it seems to me that even the Spartan-7 is likely to have too little capacity for the entire RAM and ROM of some hardware. I guess you need to give some serious consideration to some sort of LRU caching system if you want really wide compatibility.

Quote:
I do intend to deal with the Apple II disk controller though. No special hardware features there, just heavy (and clever) use of exactly timed code, which needs to be executed at the original speed. I have my "Beneath Apple DOS" book ready to define the required address ranges for the 65F02 memory map.

You may also find Jim Sather's Understanding the Apple II to be handy for this work. The chapter on the floppy controller is the best explanation around, though it may go a bit beyond what you need, and the book is full of good information on other stuff as well.

Quote:
@rpiguy2: I would probably argue that accelerating the C64 makes no sense anyway, because who wants to play accelerated games? 8)

Well, and also, isn't the tradition with that line of computers when you add another CPU to the system to make sure it's run slower, in fact slower than that CPU as used in systems from a half decade earlier? :-)


Attachments:
a2c-trm-p21-fig2-2.png [ 92.4 KiB ]

_________________
Curt J. Sampson - github.com/0cjs
Posted: Sat Jul 04, 2020 7:20 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Hmm, sounds to me like the IIc, like the BBC Master, would be well out of scope. Not only is there a lot to do, but it's a relatively small audience. (And in the case of the Master, the CPU is soldered in, so it's also a more invasive upgrade.)

It might be interesting to see how much benefit remains if the device has a cache, but has to invalidate it when the memory map changes. But that's even more work.
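
One way to get a feel for that would be a simple software model: a read cache that is flushed whenever the memory map changes, with hit and miss counters. The Python sketch below is only that, a model. The class and its `set_map_state` hook are hypothetical, and writes are not modelled at all.

```python
class MapSensitiveCache:
    """Read cache that is flushed whenever the memory-map state changes."""

    def __init__(self, read_host):
        self.read_host = read_host   # callable addr -> byte: the slow bus read
        self.map_state = None
        self.lines = {}
        self.hits = 0
        self.misses = 0

    def set_map_state(self, state):
        """Call whenever the host's bank-switching state is (re)written."""
        if state != self.map_state:
            self.lines.clear()       # anything cached may now be stale
            self.map_state = state

    def read(self, addr):
        if addr in self.lines:
            self.hits += 1
        else:
            self.misses += 1
            self.lines[addr] = self.read_host(addr)
        return self.lines[addr]
```

Feeding it an address trace from an emulator would show how much of the speed-up survives frequent map changes. Flushing everything is the crudest possible policy; invalidating only the affected address window would do better at the cost of more logic.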

As noted upthread, using this device to upgrade an Acorn-world Second Processor (or Turbo board) would be relatively simple and very effective.


Posted: Sat Jul 04, 2020 7:29 am

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
In the case of the BBC Master, there are three things which control which parts of the 320KB total address space are mapped to CPU address space:

1: The value of the ROMSEL latch at $FE30. The low four bits select the "sideways" slot mapped to the $8000-BFFF window (four of these are half of the stock RAM fitted, a further seven are sections of the stock ROM), and the high bit overrides a 4KB segment of the Shadow RAM onto $8000-8FFF for use by the MOS.

2: The value of the ACCCON latch at $FE34. Three bits of this select respectively:
  1. Whether VDU driver accesses (see below) to screen memory (defined as the 20KB in $3000-7FFF) go to main or shadow RAM;
  2. Whether non-VDU-driver accesses to screen memory go to main or shadow RAM;
  3. Whether the remaining 8KB segment of Shadow RAM is mapped over the bottom end of the MOS ROM at $C000-DFFF. This is normally used as temporary storage by the floppy disk filesystem.

3: Whether the most recent SYNC cycle was in the $C000-DFFF range or not. If so, the access is assumed to be from the VDU drivers for the purpose of $3000-7FFF mapping. If not, the access will be treated as a non-VDU-driver access.

And yes, this means you can access shadow display memory from code located in the 8KB shadow RAM block, effectively overriding the stock VDU drivers, provided you have finished with the floppy disk system. Probably few people do that, but for maximum compatibility you should expect it to occur.

I/O space proper consists entirely of the FRED, JIM and SHEILA pages ($FCxx, $FDxx, $FExx). FRED and JIM both map to the 1MHz Expansion Port; SHEILA is where all internal hardware is mapped, including the memory mapping latches.

To be compatible with the above, you will need to track the state of $FE3x writes (which I assume are incompletely decoded, so blocks of four addresses map to the same latch) and the VDU driver flag. This is nine bits of relevant state in all, so can theoretically be kept in a single Spartan "byte". Writes to the display memory area (whether shadow or main) must be write-through cached so that the CRTC will fetch the updated values; the rest of main and shadow RAM can be kept internally for speed. When the code jumps into or out of the VDU driver region, at least one SYNC cycle must be generated to the new region (in-order relative to pending writes) so as to update the external hardware.
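
The nine bits of state can be sketched in Python as follows. The bit positions are assumptions made for this sketch (ROMSEL slot in bits 0-3 and the RAM override in bit 7; the three ACCCON flags in bits 1-3), as is the mirroring of each latch over four addresses. Check them against the hardware documentation before relying on them.

```python
class MasterMapState:
    """Tracks the nine bits of mapping state described above (sketch only)."""

    def __init__(self):
        self.slot = 0           # 4 bits: sideways slot mapped at $8000-$BFFF
        self.ram_override = 0   # 1 bit: 4KB shadow segment over $8000-$8FFF
        self.acccon = 0         # 3 bits, packed here as bit 0 = VDU accesses to
                                # shadow, bit 1 = other accesses to shadow,
                                # bit 2 = 8KB shadow over $C000-$DFFF
        self.vdu = 0            # 1 bit: last SYNC was in $C000-$DFFF

    def write(self, addr, value):
        """Snoop a CPU write; only the two latches are of interest."""
        if 0xFE30 <= addr <= 0xFE33:      # ROMSEL (assumed mirrored four times)
            self.slot = value & 0x0F
            self.ram_override = (value >> 7) & 1
        elif 0xFE34 <= addr <= 0xFE37:    # ACCCON (same mirroring assumption)
            self.acccon = (value >> 1) & 0x07

    def sync(self, addr):
        """Call on every opcode fetch (SYNC cycle)."""
        self.vdu = 1 if 0xC000 <= addr <= 0xDFFF else 0

    def screen_uses_shadow(self):
        """Would a CPU access to $3000-$7FFF go to shadow RAM right now?"""
        vdu_flag = self.acccon & 1
        other_flag = (self.acccon >> 1) & 1
        return bool(vdu_flag if self.vdu else other_flag)

    def packed(self):
        """All state as a single 9-bit integer."""
        return (self.slot | self.ram_override << 4
                | self.acccon << 5 | self.vdu << 8)
```

The `packed()` value is what a cache or address decoder would key on; it never exceeds 0x1FF.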

The above will correctly handle all 64KB of main and shadow RAM, the 16KB MOS ROM area, and hardware I/O. With the Spartan-7, you might as well assign a permanent 80KB mapping of internal RAM to those, allowing full-speed reads once the MOS ROM has been read in; this is particularly important for zero page and will also greatly accelerate graphics operations, even with write-through. Then we can consider how to handle the sixteen-way, 16KB window of Sideways RAM and ROM slots. Collectively, these are 256KB and thus too large to permanently map to internal RAM - but usually only a subset of this total area is used during a particular session.

Because a Sideways slot might contain either RAM or ROM, a naive write-through or write-back caching scheme might not be appropriate. Some of the slots may map to the cartridge sockets which are accessible to the user, and can be fitted with arbitrary memory-mapped hardware. However, you might assume that the machine has not been extensively modified in this respect, and thus make the following assumptions:

1: Slots 9-F always map to the stock ROM. Writes can therefore be discarded, and reads can be LRU-cached. You might consider extending this assumption to slot 8, which maps to a physical ROM socket.

2: Slots 4-7 always map to Sideways RAM. Write-back caching can therefore be employed. Or, if this is too complicated, assign 64KB of internal RAM permanently to these slots.

3: Slots 0-3 (and optionally also slot 8), which map to cartridge sockets, are probably best handled as uncached accesses. This accommodates the full range of hardware which could be plugged into them, including non-memory devices and devices which include their own internal mapping systems.
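
These three assumptions boil down to a small policy table. A Python sketch, in which the function name and the `treat_slot8_as_rom` switch are made up for illustration:

```python
def slot_policy(slot, treat_slot8_as_rom=False):
    """Caching policy for a sideways slot, following the three assumptions above."""
    if not 0 <= slot <= 15:
        raise ValueError("slot must be in the range 0-15")
    if slot >= 9 or (slot == 8 and treat_slot8_as_rom):
        return "read-only, LRU-cached"   # stock ROM: writes can be discarded
    if 4 <= slot <= 7:
        return "write-back cached"       # sideways RAM
    return "uncached"                    # cartridge sockets, arbitrary hardware


if __name__ == "__main__":
    for n in range(16):
        print(f"slot {n:X}: {slot_policy(n)}")
```

By default slot 8 falls through to the uncached case, matching the more cautious reading of assumption 3.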

With 80KB assigned to main and shadow RAM and the MOS ROM, plus 64KB assigned to the Sideways RAM slots, about 36KB of internal RAM remains for caching the stock ROM slots. That should be enough for full-speed handling of BASIC, DFS, and the extended graphics routines which are scattered more-or-less randomly through spare areas of other ROM slots.
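
The budget is easy to check in Python, taking the 180KB Spartan-7 figure quoted earlier in the thread as given:

```python
KB = 1024

INTERNAL_RAM = 180 * KB      # Spartan-7 figure quoted earlier in the thread
FIXED_MAPPING = 80 * KB      # main RAM + shadow RAM + MOS ROM
SIDEWAYS_RAM = 4 * 16 * KB   # slots 4-7 at 16KB each

rom_cache = INTERNAL_RAM - FIXED_MAPPING - SIDEWAYS_RAM

if __name__ == "__main__":
    print(rom_cache // KB, "KB left for caching the stock ROM slots")
```

That remainder is a little over two 16KB slots' worth, so the ROM cache cannot hold all seven stock ROM slots at once and has to rely on locality.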

Of course, implementing the Second Processor is much easier. There's no memory mapping, just a small I/O window and a tiny boot ROM.


Last edited by Chromatix on Sat Jul 04, 2020 8:09 am, edited 1 time in total.

Posted: Sat Jul 04, 2020 7:33 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
As there's only 64k of block RAM on the device in question, any machine with more memory than that is going to need some tradeoffs (as well as some finesse) - which might well, perfectly reasonably, put it out of scope.

A different project, very probably exceeding the 40-DIL footprint, could have an off-chip on-board RAM, but it would have to be a fast RAM, or once again we're in the realm of caching.

It's kind of fun to discuss, but feels rather off-topic for the thread. Perhaps sketch a design and make a new thread.


Posted: Sat Jul 04, 2020 4:16 pm

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 730
Location: Tokyo, Japan
BigEd wrote:
As there's only 64k of block RAM on the device in question, any machine with more memory than that is going to need some tradeoffs (as well as some finesse) - which might well, perfectly reasonably, put it out of scope.

I'm not quite clear what you're getting at here. Are you saying that supporting more than 64K of addressable memory should be discussed, as we have been doing, to see what would need to be done to do that? (That would of course necessitate discussing the details of how addressing works in such machines, as we have also been doing.) If so, yes, it might be out of scope, but I don't really see how that changes anything since we need to continue discussion exactly as we are right now to determine what is and isn't out of scope.

Or are you saying you feel it's not worth the time to investigate this and that we should simply tell the OP, "the Apple II with a language card has more than 64 KB of addressable memory, you should drop the idea of supporting it"? That's also going to knock out the Commodore 64, by the way, and almost any machine configuration that supported 64 KB of RAM, since not many machines out there had no ROM. Personally, I feel that it would be rather limiting to drop most popular home computers from the early 80s onward, and maybe that's not a decision we want to make right now.

And part of the reason for delaying that decision is that I think we don't yet have a good handle on which particular characteristics are complex to support and which are not. For example, the 128K Apple IIc, with up to four banks in some areas of the address space, might well be simpler to support than the C64, with no more than two banks in any area of address space, simply because any bank switches on the IIc can probably be calculated from nothing but program analysis, whereas on the C64 banks can be and were switched by external hardware, independent of anything the CPU knows about. (The Epyx Fast Load cartridge is an example of this.)

_________________
Curt J. Sampson - github.com/0cjs


Posted: Sat Jul 04, 2020 4:19 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
I'm just noting the thread is about an FPGA on a 40 pin carrier, which has 64k available to it, so it's a great fit for certain systems. The more we're inclined to muse about how to deal with other systems, not so fitting to this particular project, the more it feels to me like a new thread would do for that.

