
All times are UTC




PostPosted: Sat Jul 04, 2020 5:25 pm 

Joined: Wed Jul 01, 2020 6:15 pm
Posts: 79
Location: Germany
@cjs: Thank you for the details about usage of the extended memory in the Apple II. I guess I have only ever used my language card for UCSD Pascal, which may be a special case. To support switching back and forth between RAM and ROM, I guess my best bet, in view of the limited 64k block RAM I have in the FPGA, is a compromise: Upon startup, read in (and accelerate) the ROM; allow it to be overwritten and used as RAM once the language card gets enabled; and then revert to un-accelerated mode in that address space whenever the ROM gets switched back in.

For the more advanced Apple //e and //c, I am firmly with Ed: That's out of scope for me, just like the BBC Master. Too much hassle and an awkward solution; and the result would be less than convincing since I could only accelerate a fraction of the available host memory.

Besides technical reasons, personal preferences and nostalgia come into play too, to be honest. The Apple II plus was my first computer -- I still have it, and did my initial tests of the 65F02 on that very machine. And UCSD Pascal was dear to my heart as my first "real" programming language, so I would like to support it. But by the time the Apple //e came out, I had moved on to 68000 systems, so that generation of machines does not mean as much to me.

Supporting a much larger, off-chip RAM would be neat, but I don't think I can accommodate it in the 40-pin DIP footprint. Even if I find room for the chip, bringing out approx. 30 additional address/data/control lines from the FPGA seems impossible:

The design rules of the affordable PCB houses (like Aisler, OSH Park, and the various Chinese fabs) allow for one trace between BGA pads, or one via in the middle of a square of pads, for this 0.8mm pitch BGA. With the narrow PCB, one can essentially only bring out traces towards the short sides of the PCB, and those are pretty crowded already. There's limited room for more vias to other layers, since the decoupling caps block the bottom layer below the FPGA.

I'm attaching a view of the top and bottom layer (without the pours) to give a better idea. Blind vias seem like the only way to bring out significantly more traces, and none of the cheap PCB places support those.


Attachments:
65F02 Rev D top+bottom.png [ 58.65 KiB ]
PostPosted: Sun Jul 05, 2020 2:06 am

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 730
Location: Tokyo, Japan
65f02 wrote:
For the more advanced Apple //e and //c, I am firmly with Ed: That's out of scope for me, just like the BBC Master. Too much hassle and an awkward solution; and the result would be less than convincing since I could only accelerate a fraction of the available host memory.

Well, I think that would be giving up great gains too soon. I'm sure that there are plenty of people like me who use a IIc mainly to run II+ software, and in such cases accelerating less than 50% of host memory would give you virtually all of the speedup. It would indeed be less convincing to people who can detect a difference in overall program run speed of a few percent (or in the case of programs using the host ROM very little, such as chess games, a fraction of a percent), but I doubt that many of your users are even set up to do such comparisons between a II+ and a IIe or IIc.

Even the IIc's original 16K v255 ROM uses a few locations in the auxiliary address space (such as the "screen holes" in the frame buffers allocated to data storage for I/O devices), so if you don't support this the system probably won't even boot, and certainly some devices (such as the serial ports) won't work. But many of these are little used, or even not used at all after setup if you're not using some devices. With the later 32K ROMs there is code in the alternate ROM bank that is used in normal operation, but not a huge amount of it. I don't have a figure, but my guess would be that in these cases over 90% of ROM accesses are from the main bank, not the alternate bank.

If you commit to supporting the language card at all, you've committed to emulating memory management logic in the 65F02. And come to think of it, if you commit to supporting an Apple II with any cards beyond a single disk controller in slot 6 (or any other single card) you still have to commit to emulating memory management logic anyway, because the $C800-$CFFF space is bank switched between the ROM on each card. (PR#6 switches in the ROM on slot six before jumping to it.)

Especially if you're going to be doing this for any other systems, it seems worthwhile to develop a "mini-language" to describe such memory management logic, which would need to describe:
  • Which ranges of the address space are banked, and what's backing each bank.
  • How to deal with each bank: internal RAM only, external access only, write-through on free external bus cycles (if you want to implement that), and perhaps in the future some sort of LRU cache or something like that.
  • Which addresses and what kind of access to them (read, write, two sequential writes, perhaps other patterns) change the bank currently in use in a given area of address space.
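
As a rough illustration of what such a description might boil down to once parsed, here is a minimal Python sketch. Every name in it (Policy, Bank, Trigger, Region, example_region) is made up for this example, and the trigger addresses are placeholders; it only restates the three bullet points above as data and is not the 65F02's actual configuration format.

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):
    """How accesses to a bank are serviced (hypothetical names)."""
    INTERNAL = "internal RAM only"
    EXTERNAL = "external access only"
    WRITE_THROUGH = "internal, write-through on free external bus cycles"


@dataclass(frozen=True)
class Bank:
    backing: str       # what backs this bank, e.g. "host ROM"
    policy: Policy


@dataclass(frozen=True)
class Trigger:
    address: int       # address whose access changes the mapping
    access: str        # "read", "write", "write-twice", ...
    selects: str       # name of the bank that becomes active


@dataclass
class Region:
    start: int
    end: int                                       # inclusive
    banks: dict = field(default_factory=dict)      # name -> Bank
    triggers: list = field(default_factory=list)
    active: str = ""

    def contains(self, addr: int) -> bool:
        return self.start <= addr <= self.end

    def on_access(self, addr: int, access: str) -> None:
        """Switch the active bank if this access matches a trigger."""
        for t in self.triggers:
            if t.address == addr and t.access == access:
                self.active = t.selects

    def policy(self) -> Policy:
        return self.banks[self.active].policy


def example_region() -> Region:
    # Trigger addresses below are placeholders for illustration only.
    return Region(
        start=0xE000, end=0xFFFF,
        banks={"rom": Bank("host ROM", Policy.EXTERNAL),
               "ram": Bank("on-board RAM", Policy.INTERNAL)},
        triggers=[Trigger(0xC082, "read", "rom"),
                  Trigger(0xC080, "read", "ram")],
        active="rom",
    )
```

Keeping the description as plain data like this means the same structure could later be compiled into whatever form the hardware decoder needs.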

Note that in this model of 65F02 memory management the RAM and ROM at the $E000-$FFFF range actually are viewed as four mappings: read/write ROM, read ROM/write RAM, read RAM/write ROM ("read only RAM") and read RAM/write RAM. (The $D000-$DFFF bank will be the same, except with further mappings because there are two RAM banks.) Not supporting all of these will almost certainly break a lot of software (including, probably, UCSD Pascal) because it's common to use the read ROM/write RAM mapping during setup so you can call ROM routines while loading the language card.
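
To make the four mappings concrete, here is a small Python model in which the read source and the write target are two independent flags. The class names are invented for this sketch, and the soft-switch addresses that would set the flags are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class UpperBankState:
    """Independent read source and write target for $E000-$FFFF."""
    read_ram: bool = False    # False: reads come from ROM
    write_ram: bool = False   # False: writes are aimed at ROM and are lost

    def mapping(self) -> str:
        src = "RAM" if self.read_ram else "ROM"
        dst = "RAM" if self.write_ram else "ROM"
        return f"read {src}/write {dst}"


class UpperBank:
    """8K of ROM and 8K of RAM sharing one address range (hypothetical model)."""

    def __init__(self, rom: bytes):
        assert len(rom) == 0x2000
        self.rom = bytes(rom)
        self.ram = bytearray(0x2000)
        self.state = UpperBankState()

    def read(self, addr: int) -> int:
        off = addr - 0xE000
        return self.ram[off] if self.state.read_ram else self.rom[off]

    def write(self, addr: int, value: int) -> None:
        if self.state.write_ram:
            self.ram[addr - 0xE000] = value & 0xFF
        # Writes aimed at ROM have no effect.
```

With read_ram off and write_ram on, a loader can keep calling ROM routines while the bytes it stores land in the RAM underneath, which is the setup case described above.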

Even for just the Apple II language card it seems to me simpler to develop and debug a simple, general system like this than to try to special-case the whole thing, and once you have this it's easy to add support for a lot of other systems, too, including the IIc, which is not inherently more complex but just has a few more mappings.

The one thing this does not yet get you is support for systems like the C64 where external signals contribute to determining memory mapping. A simple way to handle that would be to have a few connections on the board to serve as inputs to the memory management logic; C64 users could then wire up the /GAME and /EXROM signals (provided by whatever's plugged into the cartridge port) to this and the memory management logic could examine the state of those as well as its internal state from past instructions executed to determine the current mapping for any address range.


Quote:
I guess I have only ever used my language card for UCSD Pascal, which may be a special case. To support switching back and forth between RAM and ROM, I guess my best bet, in view of the limited 64k block RAM I have in the FPGA, is a compromise: Upon startup, read in (and accelerate) the ROM; allow it to be overwritten and used as RAM once the language card gets enabled; and then revert to un-accelerated mode in that address space whenever the ROM gets switched back in.

Yes, simply saying "I'm going to allow only a fixed 64K of memory to be 'fast memory'" (i.e., read/writes use the 65F02 onboard memory) is I think an excellent simplification to start with, and does not get in the way of later adding schemes to more dynamically switch back and forth. However, for the Apple II I might start by providing both "ROM is fast" and "language card is fast" schemes (perhaps switched by your DIP switch on the board) because for users on a II+ not running an alternate language (Integer BASIC or UCSD Pascal) that would greatly slow down the system.

Quote:
Supporting a much larger, off-chip RAM would be neat, but I don't think I can accommodate it in the 40-pin DIP footprint. Even if I find room for the chip, bringing out approx. 30 additional address/data/control lines from the FPGA seems impossible....

I agree that's probably a non-starter. But it seems to me it doesn't really matter how much RAM you add to your system; there will always be something out there where you don't have enough on-board RAM to replace all of the system memory. So you might as well just deal with that by declaring you're not always going to map all system memory into on-board RAM, leaving yourself in a place where you can handle larger memories with simple solutions ("always access as I/O space") for the moment and then upgrade those to more complex solutions within the same structure later, if that's worthwhile. (For some systems, such as the IIc, I don't think it would be.)

Quote:
Besides technical reasons, personal preferences and nostalgia come into play too, to be honest. The Apple II plus was my first computer -- I still have it, and did my initial tests of the 65F02 on that very machine. And UCSD Pascal was dear to my heart as my first "real" programming language, so I would like to support it. But by the time the Apple //e came out, I had moved on to 68000 systems, so that generation of machines does not mean as much to me.

I am similar: I have little interest in Apple II systems after the II+. However, I now own a IIc, not a II+, simply because it gets me real Apple II hardware in much less space than a II+ system would and it works just fine as a II+ for all the purposes for which I need it. (Well, almost all, but what little it doesn't do I can live with; it's better than giving up one of my other computers in order to make space for a II+.)

_________________
Curt J. Sampson - github.com/0cjs


PostPosted: Sun Jul 05, 2020 6:58 am

Joined: Wed Jul 01, 2020 6:15 pm
Posts: 79
Location: Germany
cjs wrote:
And come to think of it, if you commit to supporting an Apple II with any cards beyond a single disk controller in slot 6 (or any other single card) you still have to commit to emulating memory management logic anyway, because the $C800-$CFFF space is bank switched between the ROM on each card. (PR#6 switches in the ROM on slot six before jumping to it.)

Well, my "solution" for that particular problem is embarrassingly simple: I always map the complete $C000..$CFFF space as external, i.e. I let the 65F02 access the actual host I/O and the ROMs on the I/O cards. I figure that that code is likely to contain some cycle-counted, timing critical sequences; and for most cards it deals with I/O operations where the throughput will be limited by the peripheral device anyway.

Quote:
Especially if you're going to be doing this for any other systems, it seems worthwhile to develop a "mini-language" to describe such memory management logic, which would need to describe:
  • Which ranges of the address space are banked, and what's backing each bank.
  • How to deal with each bank: internal RAM only, external access only, write-through on free external bus cycles (if you want to implement that), and perhaps in the future some sort of LRU cache or something like that.
  • Which addresses and what kind of access to them (read, write, two sequential writes, perhaps other patterns) change the bank currently in use in a given area of address space.

That would be a nice and flexible solution. The challenge is that the address decoding also needs to execute quickly on the FPGA: In every 100 MHz clock cycle, it needs to determine whether the newly presented address on the internal bus needs to be mapped externally -- i.e. the CPU halted and an external bus cycle initiated -- and if the answer is "no", it must not slow down the internal execution.

This has already proven to be a sensitive area of the FPGA timing as I kept adding (simple) memory maps for more chess computers. On the other hand, if a generic solution is feasible which gets configured at startup (when the FPGA determines its host type), rather than having all the different supported decoders sitting side by side in the logic fabric, that may simplify the decoder somewhat. I'll give that some thought!
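
One way to picture a decoder that is configured once at startup is a per-page lookup table, sketched below in Python. The function names and the page granularity are my own assumptions for illustration; only the $C000..$CFFF range comes from this thread, and whether a table lookup would actually meet timing at 100 MHz is a separate question the sketch says nothing about.

```python
def build_page_table(external_ranges):
    """Return a 256-entry list: True if that 256-byte page is external."""
    table = [False] * 256
    for start, end in external_ranges:          # inclusive byte addresses
        for page in range(start >> 8, (end >> 8) + 1):
            table[page] = True
    return table


def is_external(table, addr):
    """One indexed lookup per access instead of a per-host comparator chain."""
    return table[(addr >> 8) & 0xFF]


# Built once when the host type is known. A real host map would have more entries.
APPLE_II_TABLE = build_page_table([(0xC000, 0xCFFF)])
```

The point of the sketch is only that the per-cycle question becomes the same lookup for every host, with the host-specific part moved into the table contents.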

Quote:
Note that in this model of 65F02 memory management the RAM and ROM at the $E000-$FFFF range actually are viewed as four mappings: read/write ROM, read ROM/write RAM, read RAM/write ROM ("read only RAM") and read RAM/write RAM. (The $D000-$DFFF bank will be the same, except with further mappings because there are two RAM banks.) Not supporting all of these will almost certainly break a lot of software (including, probably, UCSD Pascal) because it's common to use the read ROM/write RAM mapping during setup so you can call ROM routines while loading the language card.

Noted, thanks. I have re-read the language card documentation (after a very long time...) prompted by your posts. Supporting the different read/write access combinations should be straightforward, but of course the logic has to be aware of them. I think I will keep it simple -- as soon as a "write to RAM" mode has ever been enabled, I would direct all future ROM read accesses to the actual external ROM, since I can't be sure I still have an intact copy in the on-board ROM.
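
In pseudo-hardware terms that "keep it simple" rule is just a sticky flag. A minimal Python sketch, with a class name and method names invented for illustration:

```python
class RomShadow:
    """Tracks whether the on-board copy of the host ROM can still be trusted."""

    def __init__(self):
        # Set once at startup, after the ROM has been read in.
        self.copy_intact = True

    def on_mode_change(self, write_ram_enabled: bool) -> None:
        # Sticky: once "write to RAM" was ever enabled, never trust the copy again.
        if write_ram_enabled:
            self.copy_intact = False

    def rom_read_source(self) -> str:
        """Where a ROM read should be serviced from."""
        return "internal" if self.copy_intact else "external"
```

The flag is never set back to True, so switching write mode off again does not restore fast ROM reads; only a reset (with a fresh read-in of the ROM) would.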

Supporting the two separate 4k blocks which can be mapped into the $Dxxx address space is a bit more of a headache: I have the extra 4k, since I never need internal RAM for the $Cxxx I/O area, but need to translate an internal address to a different one when the alternate block is accessed. That adds some complexity to a timing-critical path: Because the block RAM is physically distributed across the whole FPGA chip, getting addresses and data to and from the outer blocks incurs significant routing delays...
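
The translation itself is small; a Python sketch of one possible scheme follows. It assumes (my reading of the above, not a confirmed design) that the alternate 4k block is parked in the otherwise unused internal RAM behind $C000-$CFFF. The function name is hypothetical.

```python
def internal_address(cpu_addr: int, alt_block: bool) -> int:
    """Map a CPU address to an on-board block RAM address.

    Assumption: the alternate 4k block for $D000-$DFFF lives in the
    internal RAM at $C000-$CFFF, which is never used for the I/O area.
    """
    if alt_block and 0xD000 <= cpu_addr <= 0xDFFF:
        return cpu_addr - 0x1000     # $Dxxx -> $Cxxx
    return cpu_addr
```

Since $D and $C differ only in address bit 12, the subtraction amounts to clearing one address line when the alternate block is selected, so the extra logic is a range compare plus one gated bit. The routing delay to the outer block RAMs is of course not captured by a model like this.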


PostPosted: Sun Jul 05, 2020 8:00 pm

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 730
Location: Tokyo, Japan
65f02 wrote:
Well, my "solution" for that particular problem is embarrassingly simple: I always map the complete $C000..$CFFF space as external, i.e. I let the 65F02 access the actual host I/O and the ROMs on the I/O cards.

Oh, right, duh! That's actually a great solution for this.

Quote:
I figure that that code is likely to contain some cycle-counted, timing critical sequences; and for most cards it deals with I/O operations where the throughput will be limited by the peripheral device anyway.

Right. And as for the disk controller ROM, it's not even used after boot anyway, so there are no issues there.

But that does bring up another problem, one that seems to me much harder than the memory mapping one: how do you run Apple DOS (or anything that uses the Apple II drive)? The code there is running from regular RAM (and not even from a fixed location), is extremely timing critical, and you can't just slow down only the I/O accesses because other non-I/O instructions that are being executed are also part of the timing loop. At first glance, this seems to be the biggest problem with your CPU working in an Apple, and may well affect other systems, too.

Quote:
This has already proven to be a sensitive area of the FPGA timing as I kept adding (simple) memory maps for more chess computers. On the other hand, if a generic solution is feasible which gets configured at startup (when the FPGA determines its host type), rather than having all the different supported decoders sitting side by side in the logic fabric, that may simplify the decoder somewhat. I'll give that some thought!

Well, I was certainly envisioning parsing the description and then building whatever data structure you need for operating in a particular environment. There's no reason you couldn't compile small programs based on the description, for that matter. For small things like this, compilers aren't too scary. After all, you use a compiler every time you use a regex.

Quote:
Noted, thanks. I have re-read the language card documentation (after a very long time...) prompted by your posts. Supporting the different read/write access combinations should be straightforward, but of course the logic has to be aware of them. I think I will keep it simple -- as soon as a "write to RAM" mode has ever been enabled, I would direct all future ROM read accesses to the actual external ROM, since I can't be sure I still have an intact copy in the on-board ROM.

Sure. The nice thing about having worked out descriptions with proper semantics (and an accurate way of expressing them) is that you can leave all the optimization for later, and possibly never, for some cases.

_________________
Curt J. Sampson - github.com/0cjs


PostPosted: Sun Jul 05, 2020 8:29 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
> Because the block RAM is physically distributed across the whole FPGA chip, getting addresses and data to and from the outer blocks incurs significant routing delays...

Indeed! You're using a -3 FPGA, I notice, which is faster than the -2 on the boards I'm familiar with, so that will help. Have you needed to do anything to the CPU to speed things up, or to the memory system? I don't suppose you've gone so far as to place parts of the design?


PostPosted: Sun Jul 05, 2020 8:54 pm

Joined: Wed Jul 01, 2020 6:15 pm
Posts: 79
Location: Germany
cjs wrote:
how do you run Apple DOS (or anything that uses the Apple II drive)? The code there is running from regular RAM (and not even from a fixed location), is extremely timing critical, and you can't just slow down only the I/O accesses because other non-I/O instructions that are being executed are also part of the timing loop. At first glance, this seems to be the biggest problem with your CPU working in an Apple, ...

For a given version of DOS, the critical RWTS (read/write track/sector) routine and the motor-stepping routine should be in a fixed location to my knowledge. So I can define the memory areas where code has to be executed from external ROM to preserve the timing. And I can define the areas generously, making most or all of the DOS code execute slowly if needed, since much of its throughput will be limited by the physical disk performance anyway.

But indeed, this is awkward. Having the 65F02 settings depend on software details of the host system, rather than just its hardware, is a kludge. But unavoidable for systems like the Apple, I'm afraid.

Even for the Apple there will be plenty of software where acceleration does not work (e.g. because the software brings its own disk routines, for copy protection or other reasons), or where it does not make sense (e.g. for action games which should run at their intended speed). I will have to implement a way to switch the acceleration on and off -- in fact it should probably default to "off" upon power-on to retain compatibility. I'll go looking for an unused address which I could grab as an I/O address for a software switch, I think.

Quote:
... and may well affect other systems, too.

We found the Commodore PET 8032 to behave much more "orderly" than the Apple II in that respect. Even the "beep" is timed by the PIA rather than in software! In general, the schematics and firmware suggest that the Commodore was designed "by the book" by a team of engineers who used parts "as intended" -- whereas the Apple II was obviously designed by Steve Wozniak. :) Other computers are probably somewhere in the middle of that spectrum...


PostPosted: Sun Jul 05, 2020 9:12 pm

Joined: Wed Jul 01, 2020 6:15 pm
Posts: 79
Location: Germany
BigEd wrote:
You're using a -3 FPGA, I notice, which is faster than the -2 on the boards I'm familiar with, so that will help. Have you needed to do anything to the CPU to speed things up, or to the memory system? I don't suppose you've gone so far as to place parts of the design?

The -3 version let us push the clock rate to 100 MHz (which I had hoped to reach, mainly since it's a three-digit number :wink: ), where the -2 would only be good for 75 to 80 MHz.

The 100 MHz clock rate is marginal though: Trivial changes in some non-critical part of the VHDL can throw the place/route optimizer off course sometimes. I have tried to set a few placement constraints, but found them to help only for one fixed version of the design, and become counter-productive when something changes. I might be doing it wrong, or more likely, might just be trying to push the optimization a bit too far. The sensible thing would be to step the clock rate back by a few MHz, which would be barely noticeable in practice -- but for now I have managed to keep fiddling with the design to maintain the magical "100"...

I have been wondering whether another level of pipelining (pre-fetching) in the core would make sense. The path for fetching data from RAM, using them in the core to determine the next address, and sending the address back to the RAM is one of the bottlenecks. But I was (a) too lazy, (b) too concerned to break something, and (c) interested in keeping the original timing when operating non-accelerated from external memory; so I have not really looked into modifying the core to do pre-fetching.


PostPosted: Mon Jul 06, 2020 10:16 am

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 730
Location: Tokyo, Japan
65f02 wrote:
For a given version of DOS, the critical RWTS (read/write track/sector) routine and the motor-stepping routine should be in a fixed location to my knowledge.

That's not quite true. When booted from a "master" diskette, DOS loads below 16K (IIRC) and relocates itself up to higher memory, if available. (This allows it to run on any memory size from 16K through 48K. The Apple II motherboard is expandable from 4K to 48K in 4K increments.)

But merely providing 48K of memory won't fix the issue with "slave" diskettes formatted on systems with less than 48K of RAM because those load DOS in the location it existed on the system that did the format. (One must run a separate MASTER CREATE program to create diskettes that relocate their DOS on boot.)

That said, that's probably less of an issue to most users than all the other software that has its own RWTS routines, so just knowing that this issue exists, you can handle it the same way as those others. (Probably by telling the user, "This kind of stuff won't work." :-P)

Quote:
So I can define the memory areas where code has to be executed from external ROM to preserve the timing.

RWTS is in RAM, but yeah, it is basically a separate part of DOS, so you can just mark that whole block as "external execute." It will cover more than you need, but it's probably not worth more effort than that. You probably do have to remember to ensure that writes done by code in that block to memory outside that block are also slowed, to handle buffer writes that are part of the time-critical read routines.

Quote:
But indeed, this is awkward. Having the 65F02 settings depend on software details of the host system, rather than just its hardware, is a kludge. But unavoidable for systems like the Apple, I'm afraid.

Yeah, seems so. This is all complex and difficult enough that the most reasonable approach might be to make "high-speed" and "fully compatible speed" modes software-switchable, and just leave it to the user to handle it, exactly as you suggest for other software.

Quote:
I'll go looking for an unused address which I could grab as an I/O address for a software switch, I think.

That's going to be tough since in a fully expanded system (such as the Apple IIc or something configured similarly) there are no truly free addresses that I'm aware of. Some I/O ports are mirrored, and you might be able to find one that would produce a safe reaction if your CPU weren't intercepting the access, but safer would probably be to do something like a special instruction sequence that's recognized by your CPU and harmless on a regular CPU. (I'm not saying that that's actually worth the cost of implementing it, since I don't know the cost, just that it would probably be safer.)

Quote:
We found the Commodore PET 8032 to behave much more "orderly" than the Apple II...

Yeah, no real surprise there. The developers of the CBM ROM code clearly had some familiarity with more sophisticated operating systems and modeled their KERNAL (which the developers called a "kernel") along those lines. Reflecting on the Apple II systems' peccadilloes when it comes to I/O, I don't think it's unreasonable at all to handle just memory mapping and leave it entirely to the developers and end users to turn on and off "turbo mode" as necessary for handling I/O with timing issues beyond "don't access this address more than once every two cycles of the original clock speed" or whatever.

_________________
Curt J. Sampson - github.com/0cjs


PostPosted: Mon Jul 06, 2020 10:20 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Maybe, just maybe, it would be workable if the machine drops to normal speed whenever it sees an access to a disk i/o address (I guess there are only a few of those) and then stays in normal speed for some extended period, maybe as much as the time it takes a disk to rotate. Twice. And of course resetting that counter on every such access.


PostPosted: Tue Jul 07, 2020 7:35 pm

Joined: Tue Oct 24, 2017 1:08 am
Posts: 10
BigEd wrote:
Maybe, just maybe, it would be workable if the machine drops to normal speed whenever it sees an access to a disk i/o address (I guess there are only a few of those) and then stays in normal speed for some extended period, maybe as much as the time it takes a disk to rotate. Twice. And of course resetting that counter on every such access.


That's how most of the previous Apple ][ accelerators, such as the Zip Chip, do it: they watch for accesses to those I/O memory addresses and slow down to the normal clock speed for a period. Whether or not the speed change happened could be selected for each block of slot I/O memory addresses. Some devices such as parallel printer ports which are just driving a latch don't really need a general slowdown. Others like the disk drive do. It's easier to start and stop the slowdown based on access to the hardware and a timer than it is to try to catch every memory block where a software timing loop might be.
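
A minimal Python model of that scheme, for anyone who wants to play with the idea: a retriggerable countdown with a per-slot enable. The class and method names are invented here and this is not how any particular accelerator implements it. The slot arithmetic follows the usual Apple II layout (slot n device select at $C080 + $n0, so slot 6 is $C0E0-$C0EF); the hold time is a parameter because the right value is a judgment call.

```python
class SlowdownTimer:
    """Retriggerable timer: slow mode lasts hold_cycles after the last trigger."""

    def __init__(self, slow_slots, hold_cycles):
        self.slow_slots = set(slow_slots)   # slots whose I/O forces a slowdown
        self.hold_cycles = hold_cycles
        self.remaining = 0

    @staticmethod
    def slot_of(addr):
        """Slot number for a device-select address $C090-$C0FF, else None."""
        if 0xC090 <= addr <= 0xC0FF:
            return (addr - 0xC080) >> 4
        return None

    def on_access(self, addr):
        if self.slot_of(addr) in self.slow_slots:
            self.remaining = self.hold_cycles   # start or restart the countdown

    def tick(self):
        """Advance one host clock cycle; return True while in slow mode."""
        slow = self.remaining > 0
        if slow:
            self.remaining -= 1
        return slow
```

For a disk controller in slot 6 and a hold time of about two disk rotations, one would construct something like SlowdownTimer({6}, 400_000), assuming a 300 rpm drive and a host clock of roughly 1 MHz (both approximations on my part).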


PostPosted: Tue Jul 07, 2020 7:59 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Ah, interesting, thanks. Always good to hear about prior art. (I think I recall that one of the Apple II accelerators sat in the 6502 socket, but another one sat in an extension card. I could have that wrong.)


PostPosted: Tue Jul 07, 2020 8:26 pm

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
BigEd wrote:
Maybe, just maybe, it would be workable if the machine drops to normal speed whenever it sees an access to a disk i/o address (I guess there are only a few of those) and then stays in normal speed for some extended period, maybe as much as the time it takes a disk to rotate. Twice. And of course resetting that counter on every such access.


That would work. Unfortunately, it is poison for compiler and linker performance.


PostPosted: Wed Jul 08, 2020 1:40 am

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 730
Location: Tokyo, Japan
Well I've just had a thought.

Perhaps we can take advantage of the maladroitness in the 6502 that causes it to use an extra cycle when branches cross page boundaries. Because of this, coders of time-critical routines often ensure that they don't cross a page boundary (because this is much easier than making sure that the page boundary crossing doesn't cause any harm), and this is exactly what DOS does with the RWTS routines.

So one heuristic that might work reasonably often is, whenever an access to a disk I/O address is discovered, immediately mark that whole (256-byte) page as code to be run at standard speed. (This is not quite the same as just marking it external; even accesses to memory outside that page, when they are made by code in that page, must be at standard speed. I guess you could do this by forcing full bus cycles for all the code there, even if sometimes those cycles end up in an internal rather than external memory write.)

Something along these lines should ensure that minimal code is caught in the I/O tarpit, and still let other separated code, such as compilers and linkers, run at high speed, including their access to the very disk buffers that were written at standard speed. And it should work regardless of whose code is doing disk I/O or where it is in memory.

I don't know how easily or efficiently this could be implemented on an FPGA. One major change it seems to introduce is that the "slow page" mappings now need to be dynamic, since they cannot be calculated and loaded at power-up or reset time. A plain old bitmap of which pages are affected would be only 32 bytes of memory for the 256 pages in the system, though.
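
For what it's worth, the bitmap bookkeeping is tiny. A Python sketch follows, with names invented for illustration and the test for "is this a disk I/O address" passed in as a function, since that depends on the slot in use.

```python
class SlowPageBitmap:
    """One bit per 256-byte page: 256 bits = 32 bytes."""

    def __init__(self):
        self.bits = bytearray(32)

    def mark(self, addr):
        page = (addr >> 8) & 0xFF
        self.bits[page >> 3] |= 1 << (page & 7)

    def is_slow(self, addr):
        page = (addr >> 8) & 0xFF
        return bool(self.bits[page >> 3] & (1 << (page & 7)))

    def clear(self):
        """E.g. on reset, since the mappings are learned at run time."""
        self.bits = bytearray(32)


def on_data_access(bitmap, pc, data_addr, is_disk_io):
    """Mark the page holding the current instruction when it touches disk I/O."""
    if is_disk_io(data_addr):
        bitmap.mark(pc)
```

The very first disk access from a given page still runs at full speed up to the point where it is detected, so in practice the page would have to be marked before any timing-sensitive loop in it gets going; the sketch does not address that.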

By the way, that annotated Apple DOS listing I linked to above makes for good reading, if you're interested in that kind of thing, and can handle a fair amount of sarcasm. (I don't think the guy who wrote that had ever tried coding a large program in EDASM or a similar tool on an actual Apple II.)

_________________
Curt J. Sampson - github.com/0cjs


PostPosted: Wed Jul 08, 2020 6:48 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
I like it!


PostPosted: Wed Jul 08, 2020 7:27 am

Joined: Wed Jul 01, 2020 6:15 pm
Posts: 79
Location: Germany
Thanks, cjs -- that is a neat idea. Along the same lines, maybe it would work to fall back into slow mode whenever a disk I/O address is accessed, and stay there until an RTS is encountered?
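
As a thought experiment, the RTS variant could be modelled like this in Python. The class is hypothetical and deliberately naive: it leaves slow mode at the first RTS it sees, so a helper subroutine called from inside the disk routine would end slow mode early. Counting JSR/RTS depth would be one possible refinement.

```python
RTS = 0x60  # 6502 opcode for RTS


class SlowUntilRts:
    """Enter slow mode on a disk I/O access, leave it after the next RTS."""

    def __init__(self, is_disk_io):
        self.is_disk_io = is_disk_io   # predicate: is this a disk I/O address?
        self.slow = False

    def on_data_access(self, addr):
        if self.is_disk_io(addr):
            self.slow = True

    def on_opcode_fetch(self, opcode):
        """Call for each opcode fetch; returns True if this instruction runs slow."""
        run_slow = self.slow
        if opcode == RTS:
            self.slow = False          # the RTS itself still runs slow
        return run_slow
```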

Also, a good point about needing to access all addresses externally while in slow mode. The "external code" mechanism I currently use is already running into a minor limitation in some of the chess computers under test: The programmer has used a PHA/PLA sequence within a loop to obtain a delay. (Lots of cycles for two bytes of code...) But since the stack is obviously running fast, we don't get quite the intended delay. In the chess computer that only means slightly higher-pitched tones, but a similar effect would obviously not be acceptable in Apple DOS routines.

That will need a new flavor of bus cycles in the replica, where data are fetched from internal RAM (where they may previously have been written by some fast code elsewhere), but nevertheless an external bus cycle is started to retain the timing. Which should, of course, be quite feasible.
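
Spelled out as a decision table, that gives three kinds of cycle. The Python below is only a sketch of the selection logic with names I made up; it says nothing about how the bus interface would implement each kind.

```python
from enum import Enum


class Cycle(Enum):
    FAST_INTERNAL = 1    # data from block RAM, no external bus cycle
    EXTERNAL = 2         # data from the host bus, CPU halted meanwhile
    TIMED_INTERNAL = 3   # data from block RAM, but a full external cycle elapses


def choose_cycle(addr_is_external: bool, code_is_slow: bool) -> Cycle:
    """Pick the bus cycle flavor for one memory access."""
    if addr_is_external:
        return Cycle.EXTERNAL
    if code_is_slow:
        return Cycle.TIMED_INTERNAL
    return Cycle.FAST_INTERNAL
```

In the PHA/PLA delay loop mentioned above, the stack accesses made by slow code would come out as TIMED_INTERNAL, so the loop would take its original number of host cycles while still seeing whatever fast code last wrote to the stack page.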

I'll look into the implementation in more detail. Had gotten sidetracked over the past days by the suggestion of a (dare I say in a 6502 forum?) CDP 1806 replica... A lot of speed to be gained when replicating that processor, and an interesting task to mimic its external bus behavior, since the original CDP takes 8 clock cycles on the bus for every machine cycle!

