BigEd wrote:
Sounds good, very good indeed! By sacrificing efficiency, you've made very respectable gains. I hope you can share more details.
The 'sacrifice' can be in logic only, if memory remains unduplicated (but with split addressing and some byte-twisting logic). Of course I can share the changes to the logic (very few, really) and the microcode (a lot of changes there), but it's not in releasable shape and may not be for some time. I've also had to recreate the microcode sources (which I lost) from the raw bit array, so the microcode generator is a chunk of C code now.
BigEd wrote:
I expect it will be true that some cores will be more amenable to this transformation than others
The differences won't be great. Very little extra work is done, in any core, during opcode argument fetch cycles. And these are the ones being eliminated.
BigEd wrote:
, which will be the reason behind some of the comments last time.
Oh, nonsense. Almost everyone tried to be the sceptical smartass, instead of actually taking in the idea.
BigEd wrote:
I'm interested to know whether your wide fetches are aligned, whether you marshal the 3 bytes you need from an 8-byte buffer which you fill 4 at a time, as previously sketched, or whether you've taken some other approach.
Right now, it's 3 duplicates of main memory. They are written to simultaneously (same addresses) and read from simultaneously (different addresses). That's just to keep things simple, and not pollute the logic with any byte-twisting multiplexers. But I could (and I have briefly done so) use a single copy split into separately addressable banks (which then inevitably requires some conditional address tweaks or byte twisting). The accumulator style I suggested in the earlier discussion has its own problems: on any control flow change, your previously read word becomes invalid, incurring a penalty (and probably a hold cycle), at least in the general case.
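To make the two memory arrangements concrete, here is a minimal behavioural sketch in C. It is not the actual logic from the core. The names (tri_mem, banked_mem, fetch3 and so on), the 64 KB address space and the choice of 4 byte-wide banks are all assumptions for illustration. Variant A writes every byte to three copies and reads the three copies at pc, pc+1 and pc+2. Variant B keeps a single copy in banks, which needs a per-bank row tweak and a rotation of the byte lanes.

```c
#include <stdint.h>

#define MEM_SIZE 65536u
#define NBANKS   4u                      /* assumption: 4 byte-wide banks */
#define ROWS     (MEM_SIZE / NBANKS)

/* Variant A: three identical copies of main memory. */
typedef struct { uint8_t copy[3][MEM_SIZE]; } tri_mem;

/* Every write goes to all copies at the same address. */
static void tri_write(tri_mem *m, uint16_t addr, uint8_t v) {
    for (int i = 0; i < 3; i++)
        m->copy[i][addr] = v;
}

/* Each copy is read at a different address: pc, pc+1, pc+2 (wrapping). */
static void tri_fetch3(const tri_mem *m, uint16_t pc, uint8_t out[3]) {
    for (int i = 0; i < 3; i++)
        out[i] = m->copy[i][(uint16_t)(pc + i)];
}

/* Variant B: one copy, split so that bank = addr % 4 and row = addr / 4. */
typedef struct { uint8_t bank[NBANKS][ROWS]; } banked_mem;

static void banked_write(banked_mem *m, uint16_t addr, uint8_t v) {
    m->bank[addr % NBANKS][addr / NBANKS] = v;
}

static void banked_fetch3(const banked_mem *m, uint16_t pc, uint8_t out[3]) {
    unsigned row   = pc / NBANKS;
    unsigned first = pc % NBANKS;        /* bank holding the byte at pc */
    uint8_t lane[NBANKS];

    for (unsigned b = 0; b < NBANKS; b++) {
        /* Conditional address tweak: banks below 'first' hold bytes that
           belong to the next row (wrapping at the end of memory). */
        unsigned r = (b < first) ? (row + 1u) % ROWS : row;
        lane[b] = m->bank[b][r];
    }
    /* Byte twisting: rotate the lanes so the byte at pc comes out first. */
    for (unsigned i = 0; i < 3; i++)
        out[i] = lane[(first + i) % NBANKS];
}
```

Both variants return the same three bytes for any pc, including fetches that wrap around the top of the address space. The banked one trades the extra memory for the row selection and the lane multiplexer.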
BigEd wrote:
Another possibility, more complexity again, is to have a write buffer, so that pending writes don't hold up reads - assuming there are still dead cycles in which the write buffer can empty.
Write buffers have their own problems. Generally, you will have to snoop the buffer on reads, since a read might clash with an uncommitted write. And you have to consider any effects on interrupt latency.
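The snooping requirement can be illustrated with a small behavioural model in C. This is only a sketch under stated assumptions: the names (wb_mem, wb_write, wb_read, wb_drain_one), the FIFO depth of 4 and the one-entry-per-idle-cycle drain are hypothetical, not taken from any existing core. A read must check the pending writes, newest first, before falling back to memory; otherwise it would return stale data.

```c
#include <stdint.h>
#include <stdbool.h>

#define WB_DEPTH 4u                      /* assumption: 4-entry write buffer */

typedef struct { uint16_t addr; uint8_t data; } wb_entry;

typedef struct {
    uint8_t  mem[65536];                 /* backing memory */
    wb_entry q[WB_DEPTH];                /* circular FIFO of pending writes */
    unsigned head, count;
} wb_mem;

/* Queue a write. Returns false when the buffer is full, in which case
   the CPU would have to stall until an entry drains. */
static bool wb_write(wb_mem *m, uint16_t addr, uint8_t data) {
    if (m->count == WB_DEPTH)
        return false;
    m->q[(m->head + m->count) % WB_DEPTH] = (wb_entry){ addr, data };
    m->count++;
    return true;
}

/* Commit the oldest pending write, e.g. during a dead memory cycle.
   Returns false if there was nothing to commit. */
static bool wb_drain_one(wb_mem *m) {
    if (m->count == 0)
        return false;
    wb_entry e = m->q[m->head];
    m->mem[e.addr] = e.data;
    m->head = (m->head + 1u) % WB_DEPTH;
    m->count--;
    return true;
}

/* Read with snooping: the newest pending write to the same address
   wins over whatever is currently in memory. */
static uint8_t wb_read(const wb_mem *m, uint16_t addr) {
    for (unsigned i = m->count; i-- > 0; ) {
        const wb_entry *e = &m->q[(m->head + i) % WB_DEPTH];
        if (e->addr == addr)
            return e->data;
    }
    return m->mem[addr];
}
```

In hardware the loop in wb_read becomes one address comparator per buffer entry plus a priority multiplexer. The interrupt latency question is whether pending entries have to be drained before the interrupt sequence may start; this model does not address that.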