All times are UTC
PostPosted: Sun Dec 13, 2020 4:07 pm 

Joined: Sat Apr 11, 2020 7:28 pm
Posts: 344
From Kiwi - a 68k Homebrew computer

Quote:
8 to 16 bit buffer interface

While the CPU, memory and most of the subsystems are connected with 8 bit width, there are two subsystems which use a 16 bit bus. These are the ATA/IDE interface and the Ethernet interface (CS8900a). To access 16 bit with an 8 bit CPU, one has to break accesses. One 16 bit access works out as two consecutive 8 bit accesses. This works as follows.

Image

To read a 16 bit word, the device is accessed with the first access. While it provides its data, the lower half (D0-D7) is read by the CPU while the upper half (D8-D15) is latched (U44). The second access just enables the output buffer of this latch.
To write a 16 bit word, the device is accessed with the second access. The data provided in the first access is latched (U45). With the next access, the latched data is provided as low byte (D0-D7), while the CPU data is provided as high byte (D8-D15).

As the MC68008 always accesses low byte first, word addressing modes with 16 bit are possible. In this way, the address line A0 can be used to determine the current state of an access. A0 is low for the first and high for the second access.


Can this be applied to the 6502/816, in both software and hardware?


PostPosted: Sun Dec 13, 2020 4:22 pm

Joined: Sat Apr 11, 2020 7:28 pm
Posts: 344
The S1D13513 Epson graphics chip (product brief) is almost the perfect companion chip for a 65xx-based computer in 2020. I say "almost" because it uses a 16 bit data bus. That's why I'm asking for this info.


PostPosted: Sun Dec 13, 2020 5:19 pm

Joined: Mon Sep 14, 2015 8:50 pm
Posts: 112
Location: Virginia USA
Something similar is described here:

http://s.guillard.free.fr/Apple2IDE/Apple2IDE.htm


PostPosted: Sun Dec 13, 2020 9:07 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
tokafondo wrote:
Quote:
8 to 16 bit buffer interface
[ etc. ]
This looks entirely workable to me. In fact, it's the same approach as what I posted in your other thread back in May! :P Perhaps you overlooked that post. In any case, feel free to mention any outstanding questions you may have about how the scheme works.

The key point is this: a single instruction causes the CPU to generate two consecutive 8 bit accesses (either 2 consecutive reads or 2 consecutive writes). These are seen by the I/O device as one 16 bit read or write.

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Mon Dec 14, 2020 12:52 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8514
Location: Midwestern USA
Dr Jefyll wrote:
The key point is this: a single instruction causes the CPU to generate two consecutive 8 bit accesses (either 2 consecutive reads or 2 consecutive writes). These are seen by the I/O device as one 16 bit read or write.

Sounds like a job tailor-made for the 65C816. 8)

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Mon Dec 14, 2020 3:26 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
BigDumbDinosaur wrote:
Sounds like a job tailor-made for the 65C816. 8)
Yup, on an '816 you only need a single instruction to generate that 16-bit access. If you load or store a register that's configured to be 16-bit then the CPU will automatically break that into 2 consecutive 8-bit reads or 2 consecutive 8-bit writes. (And the hardware in the diagram translates those back to/from a single, 16-bit access.)

The lead post also mentions 6502, which is acceptable, too, but not as efficient. It will need a separate instruction for each of the two 8-bit chunks. For comparison, here are two versions of a simple example. (In both versions the I/O device sees only a single, 16-bit access. That's what the extra hardware is for.)
Code:
; '816 version. Assumes A is set as 16-bit.

LDA  FancyDevice    ;load A reg with 8 bits from FancyDevice and 8 bits from FancyDevice+1

Code:
; 6502 version

LDA  FancyDevice    ;load A reg with 8 bits from FancyDevice
LDY  FancyDevice+1  ;load Y reg with 8 bits from FancyDevice+1
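
Writes follow the same pattern. As a rough sketch, assuming the same FancyDevice address and, for the 6502 version, that the low byte is already in A and the high byte in Y (the choice of registers is purely for illustration):

```asm
; '816 version. Assumes A is set as 16-bit.

STA  FancyDevice    ;store 8 bits to FancyDevice, then 8 bits to FancyDevice+1

; 6502 version. Assumes low byte in A and high byte in Y.

STA  FancyDevice    ;low byte is held by the write latch
STY  FancyDevice+1  ;high byte arrives; the device sees a single 16-bit write
```

The order matters in the 6502 version: with the circuit in the lead post, the store to FancyDevice+1 has to come second, because that is the access that actually reaches the device.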

PostPosted: Mon Dec 14, 2020 5:44 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8514
Location: Midwestern USA
Dr Jefyll wrote:
Yup, on an '816 you only need a single instruction to generate that 16-bit access. If you load or store a register that's configured to be 16-bit then the CPU will automatically break that into 2 consecutive 8-bit reads or 2 consecutive 8-bit writes.

Your post reminds me of a yet-to-be-resolved hardware issue with POC V1.1, which was the version in which I added SCSI. The 53CF94 SCSI controller's DMA functions require that a 16-bit counter be loaded with the total number of bytes to be transferred by the next transaction. The count is set in two contiguous registers that are in big-endian order.

My plan was to load the (16 bit) accumulator with the 16-bit byte count, do an XBA to reverse endianness and then write the result to the CF94. The MSB would be written to the MSB register and on the next Ø2 cycle, the LSB would be written to the LSB register. Seems foolproof, eh?

Try as I might, I could not get it to work. Setting the count by writing the MSB to the MSB register as one instruction, followed by writing the LSB to the LSB register as a second instruction would work. Changing that working code to write the count as a 16-bit value consistently failed. At the time, I thought there had to be a timing bug involved, but never investigated it. It was around the time that my old Beckman scope went belly-up, so my ability to diagnose the problem was hobbled. Since SCSI was otherwise working fine I back-burnered the problem.

Having to split the write has no real performance implications, as it is only done once per SCSI bus transaction, no matter how much data is moved. So it has remained a low-priority matter. One of these days I'll revisit it...

PostPosted: Mon Dec 14, 2020 10:03 pm

Joined: Sat Apr 11, 2020 7:28 pm
Posts: 344
Dr Jefyll wrote:
This looks entirely workable to me. In fact, it's the same approach as what I posted in your other thread back in May! :P Perhaps you overlooked that post. In any case, feel free to mention any outstanding questions you may have about how the scheme works.

The key point is this: a single instruction causes the CPU to generate two consecutive 8 bit accesses (either 2 consecutive reads or 2 consecutive writes). These are seen by the I/O device as one 16 bit read or write.

-- Jeff



Thanks, you are right!! I'm asking the same questions again, more than half a year later.

Dr Jefyll wrote:
In your code you'll need to stick to simple loads and stores such as LDA BIT STA STY etc. Instructions like that always deal with the low byte first. But R-M-W instructions (INC DEC etc) are a no-no -- a minor limitation to keep in mind. The write portion of a R-M-W instruction will deal with the high byte first (according to WDC's '816 datasheet) and thus won't work as expected with this circuit.


How difficult would it be to code things like games and demos while following those rules? Having the circuit and the CPU handle 16 bits with some instructions doesn't mean it is practical to use if those coding limitations get in the way...


PostPosted: Mon Dec 14, 2020 11:17 pm

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
It should be possible to rig up something to accept writes in either byte order, using just a few more gates. In this case the write "buffers" will both need to be transparent octal latches, with their load-enables driven directly by the CPU bus decoding, and their output-enables driven only for the second write of the pair. One bit of storage is needed to indicate whether the first write has occurred.

A similar arrangement could also be used to permit reads in either byte order, but this might not be necessary.


PostPosted: Tue Dec 15, 2020 1:21 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
BigDumbDinosaur wrote:
[...] The count is set in two contiguous registers that are in big-endian order.

My plan was to load the (16 bit) accumulator with the 16-bit byte count, do an XBA to reverse endianness and then write the result to the CF94. The MSB would be written to the MSB register and on the next Ø2 cycle, the LSB would be written to the LSB register. Seems foolproof, eh?

BDD, I understand this is a low-priority problem, one which you rightly back-burnered in favor of more important issues. But are those "two contiguous registers" indeed in Big-Endian order?

I gather the Count register is actually 24 bit, with the 3 bytes appearing at (in order of increasing significance) Addresses $00, $01 and $0E. Would I be correct in concluding it's $00 and $01 to which you're writing with a single instruction? I'd say the relation between those two is Little-Endian, given that the less-significant byte is stored at the lower address.

Am I missing something? I tend to suspect your XBA is what created the problem. (I know, I know... premature optimization is tough to resist! :oops: )

-- Jeff


Attachment: 53CF94-53CF96 register map.png

PostPosted: Tue Dec 15, 2020 1:23 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Chromatix wrote:
It should be possible to rig up something to accept writes in either byte order, using just a few more gates.
Yes -- thanks, Chromatix.

tokafondo, as Chromatix says, the coding limitation can be eliminated with only a small increase in circuit complexity. On the other hand, the difficulties resulting from the coding limitation are also small. That's because...
    (a): in regard to an I/O device, it's pretty unlikely you'll want to use an INC DEC ROL ROR ASL or LSR (these are the Read-Modify-Write instructions) in the first place. (Edit: TSB and TRB might be somewhat useful, though.) And,

    (b): it's easy to simulate the effect of these instructions anyway. For example, ...
    Code:
    INC DeviceRegister ; a Read-Modify-Write instruction
    can be replaced by
    Code:
    LDA DeviceRegister
    INC A
    STA DeviceRegister

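TSB and TRB can be handled the same way. The following is only a sketch: Mask and NotMask are hypothetical memory locations holding the bits to change and their complement, and unlike the real instructions this sequence does not leave the Z flag reflecting A AND memory.

```asm
; in place of TSB DeviceRegister (set the bits that are 1 in Mask)
LDA DeviceRegister
ORA Mask
STA DeviceRegister

; in place of TRB DeviceRegister (clear the bits that are 0 in NotMask)
LDA DeviceRegister
AND NotMask
STA DeviceRegister
```
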
I understand your goal is to build a system around the S1D13513 graphics chip, which is a very fancy device indeed! I'm sure it'll present you with some challenges :shock: but this Read-Modify-Write limitation is a minor matter.

To implement the 16-bit bus interface, do you plan to use discrete logic (eg, 74_574 etc) or will you use some sort of PLD? In the latter case a small increase in complexity (to beat the small limitation) may come for free.

-- Jeff

PostPosted: Tue Dec 15, 2020 2:06 am

Joined: Sat Apr 11, 2020 7:28 pm
Posts: 344
Dr Jefyll wrote:
tokafondo, as Chromatix says, the coding limitation can be eliminated with only a small increase in circuit complexity. On the other hand, the difficulties resulting from the coding limitation are also small. That's because...
    (a): in regard to an I/O device, it's pretty unlikely you'll want to use an INC DEC ROL ROR ASL or LSR (these are the Read-Modify-Write instructions) in the first place. And,

    (b): it's easy to simulate the effect of these instructions anyway. For example, ...
    Code:
    INC DeviceRegister ; a Read-Modify-Write instruction
    can be replaced by
    Code:
    LDA DeviceRegister
    INC A
    STA DeviceRegister


Curiously, Wikipedia says, citing the book Programming the 65816: Including the 6502, 65C02, and 65802, that
Wikipedia wrote:
When register sizes are set to 16 bits, memory access will access two contiguous bytes of memory, at the cost of one extra clock cycle. Furthermore, a read-modify-write instruction, such as ROR <addr>, when used while the accumulator is set to 16 bits, will affect two contiguous bytes of memory, not one. Similarly, all arithmetic and logical operations will be 16-bit operations.

I can't confirm or deny this as I'm no coder at all, but I just wanted to put it here for you to see.


Dr Jefyll wrote:
I understand your goal is to build a system around the S1D13513 graphics chip, which is a very fancy device indeed! I'm sure it'll present you with some challenges :shock: but this Read-Modify-Write limitation is a minor matter.

To implement the 16-bit bus interface, do you plan to use discrete logic (eg, 74_574 etc) or will you use some sort of PLD? In the latter case a small increase in complexity (to beat the small limitation) may come for free.

-- Jeff


Well, as I have zero experience programming a PLD, it would be easier for me to put a bunch of 74-series chips on the board, at the cost of space... I could try to drive the S1D13781 chip that I currently have this way too, but I already have it working at 8 bits, so this 16 bit thing will have to wait until the next project.


PostPosted: Tue Dec 15, 2020 2:42 am

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Yes, that is true: RMW instructions operating solely on memory are affected by the M flag, just as ALU instructions involving the Accumulator are. The wrinkle is that read and write operations must each take 2 cycles to transfer 16 bits over an 8-bit bus, and for the interface circuit first proposed, the order of those half-transfers is important. Reads are always low address first, but writes are sometimes low address first (eg. STA) and sometimes high address first (eg. TSB, TRB).
Attachment: Screen Shot 2020-12-15 at 4.41.06 am.png
Attachment: Screen Shot 2020-12-15 at 4.33.21 am.png
A circuit which assumes low address first (as above) and thus latches reads on the low address and commits writes on the high address, can still be used with the 8-bit versions of the R-M-W instructions, provided they are always used on both bytes in the correct sequence. However, that won't always give the correct results without some finagling. A circuit which commits writes on the low address would work for 16-bit RMW instructions but fail on plain 16-bit stores.
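
As one case where the sequence does work out, a 16-bit left shift can be built from two 8-bit R-M-W instructions, because the carry flag links the two halves. This is a sketch only: DeviceRegister is a hypothetical address, and on an '816 it assumes the M flag selects 8-bit memory accesses.

```asm
ASL DeviceRegister     ;reads low byte (high byte gets latched); shifted low byte is held, not yet committed
ROL DeviceRegister+1   ;reads the latched high byte, rotates the carry in; this write commits all 16 bits
```

An INC pair would need more finagling, since the high byte only changes when the low byte wraps, yet the write to the high address still has to happen for anything to be committed. The extra cycle that R-M-W instructions insert between the read and the write differs between CPU variants and may show up as an additional access at the same address, so it is worth checking the datasheet for the part in use.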


PostPosted: Tue Dec 15, 2020 2:46 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8514
Location: Midwestern USA
Dr Jefyll wrote:
BDD, I understand this is a low-priority problem, one which you rightly back-burnered in favor of more important issues. But are those "two contiguous registers" indeed in Big-Endian order?...

Reading your reply had me go digging back through the code I was testing way back when (in 2012, to be more precise). My recollection was imperfect, but then that is why you put comments in your source files. :P

My code was treating those registers as being in little-endian order and my mention of the XBA instruction was erroneous. So the 16-bit write to registers $00 and $01 should have worked. I left a terse comment in the SCSI driver source code about the problem to remind me why the count setup was being done a byte at a time.
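
For reference, the two forms being compared look roughly like this. The labels are hypothetical (CF94 stands for wherever the controller is mapped, and Count is a 16-bit variable in RAM); only the register offsets $00 and $01 come from the discussion above.

```asm
; byte-at-a-time version (the one that works)
SEP  #$20        ;8-bit accumulator
LDA  Count       ;count LSB
STA  CF94+$00
LDA  Count+1     ;count MSB
STA  CF94+$01

; 16-bit version (the one that consistently failed)
REP  #$20        ;16-bit accumulator
LDA  Count       ;LSB and MSB in one load
STA  CF94+$00    ;LSB goes to $00, MSB to $01 on the following cycle
```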

Quote:
I gather the Count register is actually 24 bit, with the 3 bytes appearing at (in order of increasing significance) Addresses $00, $01 and $0E. Would I be correct in concluding it's $00 and $01 to which you're writing with a single instruction? I'd say the relation between those two is Little-Endian, given that the less-significant byte is stored at the lower address.

Correct on all counts (sorry—couldn't resist that). Register $0E was reserved in the older 53C90 and 53C94 controllers, which could only do 64KB transfers. That register defaults to $00 in the 53CF94 to maintain compatibility if not touched. I don't use it because I can never transfer more than 64KB with the present POC hardware. When I finally build a unit with extended RAM...who knows?

Interesting bit of history about the 53xx9x series of SCSI controllers: they were originally designed for use in the Motorola 68K-powered minicomputers built by NCR in the 1980s and early 1990s. The 68K is little-endian, of course, so the word-size registers in the 53CF94 are little-endian as well. I'm going to revisit this to see if it works correctly in POC V1.2. Older versions of POC were a little sloppier with timing and didn't have wait-stating.

PostPosted: Tue Dec 15, 2020 2:59 am

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Actually the 68K is very much big-endian.

