6502.org Forum
All times are UTC

Posted: Mon Jan 04, 2021 11:16 pm
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
railsrust wrote:
I'm thinking of using a W65C134s to control some secondary chips so as to offload the main cpu to give all of its grunt to the video chip.

Would it be possible to control the 8 bit MCU with the 16 bit one using one Parallel Interface Bus on each chip?
Sorry but I'm having trouble making sense of this. The W65C134s is not a 16 bit chip; nor is the 6502 you mention later -- and neither of them features a Parallel Interface Bus. Instead of '134, do you mean the '265? (Or will there be THREE processors??)

Quote:
Would there be a particular set of pins I would need to connect on each, or would it matter?
Yes it would matter. You'd wanna attach the Parallel Interface Bus of one processor to the Parallel Interface Bus of the other. (But the '134 has only a serial bus.)

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted: Tue Jan 05, 2021 12:18 am
Joined: Sun Mar 03, 2019 5:09 am
Posts: 22
Location: Texas
Yea, sorry. I was needing a nap when I posted earlier.

Yeah, I meant the W65C265S as the host processor.

I assumed too much about the '134S. I thought the SIB was a little odd. I just assumed the I/O port was the same thing.

Welp, live and learn. Oh well, I can just swap it out for a second '265S.

I was hoping for that third generic chip select on the '134 so I could use it for a 65C22. Oh well. I'll just dump an I/O port through SPI.

The reason for a whole other processor is largely a lack of chip selects on the main MCU. That and so I can use the extra processing cycles of the second one to play samples through the YM3438's DAC enable feature on channel 6.

If nothing else, it just means I won't have to worry about sending controller input information back to the work processor.


Posted: Sat Jan 09, 2021 2:46 pm
Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
railsrust wrote:
I'm thinking of using a W65C134s to control some secondary chips so as to offload the main cpu to give all of its grunt to the video chip.

Would it be possible to control the 8 bit MCU with the 16 bit one using one Parallel Interface Bus on each chip?

As you've pointed out, using a second processor to deal with specific peripheral processing that can be done more or less independently after receiving commands from the main processor is a fairly traditional technique. But it seems to me that using a parallel interface bus between the two processors is likely to produce a fair amount of stalling on both sides, since the two processors both need to co-ordinate in order to talk to each other at the same time when they're communicating.

The Fujitsu FM-7 (using a 6809 processor—but the buses work basically the same as the 6502) offloaded a significant amount of display- and graphics-related processing to a second 6809 processor using a 128-byte shared memory for communication. You might find the details interesting; you can find the schematics and a bit of reverse-engineering information in this repo.

You're thinking of keeping graphics on the main board, of course, but the general idea is the same. (And where you put the graphics is really system- and application-dependent, so don't take what Fujitsu did to mean that you're not making the right decision for your system.)

_________________
Curt J. Sampson - github.com/0cjs


Posted: Sat Jan 09, 2021 6:17 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
cjs wrote:
...it seems to me that using a parallel interface bus between the two processors is likely to produce a fair amount of stalling on both sides, since the two processors both need to co-ordinate in order to talk to each other at the same time when they're communicating.

Hmm, while I do see that a small shared memory, or even a FIFO, could aid in helping efficient communication, in making it easier, I would think that a simple bytewide channel could be fine, with the right approach. We know that Acorn's Tube protocol, or something very like it, can be sent over a parallel link (Torch's 6809) or over a serial link (JGH's serial Tube.) That's a multi-channel protocol, logically, with some chunky transfer types, but can evidently be marshalled and transferred usefully quickly.

I would think the key is for the sender always to be ready with the whole block to send at the time of starting a transfer, and the receiver always being ready to read the whole block into a buffer. So transfer of a packet proceeds at line rate, while gaps between packets are not a problem. If the packet size is limited, then there's a limit to the delay imposed on a high-priority packet if it has to wait for a transfer in progress to finish.

Suitable use of handshake signals and interrupts should help here. Data transfers can be handled in interrupt context, with data read from or written to buffers, which application context deals with.
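
As a rough illustration of the bounded-packet idea above, here is a minimal sketch in Python (used purely as executable pseudocode, not as target code). The one-byte length prefix, the `MAX_PACKET` value of 16 and the function names are assumptions made for this example, not anything specified in the thread.

```python
MAX_PACKET = 16  # assumed upper limit on payload bytes per packet


def packetize(data, max_packet=MAX_PACKET):
    """Split data into packets, each prefixed with a one-byte payload length."""
    packets = []
    for start in range(0, len(data), max_packet):
        chunk = data[start:start + max_packet]
        packets.append(bytes([len(chunk)]) + chunk)
    return packets


def reassemble(stream):
    """Rebuild the original data from a concatenation of length-prefixed packets."""
    out = bytearray()
    pos = 0
    while pos < len(stream):
        length = stream[pos]
        out += stream[pos + 1:pos + 1 + length]
        pos += 1 + length
    return bytes(out)


def worst_case_wait_cycles(max_packet, cycles_per_byte):
    """Longest time a high-priority packet waits behind one packet in progress."""
    return (max_packet + 1) * cycles_per_byte  # +1 for the length byte
```

With a 16-byte limit and a hypothetical 10 cycles per byte at line rate, a high-priority packet would wait at most 170 cycles for the transfer in progress to finish.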

Does that sound workable?


Posted: Sun Jan 10, 2021 3:31 am
Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
BigEd wrote:
Hmm, while I do see that a small shared memory, or even a FIFO, could aid in helping efficient communication...

To be clear, it doesn't make the communication itself, when it is occurring, any more efficient; that happens at about the same speed regardless of whether you're writing to shared memory or doing PIO.

The point is that, if reasonably well designed, most of the time it prevents either processor from having to block when trying to communicate with the other. The blocking isn't a big deal if your purpose in having two processors is simply access to something by one processor that only the other processor has (e.g., the ability to run code in a ROM), but if your purpose is actually to increase throughput (e.g., to have one processor rendering graphics while the other does something else at the same time), having to block and do nothing while waiting for the other processor to complete an operation is obviously going to interfere with this.

Quote:
We know that Acorn's Tube protocol, or something very like it, can be sent over a parallel link (Torch's 6809) or over a serial link (JGH's serial Tube.) That's a multi-channel protocol, logically, with some chunky transfer types, but can evidently be marshalled and transferred usefully quickly.

Sure. But if I understand it right, that uses FIFOs, which are a form of shared memory. And hardware FIFOs are rather more complex to implement than basic shared memory, as far as I know. (I'd certainly be interested to see any implementations that people think are relatively simple! Or even not so simple. Acorn's is hidden in a ULA.)

Quote:
I would think the key is for the sender always to be ready with the whole block to send at the time of starting a transfer, and the receiver always being ready to read the whole block into a buffer.

This is effectively what shared memory does: since the receiver need take no action to "read" the message into the buffer (as opposed to processing it), the sender is never delayed (so long as there is space for another unprocessed message). Further, the receiver is free to do other useful work while messages are being sent.

As you suggest, you could probably work out something with a PIO system where one processor could interrupt the other to tell it to start reading a message that the first wants to send, but you still have interrupt latency there, pauses in sending while the receiver responds to any other interrupts it receives, and significant additional handshaking complexity. (Don't take this to mean that there's no handshaking complexity at all in the shared memory system, though! You still need to ensure that partial messages aren't read and misinterpreted.)
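
One common way to keep partial messages from being read is to write the payload first and a length/ready byte last. The sketch below simulates that in Python with a `bytearray` standing in for the shared RAM. The layout (byte 0 as the flag, payload from byte 1), the 128-byte size and the `Mailbox` name are assumptions for illustration, not the FM-7's actual protocol.

```python
class Mailbox:
    """One-directional message slot in simulated shared memory.

    Assumed layout: byte 0 holds the payload length (0 means "empty"),
    bytes 1.. hold the payload itself.
    """

    def __init__(self, size=128):
        self.mem = bytearray(size)

    def send(self, payload):
        """Sender side: returns False instead of blocking if the slot is busy."""
        if not 0 < len(payload) < len(self.mem):
            raise ValueError("payload must be 1..size-1 bytes")
        if self.mem[0] != 0:
            return False  # previous message not yet consumed
        self.mem[1:1 + len(payload)] = payload  # write the payload first
        self.mem[0] = len(payload)              # commit with the flag byte last
        return True

    def receive(self):
        """Receiver side: returns None if no complete message is present."""
        length = self.mem[0]
        if length == 0:
            return None
        payload = bytes(self.mem[1:1 + length])
        self.mem[0] = 0  # release the slot for the next message
        return payload
```

Because the flag byte is written last and cleared only after the copy, the receiver never sees a half-written payload, and neither side has to wait on the other: a busy slot or an empty slot simply returns immediately.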

When PIO is slow, the obvious solution is DMA. Which brings us right back to one processor being able effectively to write directly to memory that the other processor can read.

By the way, I should mention that there are probably simpler ways of doing shared memory than the way Fujitsu did it. (The FM-7 is not exactly a parsimonious, or even economical, design.) One obvious one would be Woz-style shared memory, as implemented in the Apple II, where the CPU and another system access memory on opposite phases of Φ2.

_________________
Curt J. Sampson - github.com/0cjs


Posted: Sun Jan 10, 2021 11:22 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
cjs wrote:
BigEd wrote:
We know that Acorn's Tube protocol, or something very like it, can be sent over a parallel link (Torch's 6809) or over a serial link (JGH's serial Tube.) That's a multi-channel protocol, logically, with some chunky transfer types, but can evidently be marshalled and transferred usefully quickly.

Sure. But if I understand it right, that uses FIFOs, which are a form of shared memory. And hardware FIFOs are rather more complex to implement than basic shared memory, as far as I know. (I'd certainly be interested to see any implementations that people think are relatively simple! Or even not so simple. Acorn's is hidden in a ULA.)

In fact no, both those implementations have no FIFO. Acorn's own implementations do indeed use a ULA which does include several small FIFOs, but these don't. The original circuit for the ULA is now in circulation so we can even see how it's done, but as it's all asynchronous logic the modern reimplementations in Verilog are more conventional. The term 'Tube' unfortunately has about four different meanings: it's the facility to have a second processor, it's an API, it's a particular connector, and it's a ULA chip. In this case, I'm noting that the API has had at least three interestingly different implementations underneath it, and only one of those three uses FIFOs. It turns out that the FIFOs are not crucial - which was the point I wanted to make.

Of course, I'm happy to see people build systems with FIFOs, with shared memory, or without, depending on their inclinations and constraints.

Hope this helps!


Posted: Sun Jan 10, 2021 11:36 am
Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
BigEd wrote:
In fact no, both those implementations have no FIFO.

So basically your Tube processor writes a byte to the other side and then stops dead until the other side reads that byte? That sounds particularly bad when you're using a Tube processor much faster than the onboard 6502 with the intent of speeding up your processing. But of course it depends on how much communication you actually need to do.

Quote:
It turns out that the FIFOs are not crucial - which was the point I wanted to make.

Well, the original intent, as railsrust wrote, was "...using a W65C134s to control some secondary chips so as to offload the main cpu to give all of its grunt to the video chip." So in that case some form of asynchronous communication seems to be fairly important in order to keep the main processor from stalling during this communication, rather than doing its graphics work.

_________________
Curt J. Sampson - github.com/0cjs


Posted: Sun Jan 10, 2021 12:19 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
Hmm, I don't quite understand. With interrupt driven transfers, and a suitable protocol, there's no need for anything to stop dead. The sender should never be set to send more than the receiver is prepared to receive, and the protocol can arrange for that. There may be a circular buffer which hands off between interrupt context and application context, but that's just a matter of programming - it's not extra hardware.
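
The circular buffer between interrupt context and application context could look something like the following sketch, written in Python as executable pseudocode. The class name and capacity are assumptions. The property that matters is that `head` is written only by the producer and `tail` only by the consumer, so neither side needs to lock out the other.

```python
class RingBuffer:
    """Single-producer, single-consumer byte queue.

    One slot is always left unused so that head == tail means "empty".
    """

    def __init__(self, capacity=16):
        self.buf = bytearray(capacity)
        self.head = 0  # next slot to write; modified only by the producer
        self.tail = 0  # next slot to read; modified only by the consumer

    def put(self, byte):
        """Producer (e.g. receive interrupt handler): False if the buffer is full."""
        nxt = (self.head + 1) % len(self.buf)
        if nxt == self.tail:
            return False
        self.buf[self.head] = byte
        self.head = nxt
        return True

    def get(self):
        """Consumer (application context): None if the buffer is empty."""
        if self.tail == self.head:
            return None
        byte = self.buf[self.tail]
        self.tail = (self.tail + 1) % len(self.buf)
        return byte
```

On a 6502 a 256-byte buffer with 8-bit indices would make the wrap-around free, since the index simply overflows.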

(BTW, I'm happy to delete this reply and replace it with a pointer to a new thread, if we need further discussion - just drop me a PM)


Posted: Sun Jan 10, 2021 3:10 pm
Joined: Sat Dec 01, 2018 1:53 pm
Posts: 727
Location: Tokyo, Japan
BigEd wrote:
Hmm, I don't quite understand. With interrupt driven transfers, and a suitable protocol, there's no need for anything to stop dead.

Oh, I think I see: you're proposing that both ends use per-byte interrupt-driven transfers, right? That is, the sender sets up a byte on the transfer port, triggers an interrupt on the receiver, and then goes off to do something else, with an interrupt from the receiver acknowledging receipt of the byte that would then trigger sending of the next byte?

Sure, nothing stops dead there, but now you're adding a couple of score cycles of overhead to each byte transfer, plus more overhead to maintain your own copies of buffers that you could just maintain directly in shared memory. I guess if you're transferring data at a very low rate, this might not hurt too much, but it does seem to involve writing significantly more software.
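
To put rough numbers on that overhead, here is a small back-of-the-envelope calculation in Python. All three figures are assumptions chosen for illustration: an 8 MHz clock, about 40 cycles ("a couple of score") per interrupt-driven byte, and about 14 cycles per byte for a simple indexed copy loop (LDA abs,X / STA abs,X / INX / BNE) into shared memory.

```python
def bytes_per_second(clock_hz, cycles_per_byte):
    """Peak transfer rate if the CPU did nothing but move bytes."""
    return clock_hz // cycles_per_byte


CLOCK_HZ = 8_000_000     # assumed CPU clock
PER_BYTE_INTERRUPT = 40  # assumed cycles per byte with one interrupt per byte
COPY_LOOP = 14           # assumed cycles per byte for an indexed copy loop

interrupt_rate = bytes_per_second(CLOCK_HZ, PER_BYTE_INTERRUPT)
copy_rate = bytes_per_second(CLOCK_HZ, COPY_LOOP)
print(interrupt_rate, copy_rate)
```

Under these assumptions the per-byte interrupt scheme tops out near 200,000 bytes per second against roughly 570,000 for the copy loop, and the interrupt figure counts only one side's overhead.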

There's a reason that DMA peripherals are generally a lot faster and lower-overhead than PIO peripherals.

Quote:
...it's not extra hardware.

I am not clear here if you're thinking that shared memory requires extra hardware, but if we're talking just two Motorola-bus-style processors (such as an '816 and a 6502), I think it requires almost nothing if your shared memory is fast enough and you do it the way the Apple II or C64 did it, having the second device run on the "opposite" side of Φ2. Even if it requires an extra package or two, it still frees up a pair of parallel ports.

_________________
Curt J. Sampson - github.com/0cjs


Posted: Sun Jan 10, 2021 4:28 pm
Joined: Thu Mar 12, 2020 10:04 pm
Posts: 702
Location: North Tejas
The best solution may be to implement two message queues in shared memory, one in each direction, protected by semaphores.

The code in each processor might do this

Code:
loop:
    if message available:
        process message

    do stuff
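
A slightly fuller sketch of the same loop, in Python, with one queue in each direction. Here `threading.Semaphore` merely stands in for whatever hardware or software semaphore protects the shared memory. The names `MessageQueue`, `run_once` and the queue depth are hypothetical.

```python
import threading
from collections import deque


class MessageQueue:
    """One direction of traffic between the two processors."""

    def __init__(self, max_messages=8):
        self._items = deque()
        self._max = max_messages
        self._sem = threading.Semaphore(1)  # stand-in for the shared-memory semaphore

    def try_send(self, msg):
        """Returns False rather than blocking when the queue is full."""
        with self._sem:
            if len(self._items) >= self._max:
                return False
            self._items.append(msg)
            return True

    def try_receive(self):
        """Returns None rather than blocking when the queue is empty."""
        with self._sem:
            return self._items.popleft() if self._items else None


def run_once(inbox, outbox, handle):
    """One pass of the loop: process a message if available, then return
    so the caller can get on with its other work ("do stuff")."""
    msg = inbox.try_receive()
    if msg is not None:
        reply = handle(msg)
        if reply is not None:
            outbox.try_send(reply)
    return msg is not None
```

Each processor would call `run_once` with its own inbox and the other side's inbox as its outbox, so neither ever waits on the other.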


Posted: Fri Jan 22, 2021 7:12 pm
Joined: Sun Mar 03, 2019 5:09 am
Posts: 22
Location: Texas
I'm curious if a dual ported ram such as this would suffice.

https://www.mouser.com/ProductDetail/Renesas-IDT/7132LA55PDGI/?qs=SmUuHNCnblpAQG1pdYFA4Q%3D%3D


Posted: Fri Jan 22, 2021 8:50 pm
Joined: Sat Jan 02, 2016 10:22 am
Posts: 197
Dual port RAM works quite well between a 4 MHz Z80 and a 14 MHz 65C02. There's a photo of the system I built somewhere in the introduce yourself thread.

The 1k IDT7130 might be a better choice though, as it has an interrupt output triggered by a write to a high address. Doing without that on my board meant sorting out a protocol where the "slave" Z80 doing the I/O spent most of its time polling for instructions from the 65C02. Not great if you want to share the workload more evenly.
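
For anyone unfamiliar with that style of mailbox interrupt, the behaviour can be modelled roughly as below, in Python. The specific mailbox addresses (0x3FF and 0x3FE) and the clear-on-read behaviour are assumptions in this sketch and should be checked against the IDT7130 datasheet before wiring anything up.

```python
class DualPortRam:
    """Toy model of a 1K dual-port RAM with mailbox interrupts."""

    SIZE = 0x400
    MAILBOX_FOR_RIGHT = 0x3FF  # assumed: a left-port write here interrupts the right port
    MAILBOX_FOR_LEFT = 0x3FE   # assumed: a right-port write here interrupts the left port

    def __init__(self):
        self.mem = bytearray(self.SIZE)
        self.int_left = False   # interrupt line seen by the left-port CPU
        self.int_right = False  # interrupt line seen by the right-port CPU

    def write(self, port, addr, value):
        self.mem[addr] = value
        if port == "L" and addr == self.MAILBOX_FOR_RIGHT:
            self.int_right = True
        elif port == "R" and addr == self.MAILBOX_FOR_LEFT:
            self.int_left = True

    def read(self, port, addr):
        # Reading your own mailbox acknowledges (clears) the interrupt.
        if port == "R" and addr == self.MAILBOX_FOR_RIGHT:
            self.int_right = False
        elif port == "L" and addr == self.MAILBOX_FOR_LEFT:
            self.int_left = False
        return self.mem[addr]
```

The sender fills in its message elsewhere in the RAM and writes the mailbox byte last, so the receiver can sleep or do other work until the interrupt arrives instead of polling.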

You can also use pairs of 74HC(T)40105s to form a 16-byte queue in each direction.


Posted: Sat May 22, 2021 5:58 am
Joined: Sun Mar 03, 2019 5:09 am
Posts: 22
Location: Texas
I'm on summer break from college classes, so I have time to focus on this more now.

Since then I decided that I should skip the second processor and just focus on one cpu.

Thinking about it, this thing is going to be running somewhere between 7 and 8 MHz, and clock for clock it's much faster than the Sega Genesis, which was the original application for the Yamaha FM chip I'm going to be using, even with the Z80 taking up some of the load from the Motorola 68K. It was still pumping out sound samples fairly well, all things considered.

That said, I remember something earlier in the thread about being able to get more chip selects out of the MCU.

What would be involved in doing that? Would it be possible to add more and still run the monitor rom?

The V9958 probably isn't going to like the CPU running so fast, so I intend to use the internal functions that let you program different chip selects to slow the processor down when they're active.

Would this be affected by adding more chip selects, or could it be worked around?

I suppose I could just work with the RDY pin if I need to, but I'd prefer to not have to.

