6502.org Forum  Projects  Code  Documents  Tools  Forum
PostPosted: Wed Sep 03, 2014 6:53 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10789
Location: England
As you can see with a search we've mentioned Acorn's Tube previously - it's a coprocessor interface protocol, intended for the BBC Micro's operating system's needs but readily translatable to other OSes. In Acorn's vision, the Tube connects a host system to a parasite system. The host system has all the I/O devices (and the human operating it) while the parasite has a fast CPU and lots of memory. Many different CPUs have successfully been hooked up - see http://en.wikipedia.org/wiki/BBC_Micro_expansion_unit and http://en.wikipedia.org/wiki/Tube_(BBC_Micro)

There are some resources at JGH's site:
http://mdfs.net/Apps/Emulators/Tube/
http://mdfs.net/Software/Tube

There's an open-source verilog model of the Tube interface chip at
https://sites.google.com/site/beeb816/file-cabinet
(several versions) and some detail at
https://sites.google.com/site/beeb816/h ... be-on-fpga
(richarde and myself are to blame for this project, which is not quite final, but help with testing would be appreciated!)

There's an in-progress investigation of the implementation of the original chip over at
http://stardot.org.uk/forums/viewtopic.php?t=8539
courtesy of Steve Furber who recently digitised the original check plot of the chip.

It's also possible to run the Tube protocol over other channels (the original chip being in limited supply) such as back-to-back parallel ports, serial lines, and 256 bytes of shared memory. (OK, probably the parallel port method doesn't implement the Tube API, but if that API can be carried over serial, it can surely be carried over parallel.)


PostPosted: Sun Apr 02, 2017 6:23 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10789
Location: England
This thread is ready for an update or two!

Our own efforts at an HDL model of the Tube chip have been superseded by others: this one for example:
https://github.com/hoglet67/CoPro6502/t ... r/src/Tube

Also notable, the PiTubeDirect project has demonstrated that neither a Tube chip nor a hardware remake of it is even needed - with enough care and attention to timing the Raspberry Pi can emulate the Tube chip at the same time as emulating the second processor's CPU:
https://github.com/hoglet67/PiTubeDirect/wiki

Finally, it's worth linking to a definitive (though incomplete) reference from Acorn themselves, in the form of App Note 04:
http://www.astro.ljmu.ac.uk/~bbcdocs/li ... te-004.pdf
and, also useful, their Service Manual:
http://chrisacorns.computinghistory.org ... procSM.pdf

For reference, the Tube connector pinout is as shown below. It's just a subset of the 6502 bus, with an additional address decode. The red stripe on the cable is pin 1 (IDE cables fit the bill, so long as they are fully populated) and pin 1 is nearest the edge of the Beeb:
Code:
TOP  Pin No BOTTOM
 0V   1  2  R/NW (read/not-write)
 0V   3  4  2MHzE
 0V   5  6  NIRQ (not-interrupt request) (from parasite to host, not used in practice)
 0V   7  8  NTUBE (decoded address of the Tube peripheral, normally FEE0 or )
 0V   9 10  NRST (not-reset)
 0V  11 12  D0
 0V  13 14  D1
 0V  15 16  D2
 0V  17 18  D3
 0V  19 20  D4
 0V  21 22  D5
 0V  23 24  D6
 0V  25 26  D7
 0V  27 28  A0
 0V  29 30  A1
+5V  31 32  A2
+5V  33 34  A3
+5V  35 36  A4
+5V  37 38  A5
+5V  39 40  A6


Edit: to clarify, Acorn's Tube protocol runs over a 40-way cable which contains a bidirectional 8 bit bus, but at a higher level, the Tube chip implements 8 unidirectional byte-wide channels, each with a short FIFO (different depth for different channels.)
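
As a rough mental model of that higher-level view, the sketch below treats the link as eight independent bounded FIFOs, four in each direction. It is not a model of the real chip's flow control. The class names are invented for illustration, and the depths in `ASSUMED_DEPTHS` are assumptions that should be checked against Acorn's application note before being relied on.

```python
from collections import deque

# Assumed depths per register pair: (host-to-parasite, parasite-to-host).
# Illustrative values only; check them against Acorn's application note.
ASSUMED_DEPTHS = {1: (1, 24), 2: (1, 1), 3: (2, 2), 4: (1, 1)}


class Channel:
    """One unidirectional byte-wide FIFO with a fixed depth."""

    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque()

    def not_full(self):
        return len(self.fifo) < self.depth

    def data_available(self):
        return len(self.fifo) > 0

    def write(self, byte):
        # Returns False instead of blocking, so the writer can poll and retry.
        if not self.not_full():
            return False
        self.fifo.append(byte & 0xFF)
        return True

    def read(self):
        # Returns None when empty, so the reader can poll and retry.
        return self.fifo.popleft() if self.fifo else None


class TubeLikeLink:
    """Eight channels: four registers, each with one FIFO per direction."""

    def __init__(self, depths=ASSUMED_DEPTHS):
        self.h2p = {reg: Channel(d[0]) for reg, d in depths.items()}
        self.p2h = {reg: Channel(d[1]) for reg, d in depths.items()}


link = TubeLikeLink()
accepted = [link.p2h[1].write(b) for b in b"HELLO"]    # deep FIFO takes all five
second = [link.h2p[2].write(b) for b in (0x01, 0x02)]  # 1-deep FIFO refuses the second
```

The point of returning `False` or `None` rather than blocking is that either side can poll a status flag and carry on with other work, which is how a short FIFO still decouples the two processors a little.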


Last edited by BigEd on Tue Jan 29, 2019 6:13 pm, edited 1 time in total.

PostPosted: Mon Jan 28, 2019 11:50 am 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10789
Location: England
Just to cross-link to a related thread:

and another, over on StarDot:

Edit: just to add a note, that while the normal use of Acorn's Tube is to join a frontend processor which does all I/O and an application processor which does all computation, it is possible for the application to upload code to the frontend, extend the communications protocol, and to have some division of labour. One example is the Tube version of Elite, the 3-D space trading game, where the front end machine maintains the very graphical user interface in coordination with high-level updates from the application processor. Another example is the recent Tube version of Conway's Life, where the display update protocol is somewhat compressed for transit. Acorn's own x86 application processor also sends screen updates using a modified protocol. In the absence of an update to the frontend code, user input events would be sent to the application and screen updates would be limited to relatively low performance character, line, and point plotting.


Last edited by BigEd on Tue Jan 29, 2019 11:07 am, edited 1 time in total.

PostPosted: Mon Jan 28, 2019 6:45 pm 
Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
I tried reading the documentation of the tube from some Acorn docs, and it was pretty dense.

The PiTubeDirect project isn't much help to me, as I'm more curious about it going the other direction. PiTubeDirect is about interfacing synthetic CPUs to a BBC host. I'm more interested in interfacing an RPi to a 65xx host over a fast parallel interface.

It seems that the Tube is essentially a custom, mini FIFO. I looked at a small FIFO to be used as a liaison to a RPi, but I'm still not quite sure how it would work.

I just don't know how to divvy up the work to enable bi-directional, simultaneous communication. Who interrupts who? Does everyone just poll the FIFO? How do I keep a large transfer from saturating the interface? Do I need to have priority packets? It's a combination of a hardware and software problem.

I guess I don't care. If I have an I/O processor sending over blocks from a disk read at the same time it's sending characters from a keyboard, it's more a matter of the nature of the I/O. The disk read is a query-response thing: the CPU asked for the disk block, and the I/O processor is responding. A keyboard event, by contrast, is out of band. Inevitably, the CPU will ask for it. But should it be handled out of band? That is, do we simply buffer it on the I/O processor and respond when the CPU asks, or do we tell the CPU that there are characters waiting?

Because part of this is I see all of the I/O as asynchronous. CPU makes a request and then, "later", a response comes in. The program can poll the landing area where the data will show up, but it's not going to poll the I/O processor. The processor is a "slave" to the CPU "master", but since it's the source of all of the I/O, well, who's driving who? It's more that the I/O processor simply has to let the CPU know it has packets waiting and the CPU drains them.

I guess as long as each request to the I/O processor has a destination, the interrupt routine can simply marshal the packets "transparently" as it sees them to where they belong, where the program then acts on their arrival and makes more requests. I guess that can work.
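
A minimal sketch of that idea, assuming each request carries a one-byte id and each reply packet echoes it back. The `Requests` class and its method names are hypothetical, and the "interrupt routine" is just an ordinary function call here.

```python
class Requests:
    """Track outstanding requests and the landing area for each reply."""

    def __init__(self):
        self.next_id = 0
        self.landing = {}   # request id -> reply payload, or None while pending

    def issue(self):
        # Main program: reserve a landing area and hand back its id.
        rid = self.next_id
        self.next_id = (self.next_id + 1) & 0xFF   # ids fit in one byte
        self.landing[rid] = None
        return rid

    def on_packet(self, rid, payload):
        # Simulated interrupt routine: just file the reply where it belongs.
        if rid in self.landing:
            self.landing[rid] = payload

    def poll(self, rid):
        # Main program: returns the reply once it has landed, else None.
        if self.landing.get(rid) is None:
            return None
        return self.landing.pop(rid)


reqs = Requests()
disk = reqs.issue()                      # ask for a disk block
key = reqs.issue()                       # ask for a keystroke
reqs.on_packet(key, b"a")                # the keystroke happens to arrive first
early = reqs.poll(disk)                  # disk block has not landed yet
reqs.on_packet(disk, bytes(256))         # now the block arrives
block = reqs.poll(disk)
char = reqs.poll(key)
```

The program only ever polls its own landing areas, never the I/O processor, and replies may arrive in any order.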

The nightmare scenario is you have a disk being read, a disk being written, character I/O, and a network socket pushing in a request all at the same time, and marshaling all that properly without stalling the interface.

At some point a character request is going to have to wait behind a 256 byte disk buffer read.

Life in the fast lane.


PostPosted: Mon Jan 28, 2019 6:54 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10789
Location: England
Not sure if it might be worth a new thread which can freely explore the design space of communication between a 6502 and something else. (Edit: see Parallel co-processor interfacing thread.)

But I'm not sure about your idea that PiTubeDirect is the wrong direction: the Beeb is a 6502 machine, and the Pi is a Pi. The difference, perhaps, between your idea and the PiTubeDirect idea, other than being interested in rolling your own protocol, is that you are perhaps thinking that the Pi is running Linux and is the machine closer to the user. The 6502 system is hanging off a parallel interface, perhaps with no other I/O mechanism. In which case yes, this is something like the opposite.

Acorn's original Tube chip is a collection of 8 FIFOs, with some slightly subtle flow control features, and which sits on two different microprocessor busses, both times as a multi-address peripheral. As you'll note from the head post, the Tube protocol has successfully been serialised, which means it can run over a single byte-wide channel. It should also be clear that the depth of the FIFOs has only a small effect on performance - a bigger FIFO allows for looser coupling and greater parallelism.
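
One way to picture the serialisation point is to tag each byte with the FIFO it belongs to before sending it down the single channel. The framing below (one tag byte followed by one data byte) is a made-up scheme for illustration only. It is not the framing used by any existing serial Tube implementation, and the function names are hypothetical.

```python
def mux(writes):
    """Flatten (channel, byte) writes into a single byte stream."""
    stream = bytearray()
    for channel, value in writes:
        if not 0 <= channel < 8:
            raise ValueError("channel must be 0..7")
        stream += bytes([channel, value & 0xFF])
    return bytes(stream)


def demux(stream):
    """Rebuild the per-channel byte sequences from the tagged stream."""
    if len(stream) % 2:
        raise ValueError("truncated stream")
    channels = {n: bytearray() for n in range(8)}
    for i in range(0, len(stream), 2):
        channels[stream[i]].append(stream[i + 1])
    return channels


writes = [(0, 0x41), (3, 0x10), (0, 0x42), (3, 0x20)]
wire = mux(writes)          # what travels over the single byte-wide channel
received = demux(wire)      # what the far end reconstructs
```

This doubles the traffic, so a practical scheme would want something more compact, but it shows that eight logical FIFOs do not need eight physical ones.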


Last edited by BigEd on Tue Jan 29, 2019 11:09 am, edited 1 time in total.

PostPosted: Mon Jan 28, 2019 7:19 pm 
Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
So, how long does it take to transfer 256 bytes? How much latency is acceptable in the character interface? You may well find that those constraints are compatible.

The Tube is actually a set of *eight* FIFOs, some deeper than others, each with its own status flags and interrupt masks IIRC. I think each side is capable of interrupting the other, simply by putting a byte into a FIFO that has an interrupt unmasked at the other end. But mostly the parasite makes requests and, if necessary, polls for results.


PostPosted: Mon Jan 28, 2019 8:49 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10789
Location: England
There's some detail on the behaviour of Acorn's Tube chip here:
http://www.cowsarenotpurple.co.uk/bbcco ... ml#tubeula

The various channels are not all the same: it's subtle!


PostPosted: Mon Jan 28, 2019 10:11 pm 
Joined: Sun Jun 29, 2014 5:42 am
Posts: 337
Chromatix wrote:
So, how long does it take to transfer 256 bytes? How much latency is acceptable in the character interface? You may well find that those constraints are compatible.

I assume you are asking about Acorn's Tube implementation.

The limiting factor is the 2MHz 6502 in the host (i.e. the Beeb).

The peak transfer rate is specified at 10us per byte (i.e. 100KB/s), so 256 bytes would be 2560us. (transfer types 6 and 7):
http://mdfs.net/Info/Comp/Acorn/AppNotes/004.pdf

Dave


PostPosted: Mon Jan 28, 2019 10:21 pm 
Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
BigEd wrote:
But I'm not sure about your idea that PiTubeDirect is the wrong direction: the Beeb is a 6502 machine, and the Pi is a Pi. The difference, perhaps, between your idea and the PiTubeDirect idea, other than being interested in rolling your own protocol, is that you are perhaps thinking that the Pi is running Linux and is the machine closer to the user. The 6502 system is hanging off a parallel interface, perhaps with no other I/O mechanism. In which case yes, this is something like the opposite.

I'm honestly not interested in rolling my own, I'd much rather stand on the shoulders of giants than make something myself.

It's different because the way I understand it is that in the standard case, the Beeb is used for its I/O capabilities, while the Tube Co-Processor is used for its processing capability. Thus the Z-80 running CP/M thing.

My case is the opposite. In my scenario, the 65xx is the "co-processor" and the Pi is the Beeb -- the I/O processor. The roles are swapped.

In theory, it "doesn't matter". (At least I think it doesn't matter.) Being as the interface is a) asynchronous and b) bi-directional, neither is the master. Rather they just have distinct roles, and the Beeb is "nominally" the "slave" to the master co-processor, when in fact it's just two machines doing what they do. They're fiercely independent and simply cooperating. (perhaps the distinction is subtle).

But for my Pooh brain, I'd like to see it approached with the Pi being the I/O processor to the Processing side of the 65xx.

The PiTube is also not quite interesting as it's a purely software solution. There's no example (that I know of) of someone interfacing an actual hardware chip to the interface. Rather, it's the Pi emulating the interface, and then driving internal, virtual processors.

So, we still have the Magic Tube Chip on one hand, and no real hardware on the other.

My original plan, mentioned elsewhere, was perhaps trying to leverage the GPIB interface to do this, as it seemed to be a multi-master solution (with a dominant controller, but...). But it was difficult for me to see how to orchestrate the bi-directional nature that I'm looking for. The introduction of FIFOs (in whatever form) sort of simplifies that for me. But it's a bit of (to me) complicated hardware, and it's not clear to me how those would work.

Quote:
Acorn's original Tube chip is a collection of 8 FIFOs, with some slightly subtle flow control features, and which sits on two different microprocessor busses, both times as a multi-address peripheral. As you'll note from the head post, the Tube protocol has successfully been serialised, which means it can run over a single byte-wide channel. It should also be clear that the depth of the FIFOs has only a small effect on performance - a bigger FIFO allows for looser coupling and greater parallelism.


The greater parallelism is nice in that it lets both sides perform more at their native speeds, even though in the end, the data still goes only as fast as the slowest member. But with greater parallelism, the fast side can be finished long before the other side has even started, even though it may well be waiting for control signals to actually continue.
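
That trade-off can be seen in a toy tick-based simulation. The model is an assumption made for illustration (the producer offers one byte per tick, the consumer takes one byte every few ticks) and does not represent any particular FIFO chip.

```python
def simulate(depth, n_bytes, consumer_period):
    """Return (tick at which the fast side finished writing,
               tick at which the slow side finished reading)."""
    fifo = 0                 # bytes currently queued
    written = read = 0
    producer_done = None
    tick = 0
    while read < n_bytes:
        tick += 1
        # Fast side: write whenever there is room.
        if written < n_bytes and fifo < depth:
            fifo += 1
            written += 1
            if written == n_bytes:
                producer_done = tick
        # Slow side: read on its own schedule.
        if tick % consumer_period == 0 and fifo > 0:
            fifo -= 1
            read += 1
    return producer_done, tick


shallow = simulate(depth=1, n_bytes=256, consumer_period=4)
deep = simulate(depth=512, n_bytes=256, consumer_period=4)
```

In this model the total transfer time is the same for both depths, because the slow side sets the pace. What changes is how early the fast side is released to do something else.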

The idea of a 512 byte FIFO (I think I saw a chip like that), with the I/O processor able to burst over a 256 byte disk block along with control information at "full speed", that's appealing. But, in the end, for the use case of the RPi, I really don't care if it's idle and waiting on the 65xx, that's its job -- it's not like it's doing weather modeling or bitcoin mining in the background. It should be able to do all of its I/O duties far faster than the 65xx can ask for the data (within reason, of course).

But, having it be able to squirt over an entire block -- sounds nice on paper.

I don't visualize needing more than one bi-directional FIFO (so is that 2 FIFOs?).


PostPosted: Mon Jan 28, 2019 10:26 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10789
Location: England
Sounds like this needs a new thread. (Edit: see Parallel co-processor interfacing thread.)


Last edited by BigEd on Tue Jan 29, 2019 11:11 am, edited 1 time in total.

PostPosted: Mon Jan 28, 2019 10:34 pm 
Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
hoglet wrote:
Chromatix wrote:
So, how long does it take to transfer 256 bytes? How much latency is acceptable in the character interface? You may well find that those constraints are compatible.

I assume you are asking about Acorn's Tube implementation.

The limiting factor is the 2MHz 6502 in the host (i.e. the Beeb).

The peak transfer rate is specified at 10us per byte (i.e. 100KB/s), so 256 bytes would be 2560us. (transfer types 6 and 7):
http://mdfs.net/Info/Comp/Acorn/AppNotes/004.pdf

Seems like there's quite enough bandwidth to handle a high speed terminal session while it is interlaced with other I/O going on.

Pretty sure I won't notice 2ms while holding down an auto-repeat space bar, or scrolling through a program listing.
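
Working the quoted figures through, as a quick check rather than a measurement. The only inputs are the 10us-per-byte peak rate and the 2MHz host clock mentioned above; the variable names are just for this sketch.

```python
US_PER_BYTE = 10        # peak transfer rate quoted above
BLOCK_BYTES = 256
HOST_CLOCK_MHZ = 2      # the Beeb's 6502

bytes_per_second = 1_000_000 // US_PER_BYTE       # peak throughput
block_time_us = BLOCK_BYTES * US_PER_BYTE         # time to move one block
block_time_ms = block_time_us / 1000

# A keystroke queued behind a whole block waits at most one block time.
worst_case_key_wait_ms = block_time_ms

# Host cycles available to move each byte at the peak rate.
cycles_per_byte = US_PER_BYTE * HOST_CLOCK_MHZ
```

So a character stuck behind a full 256-byte block waits about two and a half milliseconds at the peak rate, and the host has roughly twenty cycles in which to move each byte.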


PostPosted: Mon Jan 28, 2019 10:45 pm 
Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1395
Location: Scotland
whartung wrote:
My case is the opposite. In my scenario, the 65xx is the "co-processor" and the Pi is the Beeb -- the I/O processor. The roles are swapped.


This is how my Ruby SBC works - although in the Mk 1, the IO processor is an ATmega. My own "roadmap" is that a Pi will be the IO processor in the Mk 2 which will have a 65C816 as the user processor, so the Pi being the "BBC Micro IO processor black box thing".

The ATmega interface has a block of shared memory - 256 bytes which appears at $FF00 in the 6502, so the ATmega can write a small bootloader and the reset vector, then let the 6502 run. After that it's really a slave to the 6502 - the 6502 writes a command in the shared memory area and signals to the IO processor that there is something to do. The IO processor then does it (e.g. prints a character down the serial line, or loads a block out of a ROM file), signals "done" back to the 6502, and the 6502 carries on.
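
The command/done handshake can be sketched as below. This is a simulation of the general pattern only: the memory layout (command at offset 0, argument at offset 1), the `CMD_PUTCHAR` code and the two flags are invented for this example and are not the Ruby board's actual layout or signalling.

```python
class SharedPage:
    """256 bytes visible to both sides, plus two handshake flags."""

    def __init__(self):
        self.mem = bytearray(256)
        self.request = False   # set by the CPU side: "there is work to do"
        self.done = False      # set by the I/O side: "finished"


CMD_PUTCHAR = 0x01  # hypothetical command code


def cpu_putchar(page, ch):
    """CPU side: fill in the command and raise the request."""
    page.mem[0] = CMD_PUTCHAR
    page.mem[1] = ch
    page.done = False
    page.request = True


def io_service(page, output):
    """I/O side: run one pending command, if any, and acknowledge it."""
    if not page.request:
        return False
    if page.mem[0] == CMD_PUTCHAR:
        output.append(page.mem[1])
    page.request = False
    page.done = True
    return True


page = SharedPage()
printed = bytearray()
for ch in b"OK":
    cpu_putchar(page, ch)
    # On real hardware the CPU would be stalled at this point; in this
    # sketch the I/O side is simply called in turn.
    io_service(page, printed)
```

Because only one command is in flight at a time, the two sides never touch the shared page simultaneously, which is what makes the scheme simple.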

For the Mk 2 - I felt that having a bigger cpu (ie. 816) needed, or could warrant a bigger IO processor, so a Pi, and possibly a larger shared RAM space (like 512 bytes!) although right now the working window for file transfers out of the "ROM" (ie. ATmega flash) is 128 bytes and it's fast enough.

The down-side of this approach is that the 6502/816 is stalled while the IO processor is doing its thing - it was easy and negates any sort of cleverness with the shared memory thing. Hindsight would have me using dual-port SRAM, but ...

Pi issues are mostly around the 3.3v/5v conversions but that's solved with level converters. I would be running Linux on the Pi but that's more for convenience than anything else.

A plan B, if I were using a ROM on the 6502/816 side, would be to use a 6522 and implement an 8-bit strobe + ack bi-directional bus between the 6502 and ATmega, or a 16-bit bus from an 816 to a Pi (ie. both ports on a 6522). The Pi can pick up the 16 bits and process them more than fast enough - the limitation on speed is the 65xx side of things.

I'm actually at the point of wishing I'd gone down the Pi route for the 6502 right now - as to get software into it, I'm reprogramming the ATmega 1284 every time although I have that down to about 15 seconds now, but it might have been nice to run the assembler on the Pi, then just squirt it into the 6502 directly.

Ironically, a Pi is probably about the same cost as the ATmega 1284P, can osc. plus the sd card level shifter I'm using.

Cheers,

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


PostPosted: Tue Jan 29, 2019 9:58 am 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10789
Location: England
Thanks, whartung, for taking your development/exploration thread over here for further discussion:

Edit: just added a note to the third post in the thread.

