Commodore Disk Drive Acceleration Software
I'm wondering how fast loaders/savers for the Commodore 1541 disk drive actually work. For example, GEOS gets 8x the base serial speed over the IEC bus when talking to the 1541, making the system actually usable.
Does anyone know of a commented source listing for such software? I've seen listings for fast loaders in the past, but they've always been an opaque blob of machine language with no discernible entry points and no algorithm description.
Thanks.
I don't know about commented source, but for a technical description see The Ultimate Commodore 64 Talk @25C3 on pagetable.com - there's also a pdf and a video of that talk.
Here are some snippets, courtesy of google:
"The real-time-clock has a resolution of 1/10 of seconds and supports generating ..... The 1541 has two VIA I/O controllers at $1800 (for the IEC bus) and at ... Such a new protocol would for example not do a handshake on every bit using the clock line, but shift a complete byte through in 4 steps, two bits at a time ..."
edit: tidy up quotes and links
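The quoted idea of shifting a byte through "in 4 steps, two bits at a time" can be modelled as plain bit manipulation. The Python below only models the data path; it is not C64 or 1541 code. The function names are hypothetical, and putting the upper bit of each pair on CLK and the lower on DATA is an assumption (real loaders choose their own mapping and also have to deal with the inverted line levels, which this ignores).

```python
def byte_to_line_pairs(value):
    """Split a byte into four (clk, data) pairs, most significant pair first.

    Assumption: the higher bit of each pair goes on CLK, the lower on DATA.
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    pairs = []
    for shift in (6, 4, 2, 0):
        two_bits = (value >> shift) & 0b11
        pairs.append((two_bits >> 1, two_bits & 1))
    return pairs


def line_pairs_to_byte(pairs):
    """Reassemble a byte from four (clk, data) samples taken at agreed times."""
    value = 0
    for clk, data in pairs:
        value = (value << 2) | (clk << 1) | data
    return value


# 0xA5 = 10 10 01 01, so four steps carry the whole byte
print(byte_to_line_pairs(0xA5))  # [(1, 0), (1, 0), (0, 1), (0, 1)]
assert line_pairs_to_byte(byte_to_line_pairs(0xA5)) == 0xA5
```

No handshake appears in the model: both sides simply agree on when each of the four samples is taken, which is what makes cycle-exact timing necessary.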
Last edited by BigEd on Fri Aug 14, 2009 3:07 pm, edited 2 times in total.
A fast search on "1541 fastload source" came up with this:
http://www.ffd2.com/fridge/devpack/bin/
It's a directory listing. See the "instructions" file. The fload1541.src file appears to have the 1541 side of the code, and fload.c64.src has the C64 side.
Hope that helps.
Daryl
Hi,
I read a magazine article about the Pink Floyd drive turbo (in Finnish, sorry) a long time ago, and it made two main points. First, the C64 and drive CPUs run at approximately the same frequency, so the DATA and CLK lines can be used as a 2-bit parallel port; synchronization can be achieved by writing routines that take the same number of CPU cycles on both the drive and the C64 side. IIRC, the normal protocol uses CLK to synchronize each and every bit. Second, although there is normally a 10-sector interleave between two successive sectors of the same file (this is how normal save routines lay out data), the normal drive routines let the disk spin a full revolution after reading each sector. Drive turbos read several sectors during one revolution of the disk, speeding up the process. Pink Floyd is not the fastest turbo there is, so some others might use additional tricks not covered in the article.
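The interleave point can be illustrated with a small timing model. It assumes a 21-sector track, a simplified interleave-10 layout (step forward 10 sectors, skip any already used; real 1541 DOS has extra quirks), and a drive that must finish handling one sector before it can read the next. The `busy` values of 9 and 12 sector-times are illustrative, not measured, and the function names are hypothetical.

```python
def file_sector_order(sectors_per_track, interleave, start=0):
    """Order in which a file's sectors are laid out on one track (simplified)."""
    used = [False] * sectors_per_track
    order = []
    sector = start
    for _ in range(sectors_per_track):
        while used[sector]:                      # already taken: use the next free one
            sector = (sector + 1) % sectors_per_track
        used[sector] = True
        order.append(sector)
        sector = (sector + interleave) % sectors_per_track
    return order


def sector_times_to_read(order, sectors_per_track, busy):
    """Time, in sector-times, to read `order` in chain order.

    Sector s is under the head whenever t % sectors_per_track == s. Reading
    takes one sector-time; afterwards the drive is busy (e.g. sending the data
    to the computer) for `busy` sector-times before it can read again.
    """
    t = 0
    for sector in order:
        t += (sector - t) % sectors_per_track    # wait for the sector to come round
        t += 1 + busy                            # read it, then handle it
    return t


order = file_sector_order(21, 10)
fast = sector_times_to_read(order, 21, busy=9)   # work fits in the interleave gap
slow = sector_times_to_read(order, 21, busy=12)  # just misses every next sector
print(fast / 21, slow / 21)                      # revolutions: 10.0 vs about 30.1
```

With `busy=9` the drive reads a sector every 10 sector-times, i.e. about two per revolution; overrunning by only a few sector-times means each next sector has just passed the head, so every sector costs an extra revolution.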
8BIT wrote:
A fast search on "1541 fastload source" came up with this:
That being said, I'm surprised to find that he's still handshaking on individual bits (in fact, if you look at the 1541 code, you'll find that the C64 drives the serial shifting at all times -- in effect, the C64 is driving the 1541 like a single-ended SPI device!). I can think of two ways of speeding up Finkel's code further:
1) Respond to edge-triggering. Transfer two bits per clock cycle, one on the rising edge, and one on the falling edge.
2) Use asynchronous transmission, and drive both the clock and data lines in parallel, as moonshine describes.
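Idea 1) can be sketched as "double-edge" clocking: the sender toggles CLK once per bit, and the receiver takes a DATA sample on every level change, rising or falling. This Python model assumes the receiver notices every change (on real hardware that would mean polling the port, since a later reply notes the lines cannot trigger on edges); the function names are hypothetical.

```python
def send_double_edge(value, bits=8, start_clk=0):
    """Return the (clk, data) line states for one byte, MSB first.

    DATA is set and CLK is toggled once per bit, so every edge carries a bit.
    """
    states = []
    clk = start_clk
    for shift in range(bits - 1, -1, -1):
        clk ^= 1
        states.append((clk, (value >> shift) & 1))
    return states


def receive_double_edge(states, start_clk=0):
    """Sample DATA whenever CLK changes level, on either edge."""
    value = 0
    last_clk = start_clk
    for clk, data in states:
        if clk != last_clk:
            value = (value << 1) | data
            last_clk = clk
    return value


states = send_double_edge(0x3C)
assert receive_double_edge(states) == 0x3C
# 8 edges = 4 full clock cycles, versus 8 cycles with one bit per cycle
print(len(states) // 2)  # 4
```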
I'll have to play with this to see the effects. As time permits, of course. This is just idle curiosity at the moment, but I can see myself making use of this knowledge in the future.
The site http://sta.c64.org/c64utils.html has several turbos. Mostly out of curiosity, I disassembled Star Turbo by Joe Forster/STA (I presume, as no author is mentioned) and found that it uses 2-bit transfer as I described. There is no source code, but the code is mostly not too difficult to follow (it looks like it was written using a monitor).
There is a relocator included with Star Turbo, and it does not blank the screen while loading. It also has a turbo save, so I guess you can use it pretty much anywhere. For the standard version, the entry point that installs it is $CF00, but the code starts at $CA00, where the drive routines start. At $CDB0 there is a raster compare routine to avoid bad lines (so that you don't have to blank the screen) and the half-nybble transfer routine for transferring a byte from the C64 to the 1541. At $CEC0 there seems to be a similar routine for the opposite direction. The corresponding routine for the drive seems to start at $CC00; before that there is something related to drive head movement. This is just a quick glance at the disassembly, with some omissions and probably some errors as well, but I hope it helps.
I couldn't stop looking at the turbo code. Looking at the drive memory wasn't enlightening, to say the least; it was very confusing. I couldn't find the code I expected in drive memory. Also, the turbo copies some data to the cassette buffer, which is then executed but also modified by some other code. This way the turbo can load very long programs without crashing. Part of the code in the cassette buffer seems to be for the drive, as there is a V-flag check followed by a read from $1C01! If I manage to solve the mystery, I'll report back.
Last edited by moonshine on Mon Aug 17, 2009 10:13 pm, edited 1 time in total.
blackadder
I worked with a guy who wrote some fast-load software. It basically synced the drive and computer to eliminate the handshaking and transferred 2 bits at a time.
I later built a parallel cable from the unused 6522 port in the 1541 to the C64 user port so I could do transfers 8 bits at a time. To use it I modified his code, but I found that it wasn't noticeably faster than the 2-bit version.
The routines weren't very complicated, I may still have the docs, I'll see if I can find them.
It only worked on reads, speeding up writes is apparently a bit more complicated.
Interesting; GEOS is able to speed up both reads and writes. I suspect GEOS uses the same overall protocol for transferring data serially.
I have to wonder, though: if you intend to communicate with a normal IEC device, do you first have to unload or shut off the disk drive code?
It seems to me like you should, since you're tweaking both clock and data alike.
There are a couple of articles on the COVERT BITOPS site:
A simple diskdrive IRQ-loader dissected by Cadaver
http://cadaver.homeftp.net/rants/irqload.htm
and 2-bit transfer protocol in diskdrive IRQ-loaders by Cadaver
http://cadaver.homeftp.net/rants/2bitload.htm
Most drive turbos use the normal bus protocol for everything except the main data transfer. I've never had any devices other than a 1541 on the IEC bus, but a printer has a device number below 8, and Star Turbo, for example, checks for this and falls back to the KERNAL routines in that case, as well as in another special case from the drive's point of view: loading a directory. So in theory it works well with other IEC devices, although I'm not able to test it. Using a special protocol should not disturb other devices, as they are just passively listening on the bus.
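The fallback rule described above amounts to a small dispatch decision. This sketch, with a hypothetical function name, only models the two cases mentioned for Star Turbo; it is not taken from its code, real turbos may check more conditions, and on the machine the filename would be PETSCII bytes rather than a Python string.

```python
def should_use_fast_load(device, filename):
    """Decide whether a hypothetical turbo handles a LOAD itself or defers to the KERNAL.

    Models the two fallback cases described for Star Turbo: device numbers
    below 8 (e.g. a printer) and directory loads ("$").
    """
    if device < 8:
        return False          # not a disk drive: use the KERNAL routines
    if filename.startswith("$"):
        return False          # directory listing: use the KERNAL routines
    return True               # ordinary file on a drive: use the fast path


for dev, name in ((8, "GAME"), (8, "$"), (4, "GAME")):
    print(dev, name, should_use_fast_load(dev, name))
```

Keeping the fast path opt-in like this is what lets the rest of the bus traffic stay on the standard protocol.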
Most of the common "speeders" use 2 bit transfer.
To the "edge detection" poster. You can't do edge detection on the lines, so timing is all you have.
Note also that the 1541 runs at 1 MHz, while the C64 is a bit slower (PAL) or a bit faster (NTSC). So you have to resync after a certain number of transfers to get things back in step.
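To put a number on the clock mismatch, the sketch below uses the nominal CPU clocks (PAL C64 985,248 Hz, NTSC C64 about 1,022,727 Hz, 1541 1 MHz) and asks how far apart two cycle-identical routines drift over one byte. The 28-cycle byte length is taken from the 24-28 µs JiffyDOS figure quoted in the thread and is only an assumption about a particular loader; the function name is hypothetical.

```python
PAL_C64_HZ = 985_248      # PAL C64 CPU clock
NTSC_C64_HZ = 1_022_727   # NTSC C64 CPU clock (approx.)
DRIVE_HZ = 1_000_000      # 1541 CPU clock


def skew_in_c64_cycles(c64_hz, drive_cycles):
    """Cycles of drift between the two CPUs after the drive runs `drive_cycles`,
    assuming both start together and run routines of equal cycle count."""
    c64_cycles_elapsed = drive_cycles * c64_hz / DRIVE_HZ
    return abs(c64_cycles_elapsed - drive_cycles)


per_byte = 28  # assumed cycles per byte
for name, hz in (("PAL", PAL_C64_HZ), ("NTSC", NTSC_C64_HZ)):
    skew = skew_in_c64_cycles(hz, per_byte)
    print(f"{name}: {skew:.2f} cycles of skew per byte, "
          f"a full cycle lost after ~{1 / skew:.1f} bytes")
```

On PAL the two sides slip by about 0.4 of a cycle per byte (a full cycle after roughly 2.4 bytes), and on NTSC by about 0.6, so a cycle-counted transfer cannot run for long without resynchronizing.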
On the C64, you have to wait until right after a VIC-II "badline" and make sure sprites are off (they steal cycles on every line where they are active) to ensure timing. That is why GEOS turns off the mouse pointer during disk access.
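The bad-line condition itself is easy to state: on raster lines $30-$F7, a line is "bad" when its low three bits equal YSCROLL (bits 0-2 of $D011), provided the display is enabled. The Python sketch models that rule and counts how many clean lines follow a given line. It ignores sprite DMA and the finer details of when DEN is sampled, and the function names are hypothetical.

```python
BADLINE_FIRST, BADLINE_LAST = 0x30, 0xF7   # only these raster lines can be bad


def is_badline(raster, yscroll, display_enabled=True):
    """Simplified VIC-II bad-line condition (sprites not modelled)."""
    return (display_enabled
            and BADLINE_FIRST <= raster <= BADLINE_LAST
            and (raster & 7) == (yscroll & 7))


def free_lines_after(raster, yscroll, total_lines=312):
    """Consecutive non-bad raster lines following `raster` (PAL: 312 lines)."""
    count = 0
    line = (raster + 1) % total_lines
    while count < total_lines and not is_badline(line, yscroll):
        count += 1
        line = (line + 1) % total_lines
    return count


bad = [r for r in range(312) if is_badline(r, yscroll=3)]
print(len(bad), hex(bad[0]), hex(bad[-1]))  # 25 0x33 0xf3
print(free_lines_after(0x33, yscroll=3))    # 7
```

Right after a bad line in the display area there are seven clean lines before the next one, which is the window a cycle-counted transfer can aim for.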
You do not necessarily need to turn off the drive code. For instance, JiffyDOS, one of the most popular US speeders, uses a "6th bit twiddle" on the DATA line to signal that the computer is using JiffyDOS. If the drive code does not see that twiddle, it uses the normal routines.
Under JiffyDOS, you can transfer 2 bits in about 6-7 µs, so 24-28 µs per byte. You need some additional handshaking to deal with clock skew, but 40 µs is a good margin. The stock routine takes 40 µs per bit on save and 120 µs per bit on receive, so 320 µs per byte on send and 960 µs on receive. Thus, a 10-20x increase, depending on direction.
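Plugging those figures into a few lines of arithmetic makes the comparison explicit. Against the 40 µs per-byte budget, the ratios come out at about 8x for sending and 24x for receiving; comparing against the raw 24-28 µs instead gives larger numbers. These are bus-time ratios only and ignore disk rotation and GCR decoding, so the end-to-end gain will differ. The constant names are hypothetical.

```python
STEPS_PER_BYTE = 4              # 2 bits per step
FAST_STEP_US = (6, 7)           # quoted time per 2-bit step
FAST_BUDGET_US = 40             # per byte, with handshake margin
STOCK_SEND_US_PER_BIT = 40
STOCK_RECEIVE_US_PER_BIT = 120


def per_byte(us_per_unit, units):
    """Time for one byte given the time per unit (bit or step)."""
    return us_per_unit * units


fast_raw = tuple(per_byte(t, STEPS_PER_BYTE) for t in FAST_STEP_US)  # (24, 28)
stock_send = per_byte(STOCK_SEND_US_PER_BIT, 8)          # 320 us per byte
stock_receive = per_byte(STOCK_RECEIVE_US_PER_BIT, 8)    # 960 us per byte

for label, stock in (("send", stock_send), ("receive", stock_receive)):
    print(f"{label}: {stock} us stock vs {FAST_BUDGET_US} us fast "
          f"= {stock / FAST_BUDGET_US:.0f}x")
```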
For commented C code, you can look at the sd2iec firmware, which supports stock and JiffyDOS protocols.
Jim
Thanks for your insight! I think that if your trackloader knows where your interrupts are called and sprites are displayed, and does not load there, you can use sprites and interrupts with a 2-bit loader. I should write a routine to display some Koala pictures plus a sprite scroller in the side borders at some point when I've finished with my current stuff, and I would need to load those pictures while everything is running. I plan to do everything except the loader in an IRQ. My friend created some true Koala gems (with a joystick, pixel by pixel) back in the day, but alas, the pictures have never been released. Now he's busy programming games for a living but gave me permission to use his stuff, as I have all the time he doesn't. I will probably use the available sources for an IRQ loader, just modifying them so that they work with my own routines.
Yes, if you don't try to do IEC work when VIC-II cycle accesses will happen, you can transfer with abandon. Many demos transferred data during the VBI, for just this reason.
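As a rough estimate of what transferring outside the visible display buys, the sketch below counts the PAL raster lines outside the $30-$F7 range (where no bad lines can occur) and how many 40 µs bytes fit in them per frame. It assumes 312 lines of 63 cycles at the PAL clock, sprites off and no other interrupts in that window; all names are hypothetical.

```python
PAL_LINES = 312
PAL_CYCLES_PER_LINE = 63
PAL_CPU_HZ = 985_248
BADLINE_FIRST, BADLINE_LAST = 0x30, 0xF7   # only these lines can be bad lines


def bytes_in_window(lines, us_per_byte=40):
    """Whole bytes that fit in `lines` raster lines at `us_per_byte` each."""
    window_us = lines * PAL_CYCLES_PER_LINE * 1_000_000 / PAL_CPU_HZ
    return int(window_us // us_per_byte)


free_lines = PAL_LINES - (BADLINE_LAST - BADLINE_FIRST + 1)
print(free_lines, bytes_in_window(free_lines))  # 112 179
```

That is roughly 179 bytes per frame from the bad-line-free area alone, before using any of the clean lines inside the display.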
Now, the more esoteric speeders did 3 bit transfers, ATN/CLK/DATA but those required that all devices on the bus except the drive in use were turned off. ATN is used as a "command/data" indicator, and using it for data transfers with regular IEC peripherals was not possible.
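With three lines, a byte takes three steps instead of four. The sketch below shows only the bit grouping (3 + 3 + 2 bits); which group would go on ATN, CLK and DATA, and in what order, is left open as an assumption, and the function names are hypothetical.

```python
def group_widths(width, total_bits=8):
    """Widths of the groups a byte is cut into, most significant group first."""
    widths = []
    remaining = total_bits
    while remaining > 0:
        take = min(width, remaining)
        widths.append(take)
        remaining -= take
    return widths


def split_bits(value, width, total_bits=8):
    """Cut `value` into groups of at most `width` bits, MSB group first."""
    groups = []
    remaining = total_bits
    for take in group_widths(width, total_bits):
        remaining -= take
        groups.append((value >> remaining) & ((1 << take) - 1))
    return groups


def join_bits(groups, width, total_bits=8):
    """Inverse of split_bits."""
    value = 0
    for group, take in zip(groups, group_widths(width, total_bits)):
        value = (value << take) | group
    return value


print(group_widths(2), group_widths(3))  # [2, 2, 2, 2] [3, 3, 2]
```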
Jim