Emulation: The Pavloda tape fastlooader
Posted: Tue Mar 07, 2017 5:34 pm
by fastgear
The Pavloda fastloader was probably one of the most sensitive tape fastloaders used for C64 games.
In fact some of the games that used this loader even suggested keeping the C64 away from the TV.
Even in emulation, loading a game that uses Pavloda is a tricky task. It involves getting the CIA timing exactly right, together with the VIC-II raster/bad line timing. I think one probably also needs to add accurate emulation of tape pulse timing to the list.
In my emulator I think I have nailed the CIA timing with the help of Matt's video on emulating the Beep in JavaScript.
I have also more or less nailed the VIC-II timing, but I am still missing something.
Does anyone perhaps know of any articles explaining the technicalities of the Pavloda fast loader system?
Re: Emulation: The Pavloda tape fastlooader
Posted: Tue Mar 07, 2017 6:02 pm
by nyef
I'm not particularly familiar with C64 emulation itself, or any specific piece of software for it, but just from a general emulation perspective, some plausibilities come to mind:
- Hardware events that need to happen on a sub-instruction cycle boundary, which can have definite implications for CPU core design.
- Variable length cycles for whatever reason.
- Read of unmapped address space, to obtain the last value on some bus held by the bus capacitance.
- DMA type cycle stealing.
You're probably already handling at least some of these, but possibly not all. Another angle would be to disassemble the loader yourself to figure out what it's doing with the hardware.
The bit about keeping the computer away from the TV, back in the days when the TVs were particle accelerators with electron beam steering, suggests to me that something in terms of electromagnetic interference inducing a state change to an undriven bus could be critical.
Re: Emulation: The Pavloda tape fastlooader
Posted: Tue Mar 07, 2017 7:35 pm
by fastgear
Thx for the pointers nyef.
I have indeed gone the route of disassembling the loader code to try to see what it does. But it loops a large number of times, taking different paths, so it is quite difficult to follow.
I think I will perhaps take some time and share what I have found so far. Maybe someone can spot the missing puzzle piece.

Re: Emulation: The Pavloda tape fastlooader
Posted: Tue Mar 07, 2017 11:15 pm
by White Flame
There are 2 different versions of the CIA (6526 and 6526A), that differ in a 1 cycle phase shift regarding the timers. Both were used in various production runs of the C64. That could be the source of some problems.
Re: Emulation: The Pavloda tape fastlooader
Posted: Wed Mar 08, 2017 6:47 am
by fastgear
That is interesting.
Do you perhaps know if C64s fitted with the different CIAs had issues loading some games?
Re: Emulation: The Pavloda tape fastlooader
Posted: Wed Mar 08, 2017 10:17 am
by White Flame
I'm really unfamiliar with tape loaders. None of the disk loaders I'm aware of used CIA timers; they all did manual handshaking on the bus as fast as they could.
As the timers are probably the only difference between those 2 versions, I needed to detect them to do timer-based raster interrupts, but that's about it.
Re: Emulation: The Pavloda tape fastlooader
Posted: Wed Mar 08, 2017 11:24 am
by fastgear
Here are my findings so far.
First, a snippet of the key assembly code:
Code:
;start tape motor
.02AF 202 034 64523752 A9 1F LDA #$1F fe0108ee
.02B1 202 036 64523754 85 01 STA $01 1f0108ee
.02B3 202 039 64523757 A0 00 LDY #$00 1f0108ee
.02B5 202 041 64523759 8C F6 02 STY $02F6 1f0100ee
.02B8 202 045 64523763 8C 05 DD STY $DD05 1f0100ee
.02BB 202 049 64523767 8C 07 DD STY $DD07 1f0100ee
.02BE 202 053 64523771 8C 06 DD STY $DD06 1f0100ee
.02C1 202 057 64523775 A9 5A LDA #$5A 1f0100ee
.02C3 202 059 64523777 8D 04 DD STA $DD04 5a0100ee
.02C6 203 000 64523781 20 4A 03 JSR $034A 5a0100ee
;subroutine
==========================
.0128 288 041 67575857 A5 A5 LDA $A5 001200ea
.012A 288 044 67575860 D0 24 BNE $0150 c81200ea
.012C 178 005 67588547 A9 10 LDA #$10 000100ea
.012E 287 001 67575754 2C 0D DC BIT $DC0D 100100ea
.0131 287 005 67575758 F0 FB BEQ $012E 100100ea ; loop till end of pulse
.0133 287 007 67575760 AD 06 DD LDA $DD06 100100ea
.0136 287 011 67575764 4A LSR A ff0100ea
.0137 287 013 67575766 AA TAX 7f0100ea
.0138 287 015 67575768 BD F9 03 LDA $03F9,X 7f7f00ea ; lookup new value to store in timer
.013B 287 020 67575773 8D 06 DD STA $DD06 207f00ea
.013E 287 024 67575777 A9 51 LDA #$51 207f00ea
.0140 287 026 67575779 8D 0E DD STA $DD0E 517f00ea
.0143 287 030 67575783 8D 0F DD STA $DD0F 517f00ea
.0146 287 034 67575787 BD ED 02 LDA $02ED,X 517f00ea
.0149 287 039 67575792 85 A4 STA $A4 dd7f00ea ; value to rotate into A3
.014B 287 042 67575795 BD F1 02 LDA $02F1,X dd7f00ea
.014E 287 047 67575800 85 A5 STA $A5 c97f00ea ; lookup new value to store for looping
.0150 287 050 67575803 C6 A5 DEC $A5 c97f00ea
.0152 287 055 67575808 46 A4 LSR $A4 c97f00ea
.0154 287 060 67575813 26 A3 ROL $A3 c97f00ea
.0156 288 002 67575818 60 RTS c97f00ea
===================================
.034A 288 020 67575836 A9 2C LDA #$2C c91200ec
.034C 288 022 67575838 8D EA 03 STA $03EA 2c1200ec ; change opcode to bit
.034F 288 026 67575842 A9 00 LDA #$00 2c1200ec
.0351 288 028 67575844 8D 20 D0 STA $D020 001200ec
.0354 288 032 67575848 85 9E STA $9E 001200ec
.0356 288 035 67575851 20 28 01 JSR $0128 001200ec
.0359 288 008 67575824 A6 A3 LDX $A3 c97f00ec
.035B 288 011 67575827 E8 INX c91100ec
.035C 288 013 67575829 F0 E7 BEQ $0345 c91200ec
.035E 288 015 67575831 E0 02 CPX #$02 c91200ec
.0360 288 017 67575833 D0 E8 BNE $034A c91200ec
.0362 140 035 68333111 A6 9E LDX $9E 010200ec
.0364 140 038 68333114 D0 E4 BNE $034A 010000ec
.0366 140 040 68333116 20 E3 03 JSR $03E3 010000ec ; Update $9E
.0369 189 056 68336219 C9 66 CMP #$66 666600ec ; and check returned value in Acc
.036B 189 058 68336221 D0 DD BNE $034A 666600ec ; jump back
.036D 189 060 68336223 20 E3 03 JSR $03E3 666600ec
.0370 233 039 68338974 C9 1B CMP #$1B 1b1b00ec ; check returned value
.0372 233 041 68338976 D0 D6 BNE $034A 1b1b00ec ; jump back
.0374 233 043 68338978 A9 EE LDA #$EE 1b1b00ec
.0376 233 045 68338980 8D EA 03 STA $03EA ee1b00ec ; change opcode to INC
.0379 233 049 68338984 A9 00 LDA #$00 ee1b00ec
.037B 233 051 68338986 85 9E STA $9E 001b00ec
.037D 233 054 68338989 20 E3 03 JSR $03E3 001b00ec
.0380 277 017 68341724 CD F5 02 CMP $02F5 000000ec
.0383 277 021 68341728 F0 04 BEQ $0389 000000ec
===================================
;subroutine
.03E3 138 042 68391960 A9 01 LDA #$01 2c2cf4ea
.03E5 138 044 68391962 85 A3 STA $A3 012cf4ea
.03E7 138 047 68391965 20 28 01 JSR $0128 012cf4ea
.03EA 185 037 67589020 2C 20 D0 BIT $D020 9d0000ea
.03ED 185 041 67589024 90 F8 BCC $03E7 9d0000ea
.03EF 190 051 67589349 A6 A3 LDX $A3 960000ea
.03F1 190 054 67589352 38 SEC 960000ea
.03F2 190 056 67589354 8A TXA 960000ea
.03F3 190 058 67589356 65 9E ADC $9E 000000ea
.03F5 190 061 67589359 85 9E STA $9E 010000ea
.03F7 191 001 67589362 8A TXA 010000ea
.03F8 191 003 67589364 60 RTS 000000ea
I got this listing from a raw VICE trace, so don't worry about the extra numbers.
We start at address $02AF, where we switch on the tape motor. At this point the tape head is positioned at the beginning of a 3 second pulse.
Eventually the subroutine at $0128 gets called, which spends most of its time waiting until we are past the 3 second pulse.
It might not be obvious from the code, but the tape loader makes use of timers A and B of CIA 2.
Timer A is set to always count 90 φ2 (system clock) pulses before underflow.
Timer B is set to count timer A underflows. The value from which timer B counts down is set at the end of every tape pulse via a lookup table. The value of timer B at the end of the pulse, shifted right by one bit, is used as the index into the lookup table.
It should be noted that the first time we get a value from timer B, we are dependent on the initial startup conditions of the C64. Therefore the first value we get from timer B will always be $ff. In the lookup table this translates to a new timer B value of $20.
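The lookup step described above can be sketched in C. This is only my reading of the listing at $0133-$014E, not the loader author's documentation: the struct, the field names and the function name are made up for illustration, and `mem` is assumed to be a flat 64 KB image of C64 RAM.

```c
#include <stdint.h>

/* Hypothetical container for the three bytes fetched per tape pulse. */
typedef struct {
    uint8_t timer_b_reload; /* LDA $03F9,X / STA $DD06 */
    uint8_t shift_source;   /* LDA $02ED,X / STA $A4 (later shifted into $A3) */
    uint8_t loop_count;     /* LDA $02F1,X / STA $A5 */
} PulseEntry;

/* mem: assumed 65536-byte RAM image; timer_b_low: value read from $DD06. */
PulseEntry lookup_pulse(const uint8_t *mem, uint8_t timer_b_low)
{
    uint8_t x = (uint8_t)(timer_b_low >> 1); /* LSR A / TAX */
    PulseEntry e;
    e.timer_b_reload = mem[0x03F9 + x];
    e.shift_source   = mem[0x02ED + x];
    e.loop_count     = mem[0x02F1 + x];
    return e;
}
```

With the start-up value $ff the index becomes $7f, so the three reads land at $0478, $036C and $0370, which matches the $20, $dd and $c9 visible in the accumulator column of the trace.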
If you go through the code, you will see the use of two lookup tables. I am listing them here:
Code:
03f0 a3 38 8a 65 9e 85 9e 8a 60 08 0a 08 0a 00 00 00
0400 0c 0f 01 04 20 20 20 20 20 20 20 20 20 20 20 20
0410 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
0420 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
0430 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
0440 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
0450 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
0460 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
0470 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
0480 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
0490 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
04a0 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
04b0 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
04c0 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
04d0 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
04e0 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
02e0 68 8d 20 d0 a9 07 85 c0 a6 ae a4 af 60 02 00 01
02f0 00 02 02 01 01 00 00 a4 9e 20 e3 03 c4 a3 60 00
0300 d1 03 83 a4 7c a5 1a a7 e4 a7 86 ae 00 00 00 00
0310 4c 48 b2 00 31 ea c1 fe c1 fe 4a f3 91 f2 0e f2
0320 50 f2 33 f3 57 f1 ca f1 b1 f1 3e f1 2f f3 66 fe
0330 a7 02 a2 01 8e 20 d0 a9 3f 85 01 a9 10 24 01 f0
0340 fc a9 1f 85 01 a2 01 20 38 01 a9 2c 8d ea 03 a9
0350 00 8d 20 d0 85 9e 20 28 01 a6 a3 e8 f0 e7 e0 02
0360 d0 e8 a6 9e d0 e4 20 e3 03 c9 66 d0 dd 20 e3 03
0370 c9 1b d0 d6 a9 ee 8d ea 03 a9 00 85 9e 20 e3 03
0380 cd f5 02 f0 04 90 c3 b0 a9 20 e3 03 cd f6 02 f0
0390 04 90 b7 b0 9d a8 f0 14 a0 00 20 e3 03 91 ae ea
03a0 c8 d0 f7 20 f7 02 d0 a2 e6 af ea 60 20 e3 03 85
03b0 ae 20 e3 03 85 af 20 e3 03 85 15 20 e3 03 85 14
03c0 20 f7 02 f0 03 4c 4a 03 a9 00 85 9e a4 14 4c 9a
03d0 03 8a 10 03 4c 8b e3 99 e3 03 c8 d0 fa ee d9 03
03e0 4c d7 03 a9 01 85 a3 20 28 01 2c 20 d0 90 f8 a6
03f0 a3 38 8a 65 9e 85 9e 8a 60 08 0a 08 0a 00 00 00
Another interesting detail is the self-modification of the opcode at $03EA: the code at $034A patches it to BIT ($2C), and the code at $0374 patches it to INC ($EE).
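The patched instruction sits inside the subroutine at $03E3, which, as far as I can tell from the listing, assembles one byte from decoded bits and updates a running sum in $9E. A rough C sketch of that reading follows; the function name and the array-based bit source are hypothetical stand-ins for the calls to $0128.

```c
#include <stddef.h>
#include <stdint.h>

/* bits: decoded tape bits (0 or 1) in arrival order; *pos: read cursor.
   checksum models the zero-page byte $9E. */
uint8_t read_byte(const uint8_t *bits, size_t *pos, uint8_t *checksum)
{
    uint8_t shift = 0x01; /* LDA #$01 / STA $A3: sentinel bit */
    int carry;
    do {
        int bit = bits[(*pos)++] & 1;  /* JSR $0128: LSR $A4 yields the bit */
        carry = (shift >> 7) & 1;      /* ROL $A3: old bit 7 moves to carry */
        shift = (uint8_t)((shift << 1) | bit);
        /* $03EA runs here: BIT $D020 or INC $D020, carry is untouched */
    } while (!carry);                  /* BCC $03E7 */
    /* SEC / TXA / ADC $9E / STA $9E: sum += byte + 1 */
    *checksum = (uint8_t)(*checksum + shift + 1);
    return shift;
}
```

The sentinel falls out of bit 7 on the eighth rotate, so the loop ends after exactly eight bits. Since neither BIT nor INC changes the carry flag, patching $03EA only decides whether the border colour is incremented for every bit.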
Re: Emulation: The Pavloda tape fastlooader
Posted: Thu Mar 09, 2017 11:51 pm
by ZrX
Just in case, here's a recently found Pavloda tape mastering disk.
Re: Emulation: The Pavloda tape fastlooader
Posted: Fri Mar 10, 2017 3:45 pm
by fastgear
Very interesting!
Thanks for the disk images!
Re: Emulation: The Pavloda tape fastlooader
Posted: Tue Mar 14, 2017 5:40 pm
by fastgear
I have made some progress in accounting for the missing cycles in my emulation.
Most of the missing cycles were due to small bugs in my VIC-II cycle emulation.
Still on the VIC-II cycle emulation, however, I am battling to get the emulation of bad lines perfect.
I am working through a VICE trace log as reference, and I am not sure how VICE arrives at its bad line cycle count.
From quite a number of sources on the net I found that a bad line costs 40 cycles plus up to 3 cycles to accommodate possible write cycles of a CPU instruction. So, in effect, a bad line can be between 40 and 43 cycles. In the majority of cases a bad line will have a length of 43 cycles.
I am adding a couple of VICE trace log extracts for bad lines. It would be great if someone could explain, cycle by cycle, how you would derive the bad line cycle length for each scenario. (PS. For those not familiar with the VICE trace log syntax: the second column is the current raster line at the beginning of the instruction, and the third column is the current cycle within that line, also for the coming instruction.)
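As a cross-check on those columns, the raster line and cycle can be recomputed from the clock value in the fourth column. The sketch below assumes PAL timing (63 cycles per line, 312 lines per frame) and assumes that clock 0 falls on line 0, cycle 0, which happens to hold for the extracts in this thread; the type and function names are only illustrative.

```c
#include <stdint.h>

enum {
    PAL_CYCLES_PER_LINE = 63,  /* assumption: PAL VIC-II */
    PAL_LINES_PER_FRAME = 312  /* assumption: PAL VIC-II */
};

typedef struct {
    unsigned line;  /* second trace column */
    unsigned cycle; /* third trace column */
} RasterPos;

/* Maps a VICE clock value to (raster line, cycle within line). */
RasterPos raster_from_clock(uint32_t clk)
{
    uint32_t in_frame = clk % (uint32_t)(PAL_CYCLES_PER_LINE * PAL_LINES_PER_FRAME);
    RasterPos p;
    p.line  = (unsigned)(in_frame / PAL_CYCLES_PER_LINE);
    p.cycle = (unsigned)(in_frame % PAL_CYCLES_PER_LINE);
    return p;
}
```

For example, clock 65025264 maps to line 51, cycle 3, which is what the trace shows for the LDA $A5 at $0128.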
Here are two 43 cycle bad line examples:
Code:
.0128 051 003 65025264 A5 A5 LDA $A5 000100ea
.012A 051 006 65025267 D0 24 BNE $0150 7a0100ea
.0150 051 009 65025270 C6 A5 DEC $A5 7a0100ea
START 65025272 NUM 43 SUB 0 MAIN 65025275 DMAST 65025272
*** DMA VICII 65025272 43
.0152 051 057 65025318 46 A4 LSR $A4 7a0100ea
.0154 051 062 65025323 26 A3 ROL $A3 7a0100ea
===============================================================
.034F 059 007 65025772 A9 00 LDA #$00 2c0100ec
.0351 059 009 65025774 8D 20 D0 STA $D020 000100ec
START 65025776 NUM 43 SUB 0 MAIN 65025777 DMAST 65025776
*** DMA VICII 65025776 43
.0354 059 056 65025821 85 9E STA $9E 000100ec
.0356 059 059 65025824 20 28 01 JSR $0128 000100ec
And here are two 41 cycle examples:
Code:
.012A 099 000 65028285 D0 24 BNE $0150 4c0100ea
.0150 099 003 65028288 C6 A5 DEC $A5 4c0100ea
.0152 099 008 65028293 46 A4 LSR $A4 4c0100ea
START 65028296 NUM 41 SUB 2 MAIN 65028298 DMAST 65028298
*** DMA VICII 65028296 41
.0154 099 054 65028339 26 A3 ROL $A3 4c0100ea
.0156 099 059 65028344 60 RTS 4c0100ea
.0359 100 002 65028350 A6 A3 LDX $A3 4c0100ec
=============================================================
.0351 107 001 65028790 8D 20 D0 STA $D020 000100ec
.0354 107 005 65028794 85 9E STA $9E 000100ec
.0356 107 008 65028797 20 28 01 JSR $0128 000100ec
START 65028800 NUM 41 SUB 2 MAIN 65028803 DMAST 65028802
*** DMA VICII 65028800 41
.0128 107 055 65028844 A5 A5 LDA $A5 000100ea
.012A 107 058 65028847 D0 24 BNE $0150 440100ea
.0150 107 061 65028850 C6 A5 DEC $A5 440100ea
Re: Emulation: The Pavloda tape fastlooader
Posted: Wed Mar 15, 2017 7:24 am
by White Flame
1) Are you running PAL (63) or NTSC (64 or 65 cycles/line) timing?
2) Sprites also extend badlines, depending on how many are active per scanline. Normally, the VIC-II pulls 8 bits per cycle on the opposite phase to the CPU for 40 font bytes per line, and an additional 3 (iirc) cycles per sprite. If there aren't enough cycles available, it will tack those extra requests onto a badline, taking both phases in order to get everything in on time.
Re: Emulation: The Pavloda tape fastlooader
Posted: Wed Mar 15, 2017 8:39 am
by BigEd
Might be worth a look at the C source for VICE:
https://github.com/stuartcarnie/vice-em ... eal_cycles
It looks to me like the NUM which is observed to be 41 or 43 in the traces is calculated here:
https://github.com/stuartcarnie/vice-em ... line.c#L89
And, slightly to my surprise, it's open-loop - it does not depend on which cycle of which opcode the CPU is in. Although it's true that the CPU could be writing up to 3 bytes (taking an interrupt being the worst case) and therefore RDY could take up to 3 cycles to take effect, it sounds like that padding is just built into what the VICII does.
Re: Emulation: The Pavloda tape fastlooader
Posted: Wed Mar 15, 2017 12:01 pm
by fastgear
I am working on the assumption of PAL timing. Luckily the Pavloda loader doesn't enable any sprites while loading, so I don't need to worry about throwing sprites into the timing equation as yet.
@BigEd: Thanks for the pointers to where in the VICE source I can get more info. I actually managed to find the exact place of interest within vicii/vicii-fetch.c:
Code:
void vicii_fetch_alarm_handler(CLOCK offset, void *data)
{
    CLOCK last_opcode_first_write_clk, last_opcode_last_write_clk;

    /* This kludgy thing is used to emulate the behavior of the 6510 when BA
       goes low. When BA goes low, every read access stops the processor
       until BA is high again; write accesses happen as usual instead. */
    if (offset > 0) {
        switch (OPINFO_NUMBER(last_opcode_info)) {
        case 0:
            /* In BRK, IRQ and NMI the 3rd, 4th and 5th cycles are write
               accesses, while the 1st, 2nd, 6th and 7th are read accesses. */
            last_opcode_first_write_clk = maincpu_clk - 5;
            last_opcode_last_write_clk = maincpu_clk - 3;
            break;
        case 0x20:
            /* In JSR, the 4th and 5th cycles are write accesses, while the
               1st, 2nd, 3rd and 6th are read accesses. */
            last_opcode_first_write_clk = maincpu_clk - 3;
            last_opcode_last_write_clk = maincpu_clk - 2;
            break;
        default:
            /* In all the other opcodes, all the write accesses are the last
               ones. */
            if (maincpu_num_write_cycles() != 0) {
                last_opcode_last_write_clk = maincpu_clk - 1;
                last_opcode_first_write_clk = maincpu_clk
                                              - maincpu_num_write_cycles();
            } else {
                last_opcode_first_write_clk = (CLOCK)0;
                last_opcode_last_write_clk = last_opcode_first_write_clk;
            }
            break;
        }
    } else { /* offset <= 0, i.e. offset == 0 */
        /* If we are called with no offset, we don't have to care about write
           accesses. */
        last_opcode_first_write_clk = last_opcode_last_write_clk = 0;
    }
    ..
    while (1) {
        ...
        if (vicii.fetch_clk < last_opcode_first_write_clk || vicii.fetch_clk > last_opcode_last_write_clk) {
            sub = 0;
        } else {
            sub = last_opcode_last_write_clk - vicii.fetch_clk + 1;
        }
        ...
    }
I think this will cater for most cases.
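To check my understanding against the trace extracts, here is a small standalone sketch of the same idea. It is not VICE code: the function names are my own, and the relation NUM = 43 - SUB is only inferred from the START/NUM/SUB lines in the traces (43 with SUB 0, 41 with SUB 2).

```c
#include <stdint.h>

typedef uint32_t Clock;

enum { BADLINE_FULL_STEAL = 43 }; /* inferred from the traces, an assumption */

/* Write cycles of the last opcode that still run after the fetch starts. */
unsigned write_overlap(Clock fetch_clk, Clock first_write, Clock last_write)
{
    if (fetch_clk < first_write || fetch_clk > last_write) {
        return 0;
    }
    return (unsigned)(last_write - fetch_clk + 1);
}

/* Ordinary opcodes: any write cycles are the last cycles of the opcode. */
unsigned stolen_cycles(Clock fetch_clk, Clock cpu_clk, unsigned num_writes)
{
    Clock first = 0, last = 0;
    if (num_writes != 0) {
        last = cpu_clk - 1;
        first = cpu_clk - num_writes;
    }
    return BADLINE_FULL_STEAL - write_overlap(fetch_clk, first, last);
}

/* JSR: the 4th and 5th of its 6 cycles are the writes. */
unsigned stolen_cycles_jsr(Clock fetch_clk, Clock cpu_clk)
{
    return BADLINE_FULL_STEAL
           - write_overlap(fetch_clk, cpu_clk - 3, cpu_clk - 2);
}
```

Feeding in the START and MAIN values from the traces reproduces the logged figures: the DEC $A5 case (fetch at 65025272, two write cycles) gives 43, while the LSR $A4 case (fetch at 65028296) and the JSR case (fetch at 65028800) both give 41.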
As a matter of interest, I am just wondering about the following scenario.
If, for instance, you have the following instruction:
This instruction takes 5 clock cycles to complete. Now, if the BA signal gets asserted at cycle 5, which is a CPU write cycle, the bad line will be 42 cycles instead of 43.
My question now is what will happen if the BA signal gets asserted while the CPU is busy decrementing the value, i.e. when no memory access (read or write) is required for that particular cycle. How will this affect the overall timing?
Re: Emulation: The Pavloda tape fastlooader
Posted: Tue Mar 21, 2017 7:27 pm
by fastgear
Just to let you all know that I managed to load a game that uses the Pavloda loader within my emulator.
What an exercise!
Apart from the bad line emulation, I was also missing things when emulating the cascaded configuration of the 6526 timers.
Cascaded configuration is where timer B counts timer A underflows. Here I was missing two things.
Firstly, I was only supposed to decrement timer B about three clock cycles after timer A underflowed.
Secondly, I also missed the fact that in cascaded configuration timer B counts down to zero and then reloads. This is a bit different from φ2 counting mode, where we only count down to one and then do the reload.
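Roughly, the two fixes look like the sketch below. It is a simplified model rather than my actual emulator code: it covers only timer B in cascaded mode, the names are made up for illustration, and the delay constant of 3 is the "about three cycles" mentioned above, not a datasheet figure.

```c
#include <stdbool.h>
#include <stdint.h>

enum { CASCADE_DELAY = 3 }; /* assumption: "about three" cycles */

typedef struct {
    uint16_t counter;
    uint16_t latch;
    uint8_t  pending;    /* shift register of delayed timer A underflows */
    unsigned underflows; /* number of timer B underflows so far */
} TimerB;

/* Call once per clock cycle; timer_a_underflow is true on the cycle
   in which timer A underflows. */
void timer_b_tick(TimerB *t, bool timer_a_underflow)
{
    t->pending = (uint8_t)((t->pending << 1) | (timer_a_underflow ? 1u : 0u));
    if (t->pending & (1u << CASCADE_DELAY)) {
        if (t->counter == 0) {
            /* Cascaded mode: count all the way to zero, then reload. */
            t->counter = t->latch;
            t->underflows++;
        } else {
            t->counter--;
        }
    }
}
```

With a latch of 2, timer B therefore underflows on every third timer A underflow, and each count shows up three ticks after the timer A underflow that caused it.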
For nostalgic value, here are some screenshots:
Re: Emulation: The Pavloda tape fastlooader
Posted: Tue Mar 21, 2017 10:28 pm
by Dr Jefyll
It sounds as if this was a rather gnarly problem to solve. Congrats on your success!
