6502.org Forum
All times are UTC




Posted: Tue Mar 07, 2017 5:34 pm
Joined: Wed Apr 27, 2016 2:15 pm
Posts: 141
Location: South Africa
The Pavloda fastloader was probably one of the most sensitive tape fastloaders used for C64 games.

In fact some of the games that used this loader even suggested keeping the C64 away from the TV.

Even emulating the loading of a game that uses Pavloda is a tricky task. It involves getting the CIA timing exactly right, together with the VIC-II raster/bad line timing. I think one probably also needs to add accurate emulation of tape pulse timing to that list.

In my emulator I think I have waxed the CIA timing with the help of Matt's video on emulating the Beeb in JavaScript.

I have also more or less waxed the VIC-II timing, but I am still missing something.

Does anyone perhaps know of any articles explaining the technicalities of the Pavloda fast loader system?


Posted: Tue Mar 07, 2017 6:02 pm
Joined: Sun Jul 28, 2013 12:59 am
Posts: 235
I'm not particularly familiar with C64 emulation itself, or any specific piece of software for it, but just from a general emulation perspective, some possibilities come to mind:
  1. Hardware events that need to happen on a sub-instruction cycle boundary, which can have definite implications for CPU core design.
  2. Variable length cycles for whatever reason.
  3. Read of unmapped address space, to obtain the last value on some bus held by the bus capacitance.
  4. DMA type cycle stealing.
You're probably already handling at least some of these, but possibly not all. Another angle would be to disassemble the loader yourself to figure out what it's doing with the hardware.
The bit about keeping the computer away from the TV, back in the days when the TVs were particle accelerators with electron beam steering, suggests to me that something in terms of electromagnetic interference inducing a state change to an undriven bus could be critical.
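
As an illustration of point 3 in the list above, here is a minimal, generic C sketch of an "open bus" read. It is not C64-specific: the 32 KiB RAM map and all names are made up for the example, and the only behaviour modelled is that a read from an unmapped address returns whatever value was last on the data bus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAM_SIZE 0x8000u            /* hypothetical map: RAM at $0000-$7FFF only */

struct bus {
    uint8_t ram[RAM_SIZE];
    uint8_t last_value;             /* value most recently driven on the data bus */
};

static void bus_write(struct bus *b, uint16_t addr, uint8_t value)
{
    assert(b != NULL);
    if (addr < RAM_SIZE)
        b->ram[addr] = value;
    b->last_value = value;          /* the CPU drives the bus on every write */
}

static uint8_t bus_read(struct bus *b, uint16_t addr)
{
    assert(b != NULL);
    if (addr < RAM_SIZE)
        b->last_value = b->ram[addr];
    /* Unmapped address: nothing drives the bus, so the old value is seen again. */
    return b->last_value;
}
```

An emulator that returns a fixed value such as $00 or $FF for unmapped reads would behave differently from this whenever software relies on the leftover bus value.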


Posted: Tue Mar 07, 2017 7:35 pm
Joined: Wed Apr 27, 2016 2:15 pm
Posts: 141
Location: South Africa
Thanks for the pointers, nyef.

I have indeed gone the route of disassembling the loader code to try to see what it does. But it loops a large number of times, taking different paths, so it is quite difficult to follow.

I think I will take some time and share what I have found so far. Maybe someone can spot the missing puzzle piece :-)


Posted: Tue Mar 07, 2017 11:15 pm
Joined: Tue Jul 24, 2012 2:27 am
Posts: 672
There are 2 different versions of the CIA (6526 and 6526A), which differ by a 1-cycle phase shift in the timers. Both were used in various production runs of the C64. That could be the source of some problems.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Posted: Wed Mar 08, 2017 6:47 am
Joined: Wed Apr 27, 2016 2:15 pm
Posts: 141
Location: South Africa
That is interesting.

Do you perhaps know whether C64s fitted with the different CIAs had issues loading some games?


Posted: Wed Mar 08, 2017 10:17 am
Joined: Tue Jul 24, 2012 2:27 am
Posts: 672
I'm really unfamiliar with tape loaders. None of the disk loaders I'm aware of used CIA timers; they all did manual handshaking on the bus as fast as they could.

As the timers are probably the only difference between those 2 versions, I needed to detect them to do timer-based raster interrupts, but that's about it.

Posted: Wed Mar 08, 2017 11:24 am
Joined: Wed Apr 27, 2016 2:15 pm
Posts: 141
Location: South Africa
Here are my findings so far.

First, a snippet of the key assembly code:

Code:
;start tape motor
.02AF 202 034   64523752  A9 1F       LDA #$1F   fe0108ee
.02B1 202 036   64523754  85 01       STA $01    1f0108ee
.02B3 202 039   64523757  A0 00       LDY #$00   1f0108ee
.02B5 202 041   64523759  8C F6 02    STY $02F6  1f0100ee
.02B8 202 045   64523763  8C 05 DD    STY $DD05  1f0100ee
.02BB 202 049   64523767  8C 07 DD    STY $DD07  1f0100ee
.02BE 202 053   64523771  8C 06 DD    STY $DD06  1f0100ee
.02C1 202 057   64523775  A9 5A       LDA #$5A   1f0100ee
.02C3 202 059   64523777  8D 04 DD    STA $DD04  5a0100ee
.02C6 203 000   64523781  20 4A 03    JSR $034A  5a0100ee

;subroutine
==========================
.0128 288 041   67575857  A5 A5       LDA $A5    001200ea
.012A 288 044   67575860  D0 24       BNE $0150  c81200ea
.012C 178 005   67588547  A9 10       LDA #$10   000100ea
.012E 287 001   67575754  2C 0D DC    BIT $DC0D  100100ea
.0131 287 005   67575758  F0 FB       BEQ $012E  100100ea ; loop till end of pulse
.0133 287 007   67575760  AD 06 DD    LDA $DD06  100100ea
.0136 287 011   67575764  4A          LSR A      ff0100ea
.0137 287 013   67575766  AA          TAX        7f0100ea
.0138 287 015   67575768  BD F9 03    LDA $03F9,X 7f7f00ea ; lookup new value to store in timer
.013B 287 020   67575773  8D 06 DD    STA $DD06  207f00ea
.013E 287 024   67575777  A9 51       LDA #$51   207f00ea
.0140 287 026   67575779  8D 0E DD    STA $DD0E  517f00ea
.0143 287 030   67575783  8D 0F DD    STA $DD0F  517f00ea
.0146 287 034   67575787  BD ED 02    LDA $02ED,X 517f00ea
.0149 287 039   67575792  85 A4       STA $A4    dd7f00ea  ; value to rotate into A3
.014B 287 042   67575795  BD F1 02    LDA $02F1,X dd7f00ea
.014E 287 047   67575800  85 A5       STA $A5    c97f00ea  ; lookup new value to store for looping
.0150 287 050   67575803  C6 A5       DEC $A5    c97f00ea
.0152 287 055   67575808  46 A4       LSR $A4    c97f00ea
.0154 287 060   67575813  26 A3       ROL $A3    c97f00ea
.0156 288 002   67575818  60          RTS        c97f00ea
===================================
.034A 288 020   67575836  A9 2C       LDA #$2C   c91200ec
.034C 288 022   67575838  8D EA 03    STA $03EA  2c1200ec ; change opcode to bit
.034F 288 026   67575842  A9 00       LDA #$00   2c1200ec
.0351 288 028   67575844  8D 20 D0    STA $D020  001200ec
.0354 288 032   67575848  85 9E       STA $9E    001200ec
.0356 288 035   67575851  20 28 01    JSR $0128  001200ec
.0359 288 008   67575824  A6 A3       LDX $A3    c97f00ec
.035B 288 011   67575827  E8          INX        c91100ec
.035C 288 013   67575829  F0 E7       BEQ $0345  c91200ec
.035E 288 015   67575831  E0 02       CPX #$02   c91200ec
.0360 288 017   67575833  D0 E8       BNE $034A  c91200ec
.0362 140 035   68333111  A6 9E       LDX $9E    010200ec
.0364 140 038   68333114  D0 E4       BNE $034A  010000ec
.0366 140 040   68333116  20 E3 03    JSR $03E3  010000ec ; Update $9E
.0369 189 056   68336219  C9 66       CMP #$66   666600ec ; and check returned value in Acc
.036B 189 058   68336221  D0 DD       BNE $034A  666600ec ; jump back
.036D 189 060   68336223  20 E3 03    JSR $03E3  666600ec
.0370 233 039   68338974  C9 1B       CMP #$1B   1b1b00ec ; check returned value
.0372 233 041   68338976  D0 D6       BNE $034A  1b1b00ec ; jump back
.0374 233 043   68338978  A9 EE       LDA #$EE   1b1b00ec
.0376 233 045   68338980  8D EA 03    STA $03EA  ee1b00ec ; change opcode to INC
.0379 233 049   68338984  A9 00       LDA #$00   ee1b00ec
.037B 233 051   68338986  85 9E       STA $9E    001b00ec
.037D 233 054   68338989  20 E3 03    JSR $03E3  001b00ec
.0380 277 017   68341724  CD F5 02    CMP $02F5  000000ec
.0383 277 021   68341728  F0 04       BEQ $0389  000000ec
===================================
;subroutine
.03E3 138 042   68391960  A9 01       LDA #$01   2c2cf4ea
.03E5 138 044   68391962  85 A3       STA $A3    012cf4ea
.03E7 138 047   68391965  20 28 01    JSR $0128  012cf4ea
.03EA 185 037   67589020  2C 20 D0    BIT $D020  9d0000ea
.03ED 185 041   67589024  90 F8       BCC $03E7  9d0000ea
.03EF 190 051   67589349  A6 A3       LDX $A3    960000ea
.03F1 190 054   67589352  38          SEC        960000ea
.03F2 190 056   67589354  8A          TXA        960000ea
.03F3 190 058   67589356  65 9E       ADC $9E    000000ea
.03F5 190 061   67589359  85 9E       STA $9E    010000ea
.03F7 191 001   67589362  8A          TXA        010000ea
.03F8 191 003   67589364  60          RTS        000000ea


I got this source code via a raw trace from VICE, so don't worry about the extra numbers.

We start at address $02AF, where we switch on the tape motor. At this point in time the tape head is positioned at the beginning of a 3-second pulse.

Eventually the subroutine at $0128 gets called, which spends most of its time waiting until we are past the 3-second pulse.

It might not be obvious from the code, but the tape loader makes use of timer A and B from CIA 2.

Timer A is set to always count 90 φ2 pulses (latch value $5A) before underflow.

Timer B is set to count Timer A underflows. How much Timer B counts before underflow is always set at the end of a tape pulse via a lookup table. The value of timer B at the end of the pulse is used as index into the lookup table.

It should be noted that the first time we get a value from timer B, we are dependent on the initial startup conditions of the C64. Therefore the first value we get from timer B will always be $ff. In the lookup table this translates to a new timer B value of $20.
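
To make the timer arithmetic concrete, here is a minimal C sketch of how a pulse length in φ2 cycles could turn into a table index. It is only a sketch under assumptions: timer A is taken to underflow once every 90 cycles as described above (a real 6526 may need one cycle more per period), timer B is taken to lose exactly one count per timer A underflow, and latching delays are ignored. The function names are mine, not the loader's.

```c
#include <assert.h>
#include <stdint.h>

#define TIMER_A_PERIOD 90u   /* phi2 cycles per timer A underflow (assumed) */

/* Value left in timer B after a pulse of `pulse_cycles`, given the value
   `reload` that was written to timer B at the end of the previous pulse. */
static uint8_t timer_b_after_pulse(uint8_t reload, uint32_t pulse_cycles)
{
    uint32_t underflows = pulse_cycles / TIMER_A_PERIOD;
    assert(underflows <= reload);    /* this sketch does not model timer B wrapping */
    return (uint8_t)(reload - underflows);
}

/* Index used for the table lookups: the loader does LSR A followed by TAX. */
static uint8_t table_index(uint8_t timer_b_value)
{
    return (uint8_t)(timer_b_value >> 1);
}
```

For the startup case, `table_index(0xFF)` is $7F, and $03F9 + $7F = $0478, which is where the $20 comes from.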

If you go through the code, you will see the use of two lookup tables. I am listing them here:

Code:
03f0 a3 38 8a 65 9e 85 9e 8a 60 08 0a 08 0a 00 00 00
0400 0c 0f 01 04 20 20 20 20 20 20 20 20 20 20 20 20
0410 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
0420 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
0430 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
0440 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
0450 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
0460 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
0470 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
0480 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
0490 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
04a0 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
04b0 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
04c0 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
04d0 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
04e0 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

02e0 68 8d 20 d0 a9 07 85 c0 a6 ae a4 af 60 02 00 01
02f0 00 02 02 01 01 00 00 a4 9e 20 e3 03 c4 a3 60 00
0300 d1 03 83 a4 7c a5 1a a7 e4 a7 86 ae 00 00 00 00
0310 4c 48 b2 00 31 ea c1 fe c1 fe 4a f3 91 f2 0e f2
0320 50 f2 33 f3 57 f1 ca f1 b1 f1 3e f1 2f f3 66 fe
0330 a7 02 a2 01 8e 20 d0 a9 3f 85 01 a9 10 24 01 f0
0340 fc a9 1f 85 01 a2 01 20 38 01 a9 2c 8d ea 03 a9
0350 00 8d 20 d0 85 9e 20 28 01 a6 a3 e8 f0 e7 e0 02
0360 d0 e8 a6 9e d0 e4 20 e3 03 c9 66 d0 dd 20 e3 03
0370 c9 1b d0 d6 a9 ee 8d ea 03 a9 00 85 9e 20 e3 03
0380 cd f5 02 f0 04 90 c3 b0 a9 20 e3 03 cd f6 02 f0
0390 04 90 b7 b0 9d a8 f0 14 a0 00 20 e3 03 91 ae ea
03a0 c8 d0 f7 20 f7 02 d0 a2 e6 af ea 60 20 e3 03 85
03b0 ae 20 e3 03 85 af 20 e3 03 85 15 20 e3 03 85 14
03c0 20 f7 02 f0 03 4c 4a 03 a9 00 85 9e a4 14 4c 9a
03d0 03 8a 10 03 4c 8b e3 99 e3 03 c8 d0 fa ee d9 03
03e0 4c d7 03 a9 01 85 a3 20 28 01 2c 20 d0 90 f8 a6
03f0 a3 38 8a 65 9e 85 9e 8a 60 08 0a 08 0a 00 00 00

Another interesting detail is the modification of the opcode at $03EA, which gets switched between BIT ($2C) and INC ($EE).
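
As a reading aid, the byte loop at $03E3 can be paraphrased in C. This is my own paraphrase of the trace above, not code from the loader: the `next_bit` callback is a hypothetical stand-in for everything $0128 does to produce one bit (waiting for the pulse and the table lookups), and `array_next_bit` exists only so the sketch can be tried out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*bit_source)(void *ctx);    /* returns the next tape bit, 0 or 1 */

struct loader_state {
    uint8_t shift;      /* $A3: bits are rotated in here */
    uint8_t checksum;   /* $9E: running sum updated after each byte */
};

/* Paraphrase of $03E3: collect bits until the marker bit falls out of $A3. */
static uint8_t read_byte(struct loader_state *s, bit_source next_bit, void *ctx)
{
    int carry;
    assert(s != NULL && next_bit != NULL);
    s->shift = 0x01;                           /* LDA #$01 / STA $A3 */
    do {
        int bit = next_bit(ctx) & 1;           /* LSR $A4 puts the bit in carry */
        carry = (s->shift & 0x80) != 0;        /* ROL $A3: old bit 7 goes to carry */
        s->shift = (uint8_t)((s->shift << 1) | bit);
    } while (!carry);                          /* BCC $03E7 */
    /* SEC / TXA / ADC $9E / STA $9E: checksum += byte + 1, modulo 256 */
    s->checksum = (uint8_t)(s->checksum + s->shift + 1);
    return s->shift;                           /* TXA / RTS */
}

/* Test helper: feeds bits from an array, one per call. */
struct bit_array { const uint8_t *bits; unsigned pos; };

static int array_next_bit(void *ctx)
{
    struct bit_array *a = ctx;
    return a->bits[a->pos++];
}
```

Because $A3 starts at $01, the marker bit reaches the carry on the eighth ROL, so each call consumes eight bits with the first bit ending up as the most significant one. Neither BIT nor INC changes the carry flag, so the opcode patch at $03EA does not alter this loop; it only decides whether $D020 gets incremented for each bit.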


Posted: Thu Mar 09, 2017 11:51 pm
Joined: Sun Sep 18, 2016 9:33 pm
Posts: 3
Just in case, here's a recently found Pavloda tape mastering disk.

Attachment: Pavloda.zip [9.31 KiB]


Posted: Fri Mar 10, 2017 3:45 pm
Joined: Wed Apr 27, 2016 2:15 pm
Posts: 141
Location: South Africa
Very interesting!

Thanks for the disk images!


Posted: Tue Mar 14, 2017 5:40 pm
Joined: Wed Apr 27, 2016 2:15 pm
Posts: 141
Location: South Africa
I have made some progress in accounting for missing cycles in my emulation.

Most of the missing cycles were due to small bugs in my VIC-II cycle emulation.

Still on the VIC-II cycle emulation, however, I am battling to get the emulation of bad lines perfect.

I am working through a VICE trace log as a reference, and I am not sure how VICE arrives at its bad line figure.

From quite a number of sources on the net I found that a bad line takes 40 cycles plus up to 3 cycles to accommodate possible write cycles of a CPU instruction. So, in effect, a bad line can be between 40 and 43 cycles. In the majority of cases a bad line will have a length of 43 cycles.

I am adding a couple of VICE trace log extracts for bad lines. It would be great if someone could explain, cycle by cycle, how you would derive the bad line cycle length for each scenario. (PS: for those not familiar with the VICE trace log syntax, the second column is the raster line at the beginning of the instruction and the third column is the cycle within that line, also at the beginning of the instruction.)

Here are two 43-cycle bad line examples:

Code:
.0128 051 003   65025264  A5 A5       LDA $A5    000100ea
.012A 051 006   65025267  D0 24       BNE $0150  7a0100ea
.0150 051 009   65025270  C6 A5       DEC $A5    7a0100ea
START 65025272 NUM 43 SUB 0 MAIN 65025275 DMAST 65025272
*** DMA VICII   65025272  43
.0152 051 057   65025318  46 A4       LSR $A4    7a0100ea
.0154 051 062   65025323  26 A3       ROL $A3    7a0100ea
===============================================================
.034F 059 007   65025772  A9 00       LDA #$00   2c0100ec
.0351 059 009   65025774  8D 20 D0    STA $D020  000100ec
START 65025776 NUM 43 SUB 0 MAIN 65025777 DMAST 65025776
*** DMA VICII   65025776  43
.0354 059 056   65025821  85 9E       STA $9E    000100ec
.0356 059 059   65025824  20 28 01    JSR $0128  000100ec


And here are two 41-cycle examples:

Code:
.012A 099 000   65028285  D0 24       BNE $0150  4c0100ea
.0150 099 003   65028288  C6 A5       DEC $A5    4c0100ea
.0152 099 008   65028293  46 A4       LSR $A4    4c0100ea
START 65028296 NUM 41 SUB 2 MAIN 65028298 DMAST 65028298
*** DMA VICII   65028296  41
.0154 099 054   65028339  26 A3       ROL $A3    4c0100ea
.0156 099 059   65028344  60          RTS        4c0100ea
.0359 100 002   65028350  A6 A3       LDX $A3    4c0100ec
=============================================================
.0351 107 001   65028790  8D 20 D0    STA $D020  000100ec
.0354 107 005   65028794  85 9E       STA $9E    000100ec
.0356 107 008   65028797  20 28 01    JSR $0128  000100ec
START 65028800 NUM 41 SUB 2 MAIN 65028803 DMAST 65028802
*** DMA VICII   65028800  41
.0128 107 055   65028844  A5 A5       LDA $A5    000100ea
.012A 107 058   65028847  D0 24       BNE $0150  440100ea
.0150 107 061   65028850  C6 A5       DEC $A5    440100ea
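
One way to sanity-check these extracts is plain arithmetic on the clock column: the instruction after the DMA line should start at the clock of the interrupted instruction, plus that instruction's length, plus NUM. Here is a small C sketch of that check. The opcode lengths used with it are the standard 6502 ones (DEC/LSR zero page 5, STA absolute 4, JSR 6); the function name is mine.

```c
#include <assert.h>
#include <stdint.h>

/* Clock at which the CPU resumes after a bad line, given the start clock of
   the interrupted instruction, its length in cycles, and the NUM value from
   the "*** DMA VICII" line of the trace. */
static uint32_t resume_clock(uint32_t start_clk, unsigned opcode_cycles,
                             unsigned stolen_cycles)
{
    assert(stolen_cycles <= 43);   /* largest NUM seen in these traces */
    return start_clk + opcode_cycles + stolen_cycles;
}
```

For example, DEC $A5 at 65025270 with NUM 43 gives 65025318, and JSR $0128 at 65028797 with NUM 41 gives 65028844, both matching the next trace line. In all four extracts NUM plus SUB adds up to 43.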



Posted: Wed Mar 15, 2017 7:24 am
Joined: Tue Jul 24, 2012 2:27 am
Posts: 672
1) Are you running PAL (63) or NTSC (64 or 65 cycles/line) timing?

2) Sprites also extend bad lines, depending on how many are active per scanline. Normally, the VIC-II pulls 8 bits per cycle on the opposite phase from the CPU for 40 font bytes per line, and an additional 3 (iirc) cycles per sprite. If there aren't enough cycles available, it will tack those extra requests onto a bad line, taking both phases in order to get everything in on time.

Posted: Wed Mar 15, 2017 8:39 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10822
Location: England
Might be worth a look at the C source for VICE:
https://github.com/stuartcarnie/vice-em ... eal_cycles

It looks to me like the NUM which is observed to be 41 or 43 in the traces is calculated here:
https://github.com/stuartcarnie/vice-em ... line.c#L89

And, slightly to my surprise, it's open-loop - it does not depend on which cycle of which opcode the CPU is in. Although it's true that the CPU could be writing up to 3 bytes (taking an interrupt being the worst case) and therefore RDY could take up to 3 cycles to take effect, it sounds like that padding is just built into what the VICII does.


Posted: Wed Mar 15, 2017 12:01 pm
Joined: Wed Apr 27, 2016 2:15 pm
Posts: 141
Location: South Africa
I work on the assumption of PAL timing. Luckily the Pavloda loader doesn't enable any sprites while loading, so I don't need to worry about throwing sprites into the timing equation as yet :)
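
The line and cycle columns in the traces are consistent with that PAL assumption. As a rough check, here is a C sketch that derives the raster position from the clock column. It assumes 63 cycles per line and 312 lines per frame for PAL, and that the VICE clock started counting at line 0, cycle 0; the type and function names are mine.

```c
#include <assert.h>
#include <stdint.h>

struct raster_pos {
    unsigned line;    /* raster line within the frame */
    unsigned cycle;   /* cycle within that line */
};

/* Map an absolute clock value onto a raster line and cycle. */
static struct raster_pos raster_from_clock(uint32_t clk,
                                           unsigned cycles_per_line,
                                           unsigned lines_per_frame)
{
    struct raster_pos pos;
    uint32_t in_frame;

    assert(cycles_per_line > 0 && lines_per_frame > 0);
    in_frame = clk % (cycles_per_line * lines_per_frame);
    pos.line = in_frame / cycles_per_line;
    pos.cycle = in_frame % cycles_per_line;
    return pos;
}
```

With 63 and 312, clock 65025264 maps to line 51, cycle 3, which matches the `.0128 051 003` trace line; passing 64 or 65 cycles per line would model the NTSC variants instead.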

@BigEd Thanks for the pointers to where in the VICE source I can get more info. I actually managed to find the exact place of interest within vicii/vicii-fetch.c:

Code:
void vicii_fetch_alarm_handler(CLOCK offset, void *data)
{
    CLOCK last_opcode_first_write_clk, last_opcode_last_write_clk;

    /* This kludgy thing is used to emulate the behavior of the 6510 when BA
       goes low.  When BA goes low, every read access stops the processor
       until BA is high again; write accesses happen as usual instead.  */

    if (offset > 0) {
        switch (OPINFO_NUMBER(last_opcode_info)) {
            case 0:
                /* In BRK, IRQ and NMI the 3rd, 4th and 5th cycles are write
                   accesses, while the 1st, 2nd, 6th and 7th are read accesses.  */
                last_opcode_first_write_clk = maincpu_clk - 5;
                last_opcode_last_write_clk = maincpu_clk - 3;
                break;

            case 0x20:
                /* In JSR, the 4th and 5th cycles are write accesses, while the
                   1st, 2nd, 3rd and 6th are read accesses.  */
                last_opcode_first_write_clk = maincpu_clk - 3;
                last_opcode_last_write_clk = maincpu_clk - 2;
                break;

            default:
                /* In all the other opcodes, all the write accesses are the last
                   ones.  */
                if (maincpu_num_write_cycles() != 0) {
                    last_opcode_last_write_clk = maincpu_clk - 1;
                    last_opcode_first_write_clk = maincpu_clk
                                                  - maincpu_num_write_cycles();
                } else {
                    last_opcode_first_write_clk = (CLOCK)0;
                    last_opcode_last_write_clk = last_opcode_first_write_clk;
                }
                break;
        }
    } else { /* offset <= 0, i.e. offset == 0 */
        /* If we are called with no offset, we don't have to care about write
           accesses.  */
        last_opcode_first_write_clk = last_opcode_last_write_clk = 0;
    }
   ..
    while (1) {
    ...
        if (vicii.fetch_clk < last_opcode_first_write_clk || vicii.fetch_clk > last_opcode_last_write_clk) {
            sub = 0;
        } else {
            sub = last_opcode_last_write_clk - vicii.fetch_clk + 1;
        }
   ...
    }


I think this will cater for most cases.
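
Applied to the earlier traces, the `sub` logic seems to explain the 41 versus 43 figures. The following C sketch isolates just that calculation. The function and constant names are mine, and the assumption that the stolen-cycle count equals 43 minus `sub` is inferred from the NUM and SUB columns of the trace, not taken from the excerpt above.

```c
#include <assert.h>
#include <stdint.h>

#define BADLINE_MAX_CYCLES 43u   /* NUM when SUB is 0 in the traces */

/* Mirrors the `sub` computation: if the fetch starts inside the window of
   write cycles of the last opcode, the remaining writes still get executed. */
static unsigned write_overlap(uint32_t fetch_clk, uint32_t first_write_clk,
                              uint32_t last_write_clk)
{
    assert(first_write_clk <= last_write_clk);
    if (fetch_clk < first_write_clk || fetch_clk > last_write_clk)
        return 0;
    return (unsigned)(last_write_clk - fetch_clk + 1);
}

/* Cycles the CPU actually loses to the bad line (assumed: 43 minus overlap). */
static unsigned stolen_cycles(uint32_t fetch_clk, uint32_t first_write_clk,
                              uint32_t last_write_clk)
{
    return BADLINE_MAX_CYCLES
           - write_overlap(fetch_clk, first_write_clk, last_write_clk);
}
```

Taking the LSR $A4 from the 41-cycle trace: it starts at 65028293 and takes 5 cycles, so its two write cycles fall on 65028296 and 65028297. The fetch starts at 65028296, which gives an overlap of 2 and therefore 41 stolen cycles.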

As a matter of interest, I am just wondering about the following scenario.

If, for instance, you have the following instruction:

Quote:
DEC $A5


This instruction takes 5 clock cycles to complete. Now, if the BA signal gets asserted at cycle 5, which is a CPU write cycle, the bad line will be 42 cycles instead of 43.

My question now is what will happen if the BA signal gets asserted while the CPU is busy decrementing the value, i.e. when no memory access (read or write) is required for that particular cycle. How will this affect the overall timing?


Posted: Tue Mar 21, 2017 7:27 pm
Joined: Wed Apr 27, 2016 2:15 pm
Posts: 141
Location: South Africa
Just to let you all know that I managed to load a game that uses the Pavloda loader within my emulator.

What an exercise!

Apart from the badline emulation, I was also missing things when emulating the cascaded configuration of the 6526 timers.

Cascaded configuration is where timer B counts timer A underflows. Here I was missing two things.

Firstly, I was only supposed to decrement timer B about three clock cycles after timer A underflowed.

Lastly, I also missed the fact that in cascaded configuration, timer B counts to zero and then does a reload of the timer. This is a bit different from φ2 mode, where we only count down to one and then do the reload.
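
The two points above can be sketched in C roughly as follows. This only encodes the behaviour as I described it, not a verified model of the 6526: the delay of exactly 3 cycles is an assumption standing in for "about three", and the struct and function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CASCADE_DELAY 3   /* "about three" cycles; the exact value is assumed */

struct timer_b {
    uint16_t counter;
    uint16_t latch;
    uint8_t  pending;     /* bit n set: timer A underflowed n cycles ago */
};

/* Advance timer B by one phi2 cycle in cascaded mode.
   Returns true on the cycle in which timer B underflows and reloads. */
static bool timer_b_tick(struct timer_b *tb, bool timer_a_underflow)
{
    assert(tb != NULL);
    tb->pending = (uint8_t)((tb->pending << 1) | (timer_a_underflow ? 1 : 0));
    if (!(tb->pending & (1u << CASCADE_DELAY)))
        return false;             /* no delayed underflow arrives this cycle */
    if (tb->counter == 0) {       /* zero is reached first, then the reload */
        tb->counter = tb->latch;
        return true;
    }
    tb->counter--;
    return false;
}
```

With a latch of 2, the counter in this model goes 2, 1, 0 and reloads on the next delayed timer A underflow, so timer B underflows once every three timer A underflows.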

For nostalgic value, here are some screenshots:


Attachments: court.png, climb.png, desert.png, intro.png, pavloda.png

Posted: Tue Mar 21, 2017 10:28 pm
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3363
Location: Ontario, Canada
It sounds as if this was a rather gnarly problem to solve. Congrats on your success! :)

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html

