In detail, I count all the cycles in one frame. A raster interrupt on the same raster line both starts and ends the measurement. The main code is simply a long sequence of NOPs. The raster interrupt handler takes the address of the next NOP (the return address) from the stack and subtracts the address of the first NOP from it. Each NOP is one byte long and takes two cycles, so multiplying this difference by two gives the number of cycles spent in the main code. All that remains is to add the number of cycles spent in the interrupt handler itself. However, the total is always two cycles less than it should be according to the technical documentation.
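The arithmetic of the method can be modelled in a few lines of Python. This is only a sketch: it assumes the handler has already read the low (PCL) and high (PCH) bytes of the stacked return address, and the start address $1000 and the stacked bytes $2C/$34 are hypothetical values, chosen so that the result matches the PAL figure reported in this post with a 59-cycle handler.

```python
# Sketch of the cycle-counting arithmetic; all addresses are hypothetical.
NOP_CYCLES = 2  # a 6502 NOP ($EA) is one byte long and takes two cycles


def return_address(pcl: int, pch: int) -> int:
    """Rebuild the 16-bit return address from the low and high bytes
    that the CPU pushed onto the stack when it took the IRQ."""
    return (pch << 8) | pcl


def cycles_per_frame(ret_addr: int, first_nop: int, handler_cycles: int) -> int:
    """Cycles spent in the NOP sequence (one byte == two cycles)
    plus the cycles spent in the interrupt handler itself."""
    nops_executed = ret_addr - first_nop  # ret_addr points at the next, unexecuted NOP
    return nops_executed * NOP_CYCLES + handler_cycles


if __name__ == "__main__":
    first_nop = 0x1000                # hypothetical start of the NOP sequence
    ret = return_address(0x2C, 0x34)  # hypothetical stacked PCL/PCH -> $342C
    print(cycles_per_frame(ret, first_nop, 59))
```

Because every NOP costs two cycles, the main-code part of the count always has two-cycle granularity; only the handler cost can make the total odd.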
I wrote the code for the Commodore Plus/4 and the Commodore 64, and I can port it to other machines that support raster interrupts if that helps solve the problem. Here I present the C64 version, which is available on GitHub. It consists of two small programs:
cycle-counter-0 is just the skeleton code. You need a debugger to work with it, but it shows the main idea in brief.
cycle-counter builds on that skeleton and prints the number of cycles.
Using VICE, I get 18520 + 59 = 18579 cycles on the PAL machine instead of the expected 18581 (287*63 + 25*20, i.e. normal lines plus bad lines). On the NTSC machine I get 15959 + 59 = 16018 cycles instead of 16020 (238*65 + 25*22). Strictly speaking, the NTSC program shows two values, 15958 and 15960, which average to 15959.
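The expected totals and the discrepancy can be rechecked with a short calculation. This sketch simply re-evaluates the formulas from the post (312 PAL and 263 NTSC raster lines, 25 of them bad lines, with 20 and 22 CPU cycles left on a bad line) and compares them with the measured values. Like the post's formulas, it assumes no sprites are enabled, since sprite DMA would steal further cycles.

```python
# Recompute the expected cycles per frame using the formulas from the post.

def expected_cycles(total_lines: int, cycles_per_line: int,
                    bad_lines: int, cycles_on_bad_line: int) -> int:
    """Normal lines give the CPU every cycle; bad lines give it fewer."""
    normal_lines = total_lines - bad_lines
    return normal_lines * cycles_per_line + bad_lines * cycles_on_bad_line


PAL = expected_cycles(312, 63, 25, 20)   # 287*63 + 25*20
NTSC = expected_cycles(263, 65, 25, 22)  # 238*65 + 25*22

HANDLER = 59                                    # handler cost used in the post
pal_measured = 18520 + HANDLER                  # VICE, PAL
ntsc_measured = (15958 + 15960) // 2 + HANDLER  # VICE, NTSC (average of two readings)

if __name__ == "__main__":
    print("PAL :", PAL, pal_measured, PAL - pal_measured)
    print("NTSC:", NTSC, ntsc_measured, NTSC - ntsc_measured)
```

Both machines come out exactly two cycles short, which is the discrepancy described above.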
The sources for the C+4 version are here.
Many thanks in advance. Are there similar programs available elsewhere?