6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Fri Sep 20, 2024 8:34 pm

All times are UTC




Post new topic Reply to topic  [ 14 posts ] 
Author Message
PostPosted: Tue Sep 29, 2020 6:38 pm 
Offline

Joined: Tue Jun 19, 2018 8:28 am
Posts: 122
Recently I've been trying to debug one problem with a 6502 based device with a code written in C and compiled with CC65. It was kind of weird issue because the same library that seemed to cause the problem worked fine in different project. Entire situation is described here. Yesterday I finally found out that it was and issue related to the stack overflow. I've able to work it around by reducing size of some buffers and making some other buffers and variables local. Now device ceased to reset itself, but there still are some bugs (for example quite rare false detection of button being pushed) which may or my not be caused by the same issue.

I am kind of bothered by the fact that there was thank kind of a problem in a first place. Device has 8kB SRAM chip, which is a lot for such system. There are two larger buffers, each 256 bytes long. Beside that there is only a couple of smaller buffers and global variables. There is nothing that could eat up all that RAM, leaving no space for CC65 software stack.

Is there any way to generate summarized information about RAM usage in cc65 compiled project. AVR-GCC provides such information regarding to flash (program size) and RAM (sum total size of all global variables). Can I get something similar here? Is there a way to diagnose and fix this problem?


Top
 Profile  
Reply with quote  
PostPosted: Tue Sep 29, 2020 10:37 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 9:02 pm
Posts: 1738
Location: Sacramento, CA
The main.map file that was generated should show you the memory usage somewhat. I had looked at that when you posted the "broken" code with the output files. I also looked at the data stack usage for the update_keys routine and could not find any issue with that. I was wondering if the hardware stack was getting corrupted or overflowing and RTS cmds were sending you to non-program code. a BRK will cause your code to restart. You may want to look at your hardware Interrupt sources to be sure its not overloading your system. you could rewrite the BRK handler to dump a string and stop vs. restarting, that way you'll know if that is the cause.

Daryl

_________________
Please visit my website -> https://sbc.rictor.org/


Top
 Profile  
Reply with quote  
PostPosted: Wed Sep 30, 2020 6:14 am 
Offline

Joined: Tue Jun 19, 2018 8:28 am
Posts: 122
8BIT wrote:
The main.map file that was generated should show you the memory usage somewhat.


Then there should be plenty of memory left...

Code:
Segment list:
-------------
Name                   Start     End    Size  Align
----------------------------------------------------
ZEROPAGE              000000  000019  00001A  00001
DATA                  000200  00024A  00004B  00001
BSS                   00024B  0004FF  0002B5  00001
STARTUP               00804B  008066  00001C  00001
ONCE                  008067  008072  00000C  00001
CODE                  008073  00A022  001FB0  00001
RODATA                00A023  00A1D6  0001B4  00001
VECTORS               00FFFA  00FFFF  000006  00001


Quote:
I had looked at that when you posted the "broken" code with the output files. I also looked at the data stack usage for the update_keys routine and could not find any issue with that.


I also spent a lot of time looking at that code and I am quite sure it is fine. It works with the other device...

Quote:
I was wondering if the hardware stack was getting corrupted or overflowing and RTS cmds were sending you to non-program code.


If there is some way to check if that is the case?

Quote:
a BRK will cause your code to restart. You may want to look at your hardware Interrupt sources to be sure its not overloading your system. you could rewrite the BRK handler to dump a string and stop vs. restarting, that way you'll know if that is the cause.


Ok, I will try it. What should I do next, If I'll find out that BRK was the cause of resets?


Top
 Profile  
Reply with quote  
PostPosted: Wed Sep 30, 2020 10:22 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
Perhaps your interrupt handler could inspect the page 1 stack pointer and keep a kind of high water mark (which is to say, keep a record of the smallest value seen.)


Top
 Profile  
Reply with quote  
PostPosted: Wed Sep 30, 2020 11:53 am 
Offline

Joined: Tue Jun 19, 2018 8:28 am
Posts: 122
I try to optimize use od hardware stack, bu removing unnecessary nested subroutine calls. In previous version of my code I had such a call inside interrupt subroutine:
Code:
irq_chk_rtc:
           JSR _update_geiger_pulses ; goto the subroutine updating giger pulses


Which was calling this C function:

Code:
void update_geiger_pulses (void) {
   SEI();
   asm("lda $6482");
   asm("sta _tmp16+1");
   asm("lda $6483");
   asm("sta _tmp16");
   MC6840_TIMER1 = 0xFFFF;
   CLI();
    //cpm = 0xFFFF - tmp;
    geiger_pulses[geiger_ind] = 0xFFFF - tmp16;
    geiger_ind++;
    if (geiger_ind > 59) geiger_ind = 0;
}


Which compiled to following assembly code:

Code:
; ---------------------------------------------------------------
; void __near__ update_geiger_pulses (void)
; ---------------------------------------------------------------

.segment   "CODE"

.proc   _update_geiger_pulses: near

.segment   "CODE"

   sei
   lda     $6482
   sta     _tmp16+1
   lda     $6483
   sta     _tmp16
   lda     #$FF
   sta     $6482
   sta     $6482+1
   cli
   ldx     #$00
   lda     _geiger_ind
   asl     a
   bcc     L0004
   inx
   clc
L0004:   adc     #<(_geiger_pulses)
   sta     ptr1
   txa
   adc     #>(_geiger_pulses)
   sta     ptr1+1
   ldx     #$FF
   txa
   sec
   sbc     _tmp16
   pha
   txa
   sbc     _tmp16+1
   tax
   pla
   sta     (ptr1)
   ldy     #$01
   txa
   sta     (ptr1),y
   inc     _geiger_ind
   lda     _geiger_ind
   cmp     #$3C
   bcc     L0002
   stz     _geiger_ind
L0002:   rts

.endproc


I decided to write my own code into interrupt subroutine:

Code:
irq_chk_rtc:
           LDA M6242_STA         ; Load RTC status register
           AND #04                ; Check if IRQ flag is set
           BEQ irq_ret            ; If not, this is not RTC interrupt, co continue
           LDA #$00               ; Otherwise clear flag
           STA M6242_STA        ; We can do it here. We are still a second ahead next RTC int
           
           LDA MC6840_TIMER1     ; Load value from TIMER1
         STA _tmp16+1           ; Then save it to tmp16
         LDA MC6840_TIMER1+1
         STA _tmp16
         LDA #$FF              ; Write 0xFFFF to the TIMER1
         STA MC6840_TIMER1
         STA MC6840_TIMER1+1
         
         LDX _geiger_ind        ; Load the value of index to the X
           SEC                 ; Set carry out for borrow puropose
           LDA #$FF              ; We need to subtract tmp from 0xFFFF to get real anount of pulses
           SBC _tmp16           ; We subtract LSB first
           STA _geiger_pulses, X  ; Then we store the result, LSB first using X value as an index
           LDA #$FF
           SBC _tmp16+1
           STA _geiger_pulses+1, X
           INX                 ; Increment X
           CPX #$3C              ; Check if X<=60
           BCC irq_chk_rtc_svind  ; If not, just save current, incremented value from X
           LDX #$00              ; If yes, zero it out
irq_chk_rtc_svind:
           STX _geiger_ind
irq_chk_rtc_inc_uptime:           
           INC _uptime_value      ; And increment uptime variable
           BNE irq_ret
           INC _uptime_value+1
           BNE irq_ret
           INC _uptime_value+2
           BNE irq_ret
           INC _uptime_value+3


Unfortunately, now it shows rubbish instead of correct value. I can't see anything wrong there. Where I made a mistake?


Top
 Profile  
Reply with quote  
PostPosted: Wed Sep 30, 2020 12:08 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 9:02 pm
Posts: 1738
Location: Sacramento, CA
Your call to "_update_geiger_pulses" within your Interrupt service routine did have a problem. The CLI command would reset the IRQ Enable before the current interrupt routine was done. That could cause another IRQ to start before the last was finished. If too many came in, your hardware stack would overflow.

I will need more time to read over your new code - its good that there is no more CLI, but not good that it does not quite work as needed.

Daryl

_________________
Please visit my website -> https://sbc.rictor.org/


Top
 Profile  
Reply with quote  
PostPosted: Wed Sep 30, 2020 12:11 pm 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 702
Location: North Tejas
Atlantis wrote:
I decided to write my own code into interrupt subroutine:

Code:
         LDX _geiger_ind        ; Load the value of index to the X
           SEC                 ; Set carry out for borrow puropose
           LDA #$FF              ; We need to subtract tmp from 0xFFFF to get real anount of pulses
           SBC _tmp16           ; We subtract LSB first
           STA _geiger_pulses, X  ; Then we store the result, LSB first using X value as an index
           LDA #$FF
           SBC _tmp16+1
           STA _geiger_pulses+1, X
           INX                 ; Increment X


Unfortunately, now it shows rubbish instead of correct value. I can't see anything wrong there. Where I made a mistake?


You are not multiplying geiger_ind by two (shifting left one bit) to account for geiger_pulses being an array of words and not bytes.


Top
 Profile  
Reply with quote  
PostPosted: Wed Sep 30, 2020 12:25 pm 
Offline

Joined: Tue Jun 19, 2018 8:28 am
Posts: 122
BillG wrote:
You are not multiplying geiger_ind by two (shifting left one bit) to account for geiger_pulses being an array of words and not bytes.

[/quote]

Can you explain? I am not quite sure if I get it.


Last edited by Atlantis on Wed Sep 30, 2020 12:32 pm, edited 1 time in total.

Top
 Profile  
Reply with quote  
PostPosted: Wed Sep 30, 2020 12:33 pm 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 702
Location: North Tejas
Atlantis wrote:
BillG wrote:
You are not multiplying geiger_ind by two (shifting left one bit) to account for geiger_pulses being an array of words and not bytes.


Can you explain? I am not quite sure if I get it.


The compiled C code is doing this...

Code:
   lda     _geiger_ind
   asl     a
   bcc     L0004
   inx
   clc


Top
 Profile  
Reply with quote  
PostPosted: Wed Sep 30, 2020 12:45 pm 
Offline

Joined: Tue Jun 19, 2018 8:28 am
Posts: 122
BillG wrote:
The compiled C code is doing this...

Code:
   lda     _geiger_ind
   asl     a
   bcc     L0004
   inx
   clc


Sorry, I am still not sure how to include it into my code. I am still learning assembly and I has some difficulties with understanding all of compiled code. This was one of problematic parts.
Can you tell me how to use it in my code. Be aware of the fact, that I load _geiger_ind to X and then use X to address buffer in absolute indexing mode.


Top
 Profile  
Reply with quote  
PostPosted: Wed Sep 30, 2020 12:58 pm 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 702
Location: North Tejas
geiger_pulses is an array of 16-bit values. Do you understand that part?

As written, your code writes to _geiger_pulses and _geiger_pulses+1 the first time through (geiger_ind is 0.)

The second time through, geiger_ind is 1 and you write to _geiger_pulses+1 and geiger_pulses+2, overwriting the value in _geiger_pulses+1 from the first time.

You can do one of two things:

* Double geiger_ind before doing "sta _geiger_pulses,X" and "sta _geiger_pulses+1,X" but this makes incrementing it afterward more difficult. It is also a bit slower if that matters.

* Increment geiger_ind by doing "inx" twice, then comparing with $3C*2 when deciding whether to roll it back to 0.


Top
 Profile  
Reply with quote  
PostPosted: Wed Sep 30, 2020 1:05 pm 
Offline

Joined: Tue Jun 19, 2018 8:28 am
Posts: 122
Ok, now I see how silly my mistake was. I kind of think slowly today. :)
I've been interpreting compiled code in a wrong way.

Is it fine now?

Code:
         LDA _geiger_ind        ; Load the value of index to the X
         ASL A
         TAX
         
           SEC                 ; Set carry out for borrow puropose
           LDA #$FF              ; We need to subtract tmp from 0xFFFF to get real anount of pulses
           SBC _tmp16           ; We subtract LSB first
           STA _geiger_pulses, X  ; Then we store the result, LSB first using X value as an index
           LDA #$FF
           SBC _tmp16+1
           STA _geiger_pulses+1, X
           LDA _geiger_ind        ; Load _geiger_ind to A
           INC A
           CMP #$3C              ; Check if A<=60
           BCC irq_chk_rtc_svind  ; If not, just save current, incremented value from X
           LDA #$00              ; If yes, zero it out
irq_chk_rtc_svind:
           STA _geiger_ind


Top
 Profile  
Reply with quote  
PostPosted: Wed Sep 30, 2020 1:08 pm 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 702
Location: North Tejas
Atlantis wrote:
Ok, now I see how silly my mistake was. I kind of think slowly today. :)
I've been interpreting compiled code in a wrong way.

Is it fine now?


Actually, I would take what you had and do this:

Code:
           STA _geiger_pulses+1, X
           INX                 ; Increment X
           INX                 ; Increment X
           CPX #$3C*2              ; Check if X<=60*2
           BCC irq_chk_rtc_svind  ; If not, just save current, incremented value from X
           LDX #$00              ; If yes, zero it out
irq_chk_rtc_svind:
           STX _geiger_ind


Top
 Profile  
Reply with quote  
PostPosted: Thu Oct 01, 2020 2:14 am 
Offline

Joined: Sun May 13, 2018 5:49 pm
Posts: 251
Atlantis wrote:
Yesterday I finally found out that it was and issue related to the stack overflow. I've able to work it around by reducing size of some buffers and making some other buffers and variables local. Now device ceased to reset itself, but there still are some bugs (for example quite rare false detection of button being pushed) which may or my not be caused by the same issue.
I'm glad to hear you located the issue. A stack overflow makes sense for global variables being overwritten. Many compilers place the stack right after the global variables and so this isn't as uncommon an issue as one might like.
Atlantis wrote:
I am kind of bothered by the fact that there was thank kind of a problem in a first place. Device has 8kB SRAM chip, which is a lot for such system.
You determined the stack size in your lib/ethergeiger.cfg file when you said:
Code:
__STACKSIZE__:  value = $0200, type = weak;
The compiler likely looked at where your last global variable was placed, added the $200 you requested for the stack and started the stack there. Because the stack grows down towards lower addresses, once it got too full it started to overwrite the global variables. Unfortunately there isn't really any collision detection built in, so it's something you will need to watch for. Global variables that mysteriously change are indeed one of the symptoms. BigEd's suggestion of having your ISR keep tabs on it is not unreasonable. If you were able to simulate your program (difficult because of how much hardware is involved) you can put a breakpoint on a write to the last (lowest) byte in the stack so you know if it ever grows that large.
Atlantis wrote:
Is there any way to generate summarized information about RAM usage in cc65 compiled project. AVR-GCC provides such information regarding to flash (program size) and RAM (sum total size of all global variables). Can I get something similar here? Is there a way to diagnose and fix this problem?
Kind of. One issue is the way you are compiling your program. Your Makefile is compiling your .c files into .s files, then you are using the assembler to assemble your .s files into .o files, and then finally you link all of them together. The issue with this method of assembling your executables is that you are losing symbol information along the way. You can get some of the symbols by adding the -g command line option to ca65 when assembling, but I wasn't able to get them to flow all the way to the map or listing file. Your better option might be to use cl65 which knows how to compile and (optionally) link files and can turn a .c file directly into a .o file (including the variables). I modified your Makefile like so (only main.c shown):
Code:
main.o: main.c
   cl65 --cpu 65c02 -g -c main.c
The -c option tells cl65 to compile but not link (so it will make a .o file) and the -g tells it to include extra debugging information (including the variable names). I also added "-Ln main.lst" to your a.out target's command line to create a list file. This is the file where the value (address) for a given symbol (variable or function name) will appear. Looking for your key variables, I find:
Code:
grep key *.lst
al 00880F ._key_init
al 008856 ._key_update
al 0092AC ._key3_func
al 009290 ._key2_func
al 009274 ._key1_func
al 009258 ._key0_func
al 0004FC ._key3
al 0004F7 ._key2
al 0004F2 ._key1
al 0004ED ._key0
and so I would be looking for writes to $04ED through $0500 (your key_t structures are 5 bytes big and this only shows the starting address). Unfortunately, it's really up to the application engineer to make sure the stack size is reasonable for the application, and I suspect there are a good number of devices that occasionally trash some of their variables due to a stack overflow that occurs on rare conditions and wasn't caught during development. Interrupts that happen during your ISR (it looks like your original ISR might be re-enabling interrupts before it finishes) are a very quick way to eat through a stack of any size, as 8BIT noted. Hopefully your rewritten ISR fixes that.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 14 posts ] 

All times are UTC


Who is online

Users browsing this forum: No registered users and 48 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: