With a very quick look, the only thing that jumps out that's actually wrong is:
Code:
lda ACIA_STAT ; Get Status
and ACIA_RXF ; Rx full bit
and the same thing with "and ACIA_TXE." You need the "#" in there. Without it, the assembler treats the operand as an address (with ZP addressing, if the value fits in one byte) and ANDs the byte with whatever is stored there, instead of with the literal.
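For reference, a corrected receive loop might look like the sketch below. It assumes ACIA_RXF is defined as the mask value $08 (status bit 3, receiver data register full, on a 6551) and that your assembler uses "=" for equates; adjust both to match your own definitions.
Code:
ACIA_RXF = $08          ; assumed mask: status bit 3, Rx data register full
GETCH:  lda ACIA_STAT   ; Get status
        and #ACIA_RXF   ; "#" makes this an immediate operand
        beq GETCH       ; nothing received yet, so keep polling
        lda ACIA_DATA   ; fetch the received character
        rts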
Other things are just simplifications and streamlining. For example, the reset vector can be made to just point to MAIN instead of a place labeled RST_VEC that just has jmp MAIN, and the interrupt vectors could both go to the same RTI. Also:
Code:
lda $00
sta ACIA_STAT ; Reset the ACIA
can be replaced with STZ ACIA_STAT (a 65C02 instruction). The 65C02 comes out of reset with D clear and I set anyway, so there's no need to CLD and SEI to start. (On an NMOS 6502, D is undefined after reset, so the CLD would still be needed there.) You're initially loading X and A with 0, which is unnecessary because you then immediately load something else in, without using the 0.
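The vector and reset simplifications above can be sketched as the two fragments below. The label IRQ_NMI and the .org/.word directives are assumptions for illustration; use whatever your assembler provides. Note also that "lda $00" as written loads from zero-page address 0 rather than the literal 0. That happens to be harmless here, since the 6551 performs its programmed reset on any write to the status register regardless of the value, but it is the same missing "#" as before.
Code:
        stz ACIA_STAT   ; programmed reset: the value written is ignored
;-------------
IRQ_NMI: rti            ; one shared do-nothing handler (hypothetical label)

        .org $FFFA
        .word IRQ_NMI   ; NMI vector
        .word MAIN      ; reset vector points straight at MAIN
        .word IRQ_NMI   ; IRQ/BRK vector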
Code:
jsr WAIT_6551 ; required Delay
rts
can be replaced with JMP WAIT_6551. Actually, you can go further than that: since WAIT_6551 comes next, you can also skip the JMP instruction and just include a comment that WAIT_6551 must remain next. Going even further, you're always done with A when you get there, so you can use A for a loop index in the wait routine and not push and pull Y.
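As a minimal sketch of the intermediate JMP step (for the case where WAIT_6551 cannot directly follow the routine), assuming the routine ends by storing the byte to ACIA_DATA:
Code:
        sta ACIA_DATA   ; send the byte
        jmp WAIT_6551   ; tail call: the RTS in WAIT_6551 returns to our caller
This saves one byte and the cycles of the extra RTS, and uses two fewer bytes of stack during the delay.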
Using my structure macros, I would write:
Code:
MAIN:   LDX #$FF
        TXS
        JSR INIZ_ACIA
        CLI
MSG:    LDX #0
        BEGIN
           LDA MSG_HDR,X
        WHILE_NOT_ZERO
           JSR PUTCH
           INX
        REPEAT
        BEGIN
           JSR GETCH
           CMP #$0D
           IF_EQ
              JSR PUTCH
              LDA #$0A
           END_IF
           JSR PUTCH
        AGAIN
MSG_HDR: .BYTE "TLC-MBC Monitor v0.1 - 27/06/15", $0D, $0A, $00
;-------------
GETCH:  BEGIN                            ; Read one char from ACIA.
        UNTIL_BIT ACIA_STAT, 3, IS_SET   ; (You could also name the bit, but it should
                                         ; return the bit number, not its value.)
        LDA ACIA_DATA
        RTS
;-------------
PUTCH:  PHA
        BEGIN
        UNTIL_BIT ACIA_STAT, 4, IS_SET
        PLA
        STA ACIA_DATA                    ; WAIT_6551 must be next.
;-------------
WAIT_6551:
        PHX
        FOR_A $10, DOWN_TO, 0
           FOR_X $6B, DOWN_TO, 0
           NEXT_X
        NEXT_A
        PLX
        RTS
;-------------
which results in slightly fewer bytes of assembler output.
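If you don't have the structure macros, the three routines can be hand-written in plain 65C02 assembly roughly as follows. This is a sketch of equivalent behavior, not necessarily what the macros actually emit: the local labels are made up, the delay is only approximately the same length, and the accumulator decrement is spelled "dec a" here although some assemblers want "dea" or a bare "dec".
Code:
GETCH:   lda #%00001000  ; mask for status bit 3 (Rx data register full)
gc_wait: bit ACIA_STAT   ; Z stays set while the masked bit is clear
         beq gc_wait
         lda ACIA_DATA
         rts
;-------------
PUTCH:   pha
         lda #%00010000  ; mask for status bit 4 (Tx data register empty)
pc_wait: bit ACIA_STAT
         beq pc_wait
         pla
         sta ACIA_DATA   ; falls through: WAIT_6551 must be next
;-------------
WAIT_6551:
         phx
         lda #$10        ; outer loop count
w_outer: ldx #$6B        ; inner loop count
w_inner: dex
         bne w_inner
         dec a           ; 65C02 accumulator decrement
         bne w_outer
         plx
         rts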
If you want to use an interrupt to tell you when it's OK to feed another byte to the transmit register, you can use a VIA timer (T1 or T2) in place of the delay loop, and then the processor can do something useful while waiting.
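A minimal sketch of that idea, using T2 of a 6522 in its one-shot mode, is below. Everything named here is an assumption for illustration: the VIA base address, the TX_BUSY zero-page flag, the TX_TICKS delay (pick about one character time at your baud rate and clock), and the "=", "<" and ">" syntax. It also assumes T2 is the only interrupt source, that ACR bit 5 is left at its reset value of 0, and that the IRQ vector points at TX_IRQ rather than at a bare RTI.
Code:
VIA_BASE = $6000         ; hypothetical VIA address
VIA_T2CL = VIA_BASE+8    ; T2 low latch / counter
VIA_T2CH = VIA_BASE+9    ; T2 high counter
VIA_IER  = VIA_BASE+14   ; interrupt enable register
TX_BUSY  = $10           ; hypothetical ZP flag: bit 7 set while a byte is going out
TX_TICKS = $2000         ; placeholder delay in clock cycles
;-------------
INIZ_T2: stz TX_BUSY     ; transmitter starts out free
         lda #%10100000  ; bit 7 = "set", bit 5 = T2 interrupt enable
         sta VIA_IER
         rts
;-------------
PUTCH:   bit TX_BUSY     ; N = bit 7 of the flag
         bmi PUTCH       ; still busy (other work could be done here instead)
         sta ACIA_DATA   ; send the byte
         pha
         lda #$80
         sta TX_BUSY     ; mark busy before the timer is started
         lda #<TX_TICKS
         sta VIA_T2CL    ; low byte goes to the latch
         lda #>TX_TICKS
         sta VIA_T2CH    ; writing the high byte starts the one-shot count
         pla
         rts
;-------------
TX_IRQ:  pha
         lda VIA_T2CL    ; reading the T2 low counter clears the T2 interrupt flag
         stz TX_BUSY     ; OK to send the next byte
         pla
         rti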