Congratulations on a good start. I see certain good traits in your code that beginners often lack, ending up with code that's very difficult to figure out later when it needs fixing or modifying. For example, you have blank lines that make it easier to mentally separate groups of lines that serve as functional units, and you use variable and constant names instead of unnecessarily mixing raw numbers in with the code. [Edit-- Oops, I see now that you said you have a ton of software-development experience, just not in 6502. I won't re-write the following-- Just take it for whatever it's worth.]
You're apparently looking for constructive criticism. Without knowing more about your end goal, I can only give minor criticisms to make the code more time- and memory-efficient. Tight code is easier to manage.
You have for example in subroutine PAUSE, PHA and PLA; but you don't use A, so there's no reason to save it and then restore it again.
As Memblers said, there's never any need to use CPY #0 immediately after LDY, INY, DEY, TAY, PLY, etc. It's redundant. These instructions already set the N and Z flags according to the new value of Y, so the comparison to zero happens automatically as part of them.
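Here's a minimal sketch of what I mean. The loop, its label and the count are made up for illustration; only io_putc is a name from your code.
Code:
        LDY #8          ; hypothetical loop count
LOOP:   STA io_putc     ; illustrative loop body
        DEY             ; DEY sets Z when Y reaches 0
        BNE LOOP        ; so the branch works with no CPY #0 in front of it
The only extra thing a CPY #0 would do is set the carry flag, which matters only if later code relies on C.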
Your NOP NOP NOP NOP which you're using for 8 clocks' delay could be shortened to PHY PLY with only two lines and two bytes if 7 clocks is ok. I know-- picky, picky. If it came up at least two or three times, it would be worth making a macro called something like NOP4, so it would only take one line per occurrence and eliminate some distraction from viewing the goal of the routine.
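Roughly what the two options look like. The macro syntax below is ca65-style, which is an assumption on my part; check how your assembler defines macros.
Code:
        PHY             ; 3 clocks, 1 byte (65c02 only)
        PLY             ; 4 clocks, 1 byte; Y comes back unchanged

.macro  NOP4            ; assumed ca65-style macro definition
        NOP             ; 2 clocks each, 8 clocks and 4 bytes total
        NOP
        NOP
        NOP
.endmacro
Keep in mind that PLY updates the N and Z flags from the value of Y, whereas NOP leaves all flags alone.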
DRAWBALL only pushes and pulls A, but also uses X and Y. If you want to keep X and Y unaffected, you could do the same thing you're doing, using the same number of lines, bytes, and clocks, by only using A. If, OTOH, you don't need to keep X or Y unaffected, you could use one of them instead of A and then omit the PHA and PLA.
There seem to only be four possible values for curdir. If you have a 65c02 and if you can make curdir go in even numbers, the CMP-BEQ sequence in MOVEBALL could be replaced with a jump table (although it would make more difference when you have more comparisons). You could replace those 9 lines with
Code:
LDX curdir
JMP (MB_JMP-2,X)
and the table before or after the routine,
Code:
MB_JMP: .WORD DIR1, DIR2, DIR3, DIR4
(The "-2" above is because you're not using a "0" value, so the first table entry has to line up with curdir = 2.)
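If renumbering curdir is inconvenient, one possible alternative is to keep the values 1 to 4 and double the value on the way into X. This is only a sketch, and it assumes you don't need A preserved at that point:
Code:
        LDA curdir          ; 1, 2, 3 or 4
        ASL A               ; double it: 2, 4, 6 or 8
        TAX
        JMP (MB_JMP-2,X)    ; curdir = 1 lands on the first entry, DIR1
That costs two more instructions than the even-numbered version but leaves the rest of your code's use of curdir alone.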
Again, if you have a 65c02, you can replace the short JMPs with the BRA instruction, saving a byte on each. If you only have an NMOS 6502, in many of your cases the value of a flag will be known from an LDX or other recent instruction, so you could use BPL or something like that and know that the branch will always be taken. (This would be more important when memory gets tight. If it comes down to that, be sure the comments tell why the BPL, BVC, or whatever is there. BRA on the other hand usually needs no explanation.)
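As an illustration of both cases (the DONE label and the #4 are hypothetical, not taken from your code):
Code:
; 65c02:
        BRA DONE        ; 2 bytes instead of the 3 that JMP DONE takes

; NMOS 6502:
        LDX #4          ; loading a nonzero value clears Z
        BNE DONE        ; always taken here, because the LDX #4 just cleared Z
Either way the target has to be within branch range, about 127 bytes forward or 128 back.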
If you have a 65c02, your LDX #00, STX io_posy can be replaced with the single instruction STZ io_posy. You have a similar situation in DODRAW2, even though there's an STY in between.
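Side by side, that change looks like this:
Code:
; before:
        LDX #00
        STX io_posy
; after (65c02 only):
        STZ io_posy     ; stores zero without touching X or the flags
The one thing to check is that nothing afterward depends on X being 0, since STZ leaves X alone.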
In DRAWSIDES, the LDA #1, LDY #$01 could save a byte by changing the LDY #$01 to TAY since the 1 is already in A.
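In other words, something like:
Code:
        LDA #1          ; 2 bytes
        TAY             ; 1 byte; copies the 1 into Y (replaces the 2-byte LDY #$01)
Both versions take the same number of clocks, so the gain is only the one byte.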
In DODRAW2, there are two occurrences of STA io_putc with nothing changing the value of A in between, and no labels in between to jump to; so the second one will always store the same value in io_putc that it already had... unless that location is not a variable but rather an I/O device that outputs another byte every time you store to that address.
I personally like to indent loops. This could be considered a matter of personal style, but I think it makes it easier to visually factor them at a glance.
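For example, a nested loop might be laid out like this. The labels and counts are made up; only io_putc is from your code.
Code:
        LDX #3                  ; hypothetical outer count
ROW:        LDY #5              ; hypothetical inner count
COL:            STA io_putc     ; innermost body
                DEY
                BNE COL
            DEX
            BNE ROW
        RTS
Most assemblers only care that labels start in the first column, so the extra indentation of the instructions costs nothing.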
This is just a quick look, and these are some minor areas for improvement that I saw. Some people have the attitude that they don't matter; but to me it seems to be part of a mentality of keeping the code as concise and well documented as possible, which pays off big time when you get into programs that are thousands or tens of thousands of lines, where maintenance becomes hopeless if you haven't been careful.
At my last place of work, I was working on a project with another man in the late 1980s. He typed incessantly in his programming work. For a long time, I wondered what he could possibly be typing, going non-stop like that. After he left the company and the boss asked me to change how one of his routines worked, I looked through the code and found it. It was four pages of spaghetti. The first attack I made was like reducing an equation to lowest terms. "Ok, we already have a subroutine to do these three parts... This part and this part really should be a macro... Here are some redundant instructions..." etc. By the time I was done, it was down to only half a page, took far less memory and ran faster, was much easier to read, and I finally could see what needed modifying. Little by little, I ended up having to re-write much of his code.