Yes, but by simply allocating one more byte in string space when you store the string in string space, you can store the length after the string (the bit 7 limitation is based on the fact that Applesoft is in ROM and you can't very easily change how it stores strings). With the length it's easy (and, in general, faster) to move backwards in memory to the beginning of the string. Furthermore, if you always allocate a minimum of 4 bytes, you have room for length (1 byte), active string flag (1 bit), and pointer to variable space (2 bytes), and 1-2 character strings are no longer a special case. At a cost of one byte per string (for most strings, and only 3 bytes for the very smallest strings) you can have fast, O(n) garbage collection. GC will happen slightly more frequently, and not quite as many strings will fit in memory, but this seems like a reasonable trade-off to me. (Heck, even if I'm forgetting something it takes two or three bytes per string, it doesn't seem that bad to me.)
From a very, very quick look at the ProDOS GC, it appears to be (windowed) stop-and-copy GC. There's a GC window, which is the size of free memory minus whatever GC overhead there is. I'll ignore the overhead to keep the math simple. For a 1k window, copy all active strings in the top 1k of string space to free memory (between variable space and string space). Now the top 1k is free, so copy the active strings in free memory there. Now work on the next 1k window, copying the active strings to free memory, then from free memory to wherever the already-collected strings in the top 1k end. And so on for the rest of string space.
It appears that ProDOS triggers automatic GC whenever free space gets to be around 1-2k. So it's still O(n^2) (but if GC is manually triggered when free space >= string space, the window can be large enough to do this in one iteration, and it's O(n)), but finding the highest N strings is faster than finding the one highest string. One other downside is that it requires a reasonably sized window. If the goal is for EhBASIC (in ROM for this example) to be able to run with, say 4k of RAM for program, variables, and strings, then having a minimum GC window of, say 1k makes for an even tighter fit.
Again, by no means am I saying that there isn't a fair amount of work here, just that there seem to be at least a couple of feasible approaches to improved GC performance, if anyone wants to put in the time.