For program memory I'd use a simple serial flash, and a bootloader to copy from flash to RAM. That way you only need one parallel device. RAM access time is also lower, so it'll run faster. Programming the flash can be done from the FPGA, so you don't need a programmer. After boot, the block RAM with the bootloader can be reused as fast RAM.
Excellent suggestion. I know you've mentioned it before...
Now we know the Spartan 3AN series is not suitable (at least the version we can solder).
We need to solidify the FPGA, RAM, and serial FLASH for layout purposes.
For the FPGA, I can suggest 2 versions that do come in 100-pin QFP.
1) is the XC3S50 Spartan 3, $6-$10 from various sources. Not very dense, but should do the job nicely.
2) is the XC3S500E Spartan 3E, ~$30 from Avnet. It can fit a lot, but may be overkill.
For top speed, as BigEd has mentioned, there's only one choice in 144-pin QFP: the Spartan 6 in the -3 high speed grade, ~$20 from Avnet.
If we decide that top speed is critical and go with the XC6SLX9, we will have ~36Kx16 of internal block RAM available. That's quite a lot of space, and we may not even need external SRAM?...
EDIT: In contrast, the XC3S500E has ~22Kx16 of internal block RAM and the XC3S50 has only ~4.5Kx16, which pretty much eliminates the XC3S50. So there are really only 2 choices: a 100-pin QFP XC3S500E @ ~$30 or a 144-pin QFP XC6SLX9 @ ~$20.
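For anyone who wants to check the block RAM figures, here is a rough sketch of the arithmetic in Python. The Kbit totals are my assumption, taken from the datasheet summary tables as I remember them (72, 360 and 576 Kbit), and the function names are only for illustration. The first calculation divides the total bits by 16, which matches the numbers quoted above. The second assumes each 18 Kbit block is used as 1K x 18 with the two parity bits left idle, which gives a more conservative count of 16-bit words.

```python
# Total block RAM per device in Kbit (1 Kbit = 1024 bits).
# ASSUMPTION: these are datasheet totals as recalled, parity bits included.
BLOCK_RAM_KBITS = {
    "XC3S50": 72,
    "XC3S500E": 360,
    "XC6SLX9": 576,
}

BLOCK_SIZE_KBITS = 18  # one block RAM primitive, parity bits included


def kwords_by_total_bits(kbits: int) -> float:
    """K words of 16 bits if every bit, parity included, is counted."""
    return kbits / 16


def kwords_one_k_per_block(kbits: int) -> int:
    """K words of 16 bits assuming each 18 Kbit block holds 1K words."""
    return kbits // BLOCK_SIZE_KBITS


def capacities() -> dict:
    """Map each part to (optimistic, conservative) capacity in K words."""
    return {
        part: (kwords_by_total_bits(k), kwords_one_k_per_block(k))
        for part, k in BLOCK_RAM_KBITS.items()
    }


if __name__ == "__main__":
    for part, (optimistic, conservative) in capacities().items():
        print(f"{part}: ~{optimistic:g}K x 16 by total bits, "
              f"{conservative}K x 16 at 1K words per block")
```

If the 1K x 18 assumption holds, the usable space for 16-bit data is a bit lower than the headline figures, but the ranking of the three parts stays the same.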