DRAMs are organized internally as a square matrix. Take the 4164 DRAM chip for example, which is 64K by 1 bit. On the left side of the matrix is a row selector, which basically consists of a large demultiplexer. This takes the 8 bit row address and selects one of 256 rows. Along each row are 256 capacitors, with transistors to connect the capacitors to the columns when the row line is asserted. At the top of the matrix is a precharge circuit which charges each column to half the supply voltage; when each capacitor in a row is connected to a column line, it swings the voltage slightly higher or lower than half, depending on the bit stored in the capacitor. Along the bottom of the matrix are 256 sense amplifiers which turn the slightly higher or lower voltage into a definite high or low signal, which is then fed back into each capacitor in a row to fully charge or discharge it again. At that point, the whole row is refreshed.
From there, the 256 columns are multiplexed down to one bit using the column address, and that's the data bit output by the chip. Eight such chips side by side, one per data line, provide an 8-bit data bus and 64K bytes of memory.
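To make the row/column behavior concrete, here is a rough toy model in Python. It is only a sketch of the description above, not a timing-accurate simulation: the class name `Dram4164Model`, the helpers `write_byte` and `read_byte`, and the `now` tick counter are all made up for illustration.

```python
class Dram4164Model:
    """Toy model of a 64K x 1 DRAM: a 256 x 256 grid of one-bit cells."""
    ROWS = 256
    COLS = 256

    def __init__(self):
        self.cells = [[0] * self.COLS for _ in range(self.ROWS)]
        # Tick (arbitrary units) at which each row was last sensed and rewritten.
        self.last_refresh = [0] * self.ROWS

    def write(self, row, col, bit, now):
        # Selecting the row senses and rewrites every cell in it;
        # only the addressed column gets the new value.
        self.cells[row][col] = bit & 1
        self.last_refresh[row] = now

    def read(self, row, col, now):
        # Row select: all 256 cells in the row reach the sense amplifiers.
        sensed = list(self.cells[row])
        # The amplified values are fed back, which refreshes the whole row.
        self.cells[row] = sensed
        self.last_refresh[row] = now
        # Column mux: pick one of the 256 sensed bits as the output.
        return sensed[col]


def write_byte(chips, row, col, value, now):
    """Store one byte across eight chips, one bit per chip (hypothetical helper)."""
    for bit, chip in enumerate(chips):
        chip.write(row, col, (value >> bit) & 1, now)


def read_byte(chips, row, col, now):
    """Read one byte back from eight chips (hypothetical helper)."""
    return sum(chip.read(row, col, now) << bit for bit, chip in enumerate(chips))


bank = [Dram4164Model() for _ in range(8)]
write_byte(bank, 0x12, 0x34, 0xA5, now=1)
value = read_byte(bank, 0x12, 0x34, now=2)
print(hex(value), bank[0].last_refresh[0x12])
```

Note that reading a single bit updates `last_refresh` for the entire row, which is the property the refresh scheme relies on.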
Simply put, if the row address on the DRAM is connected to the lower eight bits of the system's address bus, then the first 256 bytes of memory each sit in a different row. The system only needs to read those 256 bytes periodically, and since every read refreshes a whole row, that refreshes the entire 64K.
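A quick way to convince yourself of this is to count which rows a sweep of 256 consecutive addresses touches. The sketch below assumes a 16-bit system address split into two 8-bit halves; the function names and the `row_from_low_bits` flag are invented for illustration and don't describe any particular board's wiring.

```python
def split_address(addr, row_from_low_bits=True):
    """Split a 16-bit address into (row, column) for a 256 x 256 DRAM."""
    low = addr & 0xFF
    high = (addr >> 8) & 0xFF
    # Assumption: the board's address multiplexer decides which half is the row.
    return (low, high) if row_from_low_bits else (high, low)


def rows_touched(addresses, row_from_low_bits=True):
    """Return the set of DRAM rows activated by reading the given addresses."""
    return {split_address(a, row_from_low_bits)[0] for a in addresses}


sweep = range(256)  # the first 256 bytes of memory

# Row address taken from the low byte: every row is hit exactly once.
rows_low = rows_touched(sweep, row_from_low_bits=True)

# Row address taken from the high byte: the same sweep stays in row 0.
rows_high = rows_touched(sweep, row_from_low_bits=False)

print(len(rows_low), len(rows_high))
```

With the row address on the low byte the sweep covers all 256 rows; with it on the high byte the same 256 reads only ever refresh a single row, which is why the wiring matters.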
For a good reference, "The Apple ][ Circuit Description" would be a great guide to work from, since the Apple ][ used 48K of dynamic RAM with TTL circuitry providing the refresh. The book explains how it all works, so it should be a fantastic guide for solving your design.
I can't find a reference for the 4164 right now, but I believe a hardware refresh circuit would probably work better than refreshing in software, because of the load software refresh would put on the processor. On the Commodore 64, the video chip took care of refresh automatically, so it was completely invisible to the processor.