Might be worth linking to the other thread which addresses related questions:
viewtopic.php?p=37133#p37133
By coincidence, here's a recent blog post describing the PPU and linking to an emulator: reading the source code of an emulator might help a bit.
Might be best to start by understanding - if not in fact building - a simpler system before moving up to something as complex as the NES' PPU. A monochrome bitmap is simpler than a monochrome system with character ROM, is simpler than a colour bitmap, is simpler than character-based colour attributes, is simpler than a tile-based system. Sprites are an extra complexity - Apple and Acorn managed without them, and supported impressive games.
Working backwards, you need to choose a video standard (because your monitor will only sync to some standard timings) and that tells you how many pixels per second need to be output during the active part of the scanline. The earliest systems didn't have frame buffers or line buffers, so they had to fetch video data from main memory (or shared memory) at the appropriate rate, and the video system had to have priority.
(In the other thread, there's some difficulty trying to get both the video and the CPU to have enough memory access without impeding one another - simpler is to give the video priority and stall the CPU. A tradeoff of performance for complexity.)
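To put rough numbers on the "working backwards" step, here's a small sketch. It assumes the common 640x480 at 60 Hz VGA timing purely as an example (nobody in the thread has picked a standard yet), and the function name is just for illustration.

```python
# Assumed example timing: standard 640x480 @ 60 Hz VGA.
PIXEL_CLOCK_HZ = 25_175_000  # pixels per second while a line is being drawn
ACTIVE_PIXELS = 640          # visible pixels per scanline
TOTAL_PIXELS = 800           # visible pixels plus horizontal blanking
TOTAL_LINES = 525            # visible lines plus vertical blanking


def fetch_rate_bytes_per_sec(bits_per_pixel):
    """Memory bandwidth needed during the active part of a scanline."""
    return PIXEL_CLOCK_HZ * bits_per_pixel / 8


def bytes_per_line(bits_per_pixel):
    """Bytes the video logic must fetch for one visible scanline."""
    return ACTIVE_PIXELS * bits_per_pixel // 8


line_rate_hz = PIXEL_CLOCK_HZ / TOTAL_PIXELS   # about 31.47 kHz
frame_rate_hz = line_rate_hz / TOTAL_LINES     # just under 60 Hz
```

With these assumed timings a monochrome display needs a byte roughly every 318 ns during the active part of the line, which is already more than a 1 MHz memory system can deliver. That's why the old machines used much lower pixel rates (TV timings) or fewer pixels per line.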
For a monochrome system, your video logic needs to fetch one bit per pixel. In practice it will fetch whole bytes, or perhaps pairs of bytes, and shift the bits out one pixel at a time.
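As a software model of that fetch-and-shift, something like the following would do. It's only a sketch: the function name is made up, and it assumes the most significant bit of each byte is the leftmost pixel, which is a common choice but not a universal one.

```python
def scanline_pixels(framebuffer, line, bytes_per_line):
    """Yield the 0/1 pixels of one scanline from a 1-bit-per-pixel bitmap."""
    start = line * bytes_per_line
    for byte in framebuffer[start:start + bytes_per_line]:
        # Assumption: bit 7 is the leftmost pixel, as a shift register
        # shifting left would produce.
        for bit in range(7, -1, -1):
            yield (byte >> bit) & 1
```

In hardware the inner loop is a shift register clocked at the pixel rate, and the outer loop is the address counter stepping through memory once every eight pixels.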
For a colour system, you need to fetch some number of bits per pixel.
A monochrome system with a character ROM has an extra level of indirection: the foreground/background bits must be fetched from the ROM, with the ROM addresses coming from the main memory (or shared memory).
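The indirection is easier to see written out. This is a sketch assuming 8x8 glyphs stored as eight consecutive bytes per character. The names and the layout are illustrative, not taken from any particular machine.

```python
CHAR_HEIGHT = 8  # assumption: 8x8 pixel glyphs, one byte per glyph row


def char_row_bits(screen_ram, char_rom, cols, x_char, scanline):
    """Return the 8 foreground/background bits for one character cell."""
    text_row = scanline // CHAR_HEIGHT
    glyph_row = scanline % CHAR_HEIGHT
    # First fetch: the character code from main (or shared) memory.
    code = screen_ram[text_row * cols + x_char]
    # Second fetch: the code, plus the row within the glyph, addresses the ROM.
    return char_rom[code * CHAR_HEIGHT + glyph_row]
```

Note that the screen memory is read eight times over for each text row, once per scanline, but the ROM address changes each time because the low bits come from the scanline counter.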
A colour system with a character ROM has to fetch per-character colour attributes and use those with the foreground/background information to get the actual pixel values.
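Combining the two might look like this. The attribute format here (high nibble foreground, low nibble background) is just an assumption for the sake of the example. Real machines differ in how they pack it.

```python
def coloured_pixels(glyph_row, attribute):
    """Turn 8 glyph bits plus one attribute byte into 8 colour numbers."""
    # Assumed attribute layout: high nibble = foreground, low nibble = background.
    foreground = attribute >> 4
    background = attribute & 0x0F
    return [
        foreground if (glyph_row >> bit) & 1 else background
        for bit in range(7, -1, -1)  # bit 7 taken as the leftmost pixel
    ]
```

So per character cell the hardware needs three fetches (character code, glyph row, attribute) for every eight pixels, and the final step is just a two-way multiplexer driven by the shifted-out glyph bit.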
A system with tile graphics has to fetch a per-tile index and then fetch data from the tile definitions, and then perhaps fetch colour data from an attribute area.
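Here's a sketch of that chain of fetches for one tile on one scanline. It's loosely modelled on the NES layout of 16 bytes per 8x8 tile (eight bytes of bitplane 0 followed by eight of bitplane 1), but the per-tile attribute is a deliberate simplification, and all the names are made up for illustration.

```python
def tile_row_pixels(tile_map, tile_defs, attrs, palettes, map_cols, tile_x, scanline):
    """Return 8 colour values for one tile cell on one scanline."""
    cell = (scanline // 8) * map_cols + tile_x
    index = tile_map[cell]                # fetch 1: which tile goes here
    base = index * 16 + scanline % 8      # assumed 16 bytes per tile definition
    low = tile_defs[base]                 # fetch 2: bitplane 0 for this row
    high = tile_defs[base + 8]            # fetch 3: bitplane 1 for this row
    palette = palettes[attrs[cell]]       # fetch 4: simplified per-tile attribute
    return [
        palette[(((high >> bit) & 1) << 1) | ((low >> bit) & 1)]
        for bit in range(7, -1, -1)       # bit 7 taken as the leftmost pixel
    ]
```

That's four memory accesses per eight pixels before you've added scrolling or sprites, which gives some idea why it's worth building one of the simpler schemes first.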
Summary: sprites and tile graphics are convenient for a low-memory system, but not essential even for video game purposes.