Some general comments about pros and cons, without reference to the 6502. Hope it helps.
Price and performance - those are the drivers for microarchitecture changes. Oh, and power consumption, but let's leave that for now.
Price, for chips, is a function of area - an exponential function, roughly because yield falls off exponentially as die area grows. And it's also, famously, a function of time, also exponential. So, the 386 came out in 1986 and comprised 275k transistors, whereas the pipelined and cache-containing 486 came out in 1989 and comprised 1200k transistors. That's more than 4x the number of transistors, which can surely make a huge difference, but because the 486 came 3 years later, it might cost about the same to manufacture. That's amazing, and it's what has driven the progress in performance.
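To make the area and time effects concrete, here is a rough Python sketch. It assumes a simple Poisson yield model (yield = exp(-area x defect density)), ignores wafer edge loss, and assumes transistor density doubles every 1.5 years. The wafer cost, wafer area, defect density, and the normalisation that makes the 386 exactly 1 cm^2 are all made-up illustrative numbers, not real Intel process data.

```python
import math

# All parameters below are illustrative assumptions, not real process data.
WAFER_COST = 1000.0      # arbitrary currency units per wafer
WAFER_AREA_CM2 = 180.0   # usable wafer area, edge loss ignored
DEFECTS_PER_CM2 = 1.0    # defect density for the Poisson yield model


def die_area_cm2(transistors, year, base_year=1986, base_density=275_000,
                 doubling_years=1.5):
    """Die area, assuming transistor density doubles every `doubling_years`.

    base_density is chosen so a 275k-transistor die in 1986 is 1 cm^2
    (an arbitrary normalisation)."""
    density = base_density * 2 ** ((year - base_year) / doubling_years)
    return transistors / density


def good_die_cost(area_cm2):
    """Cost of one working die: wafer cost / (dies per wafer * yield)."""
    dies_per_wafer = WAFER_AREA_CM2 / area_cm2
    yield_fraction = math.exp(-area_cm2 * DEFECTS_PER_CM2)  # Poisson yield
    return WAFER_COST / (dies_per_wafer * yield_fraction)


cost_386 = good_die_cost(die_area_cm2(275_000, 1986))
cost_486 = good_die_cost(die_area_cm2(1_200_000, 1989))
cost_486_in_1986 = good_die_cost(die_area_cm2(1_200_000, 1986))

print(f"486 in 1989 vs 386 in 1986: {cost_486 / cost_386:.2f}x the cost")
print(f"486-sized die in 1986 vs 386: {cost_486_in_1986 / cost_386:.0f}x the cost")
```

With these invented numbers the 486 comes out only about 1.2x the cost of the 386, while the same 486 die built with 1986 density would cost over 100x as much: the area exponential and the time exponential nearly cancel.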
Performance, when comparing implementations of the same instruction set, is the product of two factors: clock frequency and instructions per clock. Pipelining helps clock frequency by doing less work between ticks. (Other transistor-expensive tactics improve clock frequency too: faster adders, more adders, or more generally faster logic and more logic.) Caches help instructions per clock by improving effective memory bandwidth and latency. Other expensive tactics help instructions per clock too: fancier decoders, branch predictors.
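As a small worked example of that product, here is a Python sketch comparing two hypothetical cores. The clock rates and IPC figures are invented for illustration, not measurements of any real 6502 core.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    clock_mhz: float  # clock frequency in MHz
    ipc: float        # average instructions completed per clock

    def mips(self) -> float:
        """Millions of instructions per second = clock (MHz) * IPC."""
        return self.clock_mhz * self.ipc


# Hypothetical figures, for illustration only.
simple = Core("simple, unpipelined", clock_mhz=50.0, ipc=0.3)
pipelined = Core("pipelined, with caches", clock_mhz=80.0, ipc=0.6)

speedup = pipelined.mips() / simple.mips()
print(f"{pipelined.name}: {speedup:.1f}x faster than {simple.name}")
```

Here the clock gain (1.6x) and the IPC gain (2x) multiply to give the overall 3.2x, which is why it pays to measure the two factors separately.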
So, for an exercise like this one, making a new microarchitecture for an existing instruction set, you can see directly how difficult it has been. That's a measure of the engineering cost, or the capital cost, not the production cost. But you may not yet be able to see the other two measures, performance and production cost. You can't see instructions per clock until you've run some simulations, and you can't see clock speed or production cost until you've run through synthesis.
In principle you can compare your implementation, after synthesis, for clock speed and gate count, with another implementation such as the large T65 or the small Arlet core. Because your core has caches, it can tolerate memory that is proportionally slower, relative to its clock, than a cacheless core could, so you could also compare two (or three) implementations on the assumption that core speed is constrained only by memory speed. But to show the performance benefit of pipelining on instructions per clock, I think you'll need some simulations.
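Before running full simulations, a back-of-the-envelope model can show how a cache lets the core outrun memory. The sketch below assumes a simple additive stall model, CPI = base CPI + memory references per instruction x miss rate x miss penalty, and compares a hypothetical cacheless core clocked at memory speed with a hypothetical cached core clocked 4x faster than memory. Every number in it is an assumption; a simulation would give you real hit rates and base CPI for your design.

```python
def effective_mips(core_mhz, base_cpi, refs_per_instr, hit_rate, miss_penalty):
    """MIPS under a simple stall model (an assumption, not a simulation):
    each cache miss stalls the core for `miss_penalty` core clocks."""
    cpi = base_cpi + refs_per_instr * (1.0 - hit_rate) * miss_penalty
    return core_mhz / cpi


MEMORY_MHZ = 10.0  # hypothetical memory speed
REFS = 3.0         # hypothetical memory references per instruction

# Cacheless core: every access goes to memory, but memory keeps pace with
# the clock, so those cycles are already counted in base_cpi.
cacheless = effective_mips(MEMORY_MHZ, base_cpi=3.5, refs_per_instr=REFS,
                           hit_rate=0.0, miss_penalty=0)

# Cached, pipelined core at 4x memory speed: a miss costs one memory cycle,
# i.e. 4 core clocks (line fills and write-backs are ignored).
cached = {
    h: effective_mips(4 * MEMORY_MHZ, base_cpi=1.5, refs_per_instr=REFS,
                      hit_rate=h, miss_penalty=4)
    for h in (0.80, 0.90, 0.95, 0.99)
}

print(f"cacheless: {cacheless:.2f} MIPS")
for h, mips in cached.items():
    print(f"cached, hit rate {h:.0%}: {mips:.2f} MIPS")
```

The interesting output is how steeply the cached core's advantage depends on hit rate, which is exactly the number only a simulation of real workloads can give you.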
(BTW, often you will see CPI, or clocks per instruction, rather than instructions per clock. Same thing, but upside down.)
All the above, then, is quantitative. I'm sure it would also be good to make some qualitative comparisons, either alongside or instead, which is what White Flame suggests. Comparing the 6502 with a RISC design would probably be useful. The 6502 has variable-length instructions, very few architectural registers (and only narrow ones), and complex addressing modes that make multiple memory references. All of those, and probably more, make it difficult to construct an improved microarchitecture.
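As one illustration of the addressing-mode point, here is a simplified Python sketch of the memory reads made by LDA (zp),Y (opcode $B1). It is not cycle-accurate: the NMOS 6502's extra read when the index crosses a page boundary is left out. It shows that the address of the final data read depends on two earlier reads from zero page, whereas a typical RISC load makes a single data access at a register plus an offset.

```python
def lda_ind_y_reads(mem, pc, y):
    """Logical memory reads made by LDA (zp),Y located at `pc`.

    Simplified: not cycle-accurate; the extra read on a page crossing
    is omitted. Returns (list of (purpose, address), loaded value)."""
    reads = [("opcode", pc)]
    operand_addr = (pc + 1) & 0xFFFF
    reads.append(("operand", operand_addr))
    zp = mem[operand_addr]                      # zero-page pointer address

    reads.append(("pointer low", zp))
    lo = mem[zp]
    hi_addr = (zp + 1) & 0xFF                   # pointer wraps within zero page
    reads.append(("pointer high", hi_addr))
    hi = mem[hi_addr]

    # The data address depends on the two pointer bytes just read from memory.
    data_addr = (((hi << 8) | lo) + y) & 0xFFFF
    reads.append(("data", data_addr))
    return reads, mem[data_addr]


mem = bytearray(0x10000)
mem[0x0200:0x0202] = bytes([0xB1, 0x80])   # LDA ($80),Y
mem[0x80], mem[0x81] = 0x00, 0x30          # pointer $3000
mem[0x3005] = 0x42

reads, value = lda_ind_y_reads(mem, pc=0x0200, y=0x05)
for what, addr in reads:
    print(f"{what:>12}: ${addr:04X}")
print(f"A = ${value:02X}")
```

A pipeline cannot start the data read until both pointer reads have completed, so this one instruction serialises three dependent memory accesses after its own two instruction bytes.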
Intel faced the same predicament as you when designing the 486. The Wikipedia page isn't at all bad, and hopefully links to some authoritative and technical sources - you wouldn't want to cite Wikipedia! Now, after this project, you may be able to make a good parallel argument, comparing your work with the move from the 386 to the 486 microarchitecture. It wouldn't hurt at all to show your understanding, in my opinion! (I would first break down the equation 486 = 386 + cache + FPU, to estimate how many transistors, or how much area, was spent on the microarchitectural upgrade. You can do the same with your 6502.)
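Here is a rough Python sketch of that breakdown. The 1200k and 275k totals are the figures above, and the 486's on-chip cache was 8 KB. The cache estimate counts only the data array and assumes 6 transistors per SRAM bit; the FPU figure is a hypothetical placeholder, not a sourced number, and should be replaced with one from a proper reference. The leftover also absorbs everything not estimated (cache tags and control, bus interface changes), so treat it as a loose upper bound on the pipeline and decoder work.

```python
def microarch_delta(total, baseline, **additions):
    """Transistors left after removing the baseline core and the
    separately estimated additions: a rough budget for the
    microarchitectural upgrade itself (units are up to the caller)."""
    return total - baseline - sum(additions.values())


# 8 KB cache, data array only, assuming 6 transistors per SRAM bit
# (tags, replacement state and control logic are ignored).
cache_data = 8 * 1024 * 8 * 6

# Hypothetical placeholder: replace with a sourced figure for the 486's FPU.
FPU_ASSUMED = 100_000

leftover = microarch_delta(1_200_000, 275_000, cache=cache_data, fpu=FPU_ASSUMED)
print(f"cache data array ~{cache_data:,}, leftover ~{leftover:,} transistors")
```

The same function works for your 6502 in gate counts: the total for your core, minus a cacheless reference such as the Arlet core, minus your cache RAM.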