6502.org Forum  Projects  Code  Documents  Tools  Forum
All times are UTC




 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 9:06 am 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
GARTHWILSON wrote:
If you have to load 1K at a time, at 100MB/s (for the sake of discussion), that's 10µs, which is a long, long time to wait for interrupt service on a processor designed for performance high enough to justify having caches.

The data I referred to show that this particular processor takes 8ns to fill a cache line from external memory, which is not a long, long time, even assuming the CPU is actually stalled, which is most likely not the case.
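
To put the two figures side by side, here is a minimal arithmetic sketch. It only uses the numbers quoted in this exchange (1 KB at 100 MB/s, and 8 ns per line fill); the function name is made up for illustration, and the line size and memory bandwidth of the processor with the 8 ns figure are not stated in this thread, so they are not modelled.

```python
def transfer_time_ns(num_bytes: int, bytes_per_second: float) -> float:
    """Time to move num_bytes at a sustained rate, in nanoseconds."""
    return num_bytes / bytes_per_second * 1e9


# Figure from the quoted post: a 1 KB block at 100 MB/s.
block_fill_ns = transfer_time_ns(1024, 100e6)   # roughly 10 µs

# Figure quoted in the reply: 8 ns for one cache line (taken as given).
line_fill_ns = 8.0

# How many such line fills fit into one 1 KB block transfer.
ratio = block_fill_ns / line_fill_ns

if __name__ == "__main__":
    print(f"1 KB block: {block_fill_ns / 1000:.2f} us")
    print(f"cache line: {line_fill_ns:.0f} ns")
    print(f"ratio:      {ratio:.0f}x")
```

The gap of three orders of magnitude suggests the two posts are describing very different refill granularities, not disagreeing about the same one.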


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 9:31 am 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
manili wrote:
Thank you all for this hot discussion :mrgreen: !

I hope we've been able to help! You should feel free to continue here or to start new threads if you have any particular questions. Many of us use this site not just for discussion, but also as a repository of wisdom, so it's good to be able to find things later - even years later.

One thing you did say earlier, about Klaus' testsuite - indeed, it's only intended to test 6502 behaviour, and only some of it, but most implementations or emulations have one or two bugs and this suite is a good one to pass. It will exercise some of your pipelined architecture, but probably not every possibility. One reason to make a machine which is not super-complicated is the difficulty of verifying it. Even Intel don't put every possible smart idea into each revision - they move forward a step at a time.

Good luck, anyway, and hopefully we will see more interesting news as you progress.


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 9:33 am 
Joined: Fri Nov 09, 2012 5:54 pm
Posts: 1431
Since self-modifying code would be a problem when implementing an instruction cache,
I would suggest taking a look at how the MC68030 data cache works:

"If a cache hit occurs on a write cycle, both the data cache and the external device are updated with the new data.
If a write cycle generates a cache miss, the external device is updated, and a new data cache entry can be
replaced or allocated for that address..."

Would something like that be helpful for building a 6502 instruction cache ?

68030 manual, Page 13.
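
A minimal sketch of the quoted write policy, applied to the self-modifying-code case. This is a toy model under loud assumptions: one-byte "lines", no tags or eviction, and the class and method names are invented for illustration. It is not how the MC68030 is built, only the rule in the quote.

```python
class WriteThroughCache:
    """Toy write-through cache with one-byte lines (illustrative only)."""

    def __init__(self, memory: bytearray, allocate_on_write_miss: bool = False):
        self.memory = memory                  # stands in for the external device
        self.lines = {}                       # address -> cached byte
        self.allocate_on_write_miss = allocate_on_write_miss

    def read(self, addr: int) -> int:
        # Read miss: fetch from memory and keep a copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr: int, value: int) -> None:
        # The external device is updated on every write, hit or miss.
        self.memory[addr] = value & 0xFF
        # Write hit: the cached copy is updated too, so it never goes stale.
        # Write miss: optionally allocate a new entry.
        if addr in self.lines or self.allocate_on_write_miss:
            self.lines[addr] = value & 0xFF


if __name__ == "__main__":
    ram = bytearray(0x10000)
    ram[0x8000] = 0xEA                        # NOP
    cache = WriteThroughCache(ram)
    print(hex(cache.read(0x8000)))            # opcode fetch, now cached
    cache.write(0x8000, 0x60)                 # program patches itself with RTS
    print(hex(cache.read(0x8000)))            # the next fetch sees the new opcode
```

The point for an instruction cache is that stores must be checked against it as well; if they are, a patched opcode is seen on the next fetch without any explicit flush.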


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 9:49 am 
Joined: Fri Oct 07, 2016 9:44 pm
Posts: 31
BigEd wrote:
manili wrote:
Thank you all for this hot discussion :mrgreen: !

I hope we've been able to help! You should feel free to continue here or to start new threads if you have any particular questions. Many of us use this site not just for discussion, but also as a repository of wisdom, so it's good to be able to find things later - even years later.

One thing you did say earlier, about Klaus' testsuite - indeed, it's only intended to test 6502 behaviour, and only some of it, but most implementations or emulations have one or two bugs and this suite is a good one to pass. It will exercise some of your pipelined architecture, but probably not every possibility. One reason to make a machine which is not super-complicated is the difficulty of verifying it. Even Intel don't put every possible smart idea into each revision - they move forward a step at a time.

Good luck, anyway, and hopefully we will see more interesting news as you progress.


I will, thank you.

You mean the test suite does not test every possible state (I know that's impossible ;) )? So how reliable will my processor be after passing this test suite?

ttlworks wrote:
Since self-modifying code would be a problem when implementing an instruction cache,
I would suggest taking a look at how the MC68030 data cache works:

"If a cache hit occurs on a write cycle, both the data cache and the external device are updated with the new data.
If a write cycle generates a cache miss, the external device is updated, and a new data cache entry can be
replaced or allocated for that address..."

Would something like that be helpful for building a 6502 instruction cache ?

68030 manual, Page 13.


Currently I'm using a write-back policy, so you mean I should change it to write-through?
What do you think about my bypassing idea: http://forum.6502.org/viewtopic.php?f=4&t=4270&start=15#p47851


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 10:45 am 
Joined: Fri Nov 09, 2012 5:54 pm
Posts: 1431
manili wrote:
What do you think about my bypassing idea

I now have to admit that I'm from the "TTL nerd corner", which means building CPUs from individual TTL chips... or transistors.
In my case, memory was always faster than the CPU, so I'm not too deep into implementing caches. :)

What about "having a lookup table somewhere" which tells the CPU which parts of memory should be cached
and which should not? // 1KB "page granularity" would take 8 bytes of lookup table (one bit per page) for 64kB of memory.
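
The lookup table idea can be sketched as a bitmap with one bit per 1 KB page. This is only an illustration: the names are invented, and nothing here says where such a table would live in a real design.

```python
PAGE_SIZE = 1024                       # 1 KB "page granularity"
NUM_PAGES = 0x10000 // PAGE_SIZE       # 64 pages cover the 64 KB address space

# One bit per page: 64 bits, i.e. the 8 bytes mentioned above.
cacheable = bytearray(NUM_PAGES // 8)


def set_cacheable(page: int, enabled: bool) -> None:
    """Mark one 1 KB page as cacheable or not."""
    byte, bit = divmod(page, 8)
    if enabled:
        cacheable[byte] |= 1 << bit
    else:
        cacheable[byte] &= ~(1 << bit) & 0xFF


def is_cacheable(addr: int) -> bool:
    """Look up the cacheable bit for the page containing addr."""
    page = (addr & 0xFFFF) // PAGE_SIZE   # same as taking address bits 15..10
    byte, bit = divmod(page, 8)
    return bool((cacheable[byte] >> bit) & 1)


if __name__ == "__main__":
    set_cacheable(0x8000 // PAGE_SIZE, True)   # cache the page at $8000-$83FF
    print(len(cacheable), is_cacheable(0x8000), is_cacheable(0x8400))
```

In hardware the lookup is just the top six address bits indexing a 64-bit register, so it costs very little; I/O pages and regions holding self-modifying code would simply be left uncached.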


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 11:20 am 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
manili wrote:
BigEd wrote:
...about Klaus' testsuite - indeed, it's only intended to test 6502 behaviour, and only some of it...

You mean the test suite does not test every possible state (I know that's impossible ;) )? So how reliable will my processor be after passing this test suite?

Indeed, complete coverage is almost impossible, and there's something of a challenge in even measuring coverage. I couldn't guess! I see Michael has expressed confidence.

One thing I'll note: Arlet's first version of his core lacked RDY, and at some point it passed Klaus' suite. Later, Arlet added RDY, and a little later ran the suite again, with a randomly-toggling RDY. I think I remember that doing so did show up one bug. Much much later we found another RDY-related bug. So, even something as conceptually simple as stalling a basic 6502 is not completely or easily covered by this one test suite.
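
The randomly-toggling RDY idea can be sketched as a differential test: run the same program once without stalls and many times with random stalls, and require identical results. The "core" below is a toy accumulator machine invented for this sketch, standing in for the real design under test; in practice the same comparison would be made in the HDL simulator.

```python
import itertools
import random


def run_core(program, rdy_signal):
    """Toy accumulator core: one program step per cycle in which RDY is high."""
    acc, pc, cycles = 0, 0, 0
    while pc < len(program):
        cycles += 1
        if not next(rdy_signal):
            continue          # stalled: architectural state must not change
        acc = (acc + program[pc]) & 0xFF
        pc += 1
    return acc, cycles


def random_rdy(seed):
    """Endless RDY stream that is high about half of the time."""
    rng = random.Random(seed)
    while True:
        yield rng.random() < 0.5


def check_stall_transparency(program, seeds):
    """Compare every randomly stalled run against the never-stalled run."""
    reference, _ = run_core(program, itertools.repeat(True))
    return all(run_core(program, random_rdy(s))[0] == reference for s in seeds)


if __name__ == "__main__":
    print(check_stall_transparency(list(range(1, 51)), range(10)))
```

Seeding the generator matters: when a stalled run diverges, the same seed reproduces the exact stall pattern that exposed the bug.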

(Another interesting datapoint: when the visual6502 was first used to model the boot of a C64 to the Basic prompt, Michael Steil investigated how many transistors could be deleted without breaking the boot. It turned out to be about a third. So, getting good coverage is difficult!)

There's a whole career to be had, specialising in CPU verification. CPU teams are often half staffed with designers and half with verifiers.


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 5:32 pm 
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
Per Ed's suggestion, the suitability of caches for the 65xx is taken up in a new topic, at viewtopic.php?f=4&t=4271 .

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


 Post subject: Re: Pipelined 6502
PostPosted: Fri Oct 14, 2016 11:26 pm 
Joined: Fri Oct 07, 2016 9:44 pm
Posts: 31
Hi all,
Good news first:
Some people kindly asked me to keep the forum informed of my progress. Currently I'm working hard on the stack-related instructions, which are the most challenging ones to implement. My processor now accepts PHA, PHP, PLA, PLP and, most importantly, RTS. After a lot of changes (believe me when I say a lot!) I finally found a way to handle instructions like RTI.
Here are the test cases I have passed successfully:
Code:
      //RAM

      memory[16'h0000] <= 8'h00;
      memory[16'h0001] <= 8'h00;
      memory[16'h0002] <= 8'h00;
      
      memory[16'h001A] <= 8'h20;
      memory[16'h001B] <= 8'h10;

      memory[16'h00FF] <= 8'h00;

      memory[16'h0020] <= 8'h0F;
      memory[16'h0021] <= 8'h10;
      
      memory[16'h100D] <= 8'h06;
      memory[16'h100E] <= 8'h80;
      memory[16'h100F] <= 8'h0C;

      //ROM
      
      /* Jump + Indirect Addressing Tests */

      memory[`P_C_START + 00] <= `JMP_I;
      memory[`P_C_START + 01] <= 8'h0D;
      memory[`P_C_START + 02] <= 8'h10;
      
      memory[`P_C_START + 06] <= `LDA_IME;
      memory[`P_C_START + 07] <= 8'hFF;
      memory[`P_C_START + 08] <= `LDA_IME;
      memory[`P_C_START + 09] <= 8'h0F;
      
      memory[`P_C_START + 00] <= `LDA_IME;
      memory[`P_C_START + 01] <= 8'h20;
      memory[`P_C_START + 02] <= `LDY_IME;
      memory[`P_C_START + 03] <= 8'hFF;
      memory[`P_C_START + 04] <= `STA_I_Y;
      memory[`P_C_START + 05] <= 8'h00;
      memory[`P_C_START + 06] <= `LDX_ABS;
      memory[`P_C_START + 07] <= 8'hFF;
      memory[`P_C_START + 08] <= 8'h00;
      memory[`P_C_START + 09] <= `JMP_ABS;
      memory[`P_C_START + 10] <= 8'h11;
      memory[`P_C_START + 11] <= 8'h80;
      memory[`P_C_START + 12] <= `ADC_I_X;
      memory[`P_C_START + 13] <= 8'h00;
      memory[`P_C_START + 14] <= `JMP_ABS;
      memory[`P_C_START + 15] <= 8'h1A;
      memory[`P_C_START + 16] <= 8'h80;
      memory[`P_C_START + 17] <= `LDA_IME;
      memory[`P_C_START + 18] <= 8'h01;
      memory[`P_C_START + 19] <= `LDA_IME;
      memory[`P_C_START + 20] <= 8'h02;
      memory[`P_C_START + 21] <= `LDA_IME;
      memory[`P_C_START + 22] <= 8'h03;
      memory[`P_C_START + 23] <= `JMP_ABS;
      memory[`P_C_START + 24] <= 8'h0C;
      memory[`P_C_START + 25] <= 8'h80;

      /* Flags Setting/Clearing Tests */

      memory[`P_C_START + 00] <= `JMP_ABS;
      memory[`P_C_START + 01] <= 8'd26;
      memory[`P_C_START + 02] <= 8'h80;

      memory[`P_C_START + 26] <= `SED;
      memory[`P_C_START + 27] <= `SEC;
      memory[`P_C_START + 28] <= `CLI;
      memory[`P_C_START + 29] <= `CLC;

      /* Loop Test */

      memory[`P_C_START + 00] <= `JMP_ABS;
      memory[`P_C_START + 01] <= 8'd30;
      memory[`P_C_START + 02] <= 8'h80;

      memory[`P_C_START + 30] <= `LDA_ABS;
      memory[`P_C_START + 31] <= 8'h02;
      memory[`P_C_START + 32] <= 8'h00;
      memory[`P_C_START + 33] <= `ADC_IME;
      memory[`P_C_START + 34] <= 8'h01;
      memory[`P_C_START + 35] <= `STA_ABS;
      memory[`P_C_START + 36] <= 8'h02;
      memory[`P_C_START + 37] <= 8'h00;
      memory[`P_C_START + 38] <= `EOR_IME;
      memory[`P_C_START + 39] <= 8'h0F;
      memory[`P_C_START + 40] <= `BNE;
      memory[`P_C_START + 41] <= 8'hF6;
      
      /* Transition Tests */
      
      memory[`P_C_START + 00] <= `JMP_ABS;
      memory[`P_C_START + 01] <= 8'd42;
      memory[`P_C_START + 02] <= 8'h80;
      
      memory[`P_C_START + 42] <= `TXA;
      memory[`P_C_START + 43] <= `TYA;
      memory[`P_C_START + 44] <= `TAX;
      memory[`P_C_START + 45] <= `DEY;

      /* Fibonacci Test */

      memory[`P_C_START + 00] <= `JMP_ABS;
      memory[`P_C_START + 01] <= 8'd46;
      memory[`P_C_START + 02] <= 8'h80;

      memory[`P_C_START + 46] <= `LDY_IME;
      memory[`P_C_START + 47] <= 8'h07;
      memory[`P_C_START + 48] <= `LDA_IME;
      memory[`P_C_START + 49] <= 8'h00;
      memory[`P_C_START + 50] <= `STA_ABS;
      memory[`P_C_START + 51] <= 8'h03;
      memory[`P_C_START + 52] <= 8'h00;
      memory[`P_C_START + 53] <= `LDA_IME;
      memory[`P_C_START + 54] <= 8'h01;
      memory[`P_C_START + 55] <= `TAX;
      memory[`P_C_START + 56] <= `ADC_ABS;
      memory[`P_C_START + 57] <= 8'h03;
      memory[`P_C_START + 58] <= 8'h00;
      memory[`P_C_START + 59] <= `STX_ABS;
      memory[`P_C_START + 60] <= 8'h03;
      memory[`P_C_START + 61] <= 8'h00;
      memory[`P_C_START + 62] <= `DEY;
      memory[`P_C_START + 63] <= `BNE;
      memory[`P_C_START + 64] <= 8'hF8;

      /* Simple Stack Tests */
      
      memory[`P_C_START + 00] <= `JMP_ABS;
      memory[`P_C_START + 01] <= 8'd65;
      memory[`P_C_START + 02] <= 8'h80;

      memory[`P_C_START + 65] <= `TXS;
      memory[`P_C_START + 66] <= `LDX_IME;
      memory[`P_C_START + 67] <= 8'hAA;
      memory[`P_C_START + 68] <= `TXS;
      memory[`P_C_START + 69] <= `TSX;
      
      memory[`P_C_START + 70] <= `PHA;
      memory[`P_C_START + 71] <= `LDA_IME;
      memory[`P_C_START + 72] <= 8'h0F;
      memory[`P_C_START + 73] <= `PHP;
      memory[`P_C_START + 74] <= `PLA;
      memory[`P_C_START + 75] <= `PLP;

      /* Return Oriented Instruction Test */

      memory[`P_C_START + 00] <= `JMP_ABS;
      memory[`P_C_START + 01] <= 8'd76;
      memory[`P_C_START + 02] <= 8'h80;
      
      memory[`P_C_START + 76] <= `LDA_IME;
      memory[`P_C_START + 77] <= 8'h80;
      memory[`P_C_START + 78] <= `PHA;
      memory[`P_C_START + 79] <= `LDA_IME;
      memory[`P_C_START + 80] <= 8'h56;
      memory[`P_C_START + 81] <= `PHA;
      memory[`P_C_START + 82] <= `RTS;

      memory[`P_C_START + 86] <= `LDA_IME;
      memory[`P_C_START + 87] <= 8'hFF;
      memory[`P_C_START + 88] <= `LDA_IME;
      memory[`P_C_START + 89] <= 8'h0F;
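
A side note on the branch operands and the RTS test in the listing above, as a hedged sketch. On a stock 6502 a relative branch is taken from the address of the instruction after the branch, and RTS resumes at the pulled address plus one. The helpers below (names invented here) compute the bytes a stock core would need; `P_C_START = $8000` is an assumption inferred from the JMP operands, not something the post states.

```python
P_C_START = 0x8000   # assumption: inferred from operands such as 8'd26, 8'h80


def branch_offset(branch_addr: int, target_addr: int) -> int:
    """Operand byte for a 2-byte relative branch located at branch_addr."""
    delta = target_addr - (branch_addr + 2)   # relative to the next instruction
    if not -128 <= delta <= 127:
        raise ValueError("branch target out of range")
    return delta & 0xFF


def rts_push_bytes(target_addr: int):
    """(high, low) bytes to push so that RTS resumes at target_addr."""
    ret = (target_addr - 1) & 0xFFFF          # RTS adds one to the pulled address
    return ret >> 8, ret & 0xFF


if __name__ == "__main__":
    # Loop test: BNE at +40 back to the LDA at +30.
    print(hex(branch_offset(P_C_START + 40, P_C_START + 30)))
    # Fibonacci test: BNE at +63 back to the TAX at +55.
    print(hex(branch_offset(P_C_START + 63, P_C_START + 55)))
    # RTS test: resume at +86.
    print([hex(b) for b in rts_push_bytes(P_C_START + 86)])
```

By this arithmetic a stock core would want `$F4` and `$F6` for the two branches and `$80`, `$55` pushed before the RTS, whereas the listing uses `$F6`, `$F8` and `$80`, `$56`. If the intended targets are the ones assumed here, the listing matches a convention in which offsets are counted from the branch opcode and RTS adds nothing, which Klaus' suite should flag.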


Bad news:
The more of this I implement, the more hopeless I feel. When I started the project I thought it was going to be useful, but now I think I'm going to ruin the 6502's performance with this kind of pipeline :( !!! Useless ...


 Post subject: Re: Pipelined 6502
PostPosted: Sat Oct 15, 2016 1:22 am 
Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
manili:

There are some lessons to be learned from your project. Don't be too discouraged if pipelining cannot be generally applied.

We know that CISC machines can be successfully pipelined, but it generally requires a substantial transformation of the underlying microarchitecture. AMD/Intel have successfully pipelined the x86 instruction set architecture, but even they never attempted to pipeline the 8086/80186/80286/80386 processors. Only when the payoff was sufficient did these companies apply significant resources and talent to adding pipeline stages to the 80486 generation of x86 processors.

Finish your project, and use the knowledge learned to guide your future studies and efforts in this area. I can say that some of my best lessons were those for projects for which I felt the end result was not good, i.e. failures as you've described. It's those lessons that keep you from making bold statements about cost/schedule.

_________________
Michael A.


 Post subject: Re: Pipelined 6502
PostPosted: Sat Oct 15, 2016 8:03 am 
Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
Title your paper "Pros and cons of pipelining in microcontroller environments" or something and discuss the tradeoffs. Your project doesn't (shouldn't?) hinge on speedups, especially if you can explore & show how pipeline design decisions interacted with the 6502.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


 Post subject: Re: Pipelined 6502
PostPosted: Sat Oct 15, 2016 10:33 pm 
Joined: Fri Oct 07, 2016 9:44 pm
Posts: 31
MichaelM wrote:
We know that CISC machines can be successfully pipelined, but it generally requires a substantial transformation of the underlying microarchitecture. AMD/Intel have successfully pipelined the x86 instruction set architecture, but even they never attempted to pipeline the 8086/80186/80286/80386 processors. Only when the payoff was sufficient did these companies apply significant resources and talent to adding pipeline stages to the 80486 generation of x86 processors.


Completely true. This is one of the lessons I've learned from this project. Thanks a lot for all of your advice.

White Flame wrote:
Title your paper "Pros and cons of pipelining in microcontroller environments" or something and discuss the tradeoffs. Your project doesn't (shouldn't?) hinge on speedups, especially if you can explore & show how pipeline design decisions interacted with the 6502.


Do you have any recommendations for the comparison? I mean, which items should I compare (e.g. speedup), and what would be a good project to compare with? Thanks.


 Post subject: Re: Pipelined 6502
PostPosted: Sun Oct 16, 2016 6:07 am 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Some general comments about pros and cons, without reference to the 6502. Hope it helps.

Price, and performance - those are the drivers for microarchitecture changes. Oh, and power consumption, but let's leave that for now.

Price, for chips, is a function of area - an exponential function. And it's also, famously, a function of time, also exponential. So, the 386 came out in 1985 and comprised 275k transistors, whereas the pipelined and cache-containing 486 came out in 1989 and comprised 1200k transistors. That's more than 4x the number of transistors, which can surely make a huge difference, but being some 4 years later the cost to manufacture might be about the same. That's amazing, and it's what's driven the progress in performance.

Performance, when comparing implementations of the same instruction set, is the product of two factors: clock frequency, and instructions per clock. Pipelining helps clock frequency, by doing less work between ticks. (Other transistor-expensive tactics improve clock frequency too - faster adders, more adders, or more generally faster logic and more logic.) Caches help instructions per clock, by improving effective memory bandwidth. Other expensive tactics help instructions per clock too: fancier decoders, branch predictors.
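
The product of the two factors can be written down directly. A minimal sketch follows; the function names are invented, and the clock and CPI figures in the example are placeholders chosen only to show the arithmetic, not measurements of any core.

```python
def instructions_per_second(clock_hz: float, ipc: float) -> float:
    """Performance as the product of clock frequency and instructions per clock."""
    return clock_hz * ipc


def ipc_from_cpi(cpi: float) -> float:
    """Instructions per clock is the reciprocal of clocks per instruction."""
    return 1.0 / cpi


def speedup(base_clock_hz, base_cpi, new_clock_hz, new_cpi) -> float:
    """Ratio of new to baseline throughput on the same instruction set."""
    base = instructions_per_second(base_clock_hz, ipc_from_cpi(base_cpi))
    new = instructions_per_second(new_clock_hz, ipc_from_cpi(new_cpi))
    return new / base


if __name__ == "__main__":
    # Placeholder numbers: a 50 MHz core at 3.0 CPI against an 80 MHz
    # pipelined core at 2.0 CPI.
    print(f"{speedup(50e6, 3.0, 80e6, 2.0):.2f}x")
```

A pipeline that raises the clock but also raises CPI (through stalls and flushes) can come out below 1.0x, which is why both factors have to be measured.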

So, for an exercise in making a new microarchitecture for an existing instruction set, like this one, you can see directly how difficult it has been - that's a measure of the engineering cost, or the capital cost, not the production cost - but it's possible you can't yet see the two other measures. You can't see instructions per clock until you've run some simulations, you can't see clock speed until you've run through synthesis, and you can't see the production cost until, again, you've run through synthesis.

In principle you can compare your implementation, after synthesis, for clock speed and gate count, with another implementation such as the large T65 or the small Arlet core. Your core, with the caches, will run with proportionally slower memory than a cacheless core, so you could also compare two (or three) implementations on the assumption that the core speed is constrained only by memory speed. But to show the performance benefit of pipelining on instructions per clock, I think you'll need some simulations.

(BTW, often you will see CPI, or clocks per instruction, rather than instructions per clock. Same thing, but upside down.)

All the above, then, is quantitative. I'm sure it would be good for you also (or instead) to make some qualitative comparisons, which is what White Flame suggests. Probably comparing 6502 with RISC would be useful. The 6502 has variable length instructions, has very few architectural registers and only narrow ones, has complex addressing modes which have multiple memory references. All of those, and probably more, have led to it being difficult to construct an improved microarchitecture.

Intel had the same predicament as you did, when designing the 486. The Wikipedia page isn't at all bad, and hopefully links to some authoritative and technical sources - you wouldn't want to cite Wikipedia! It might be that now, after this project, you can make a good parallel argument and comparison with the 386 to 486 microarchitectures. It wouldn't hurt at all to show your understanding, in my opinion! (I would first break down the equation 486 = 386 + cache + FPU, to estimate how many transistors, or how much area, was spent on the microarchitectural upgrade. You can do the same with your 6502.)


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 17, 2016 1:21 pm 
Joined: Fri Oct 07, 2016 9:44 pm
Posts: 31
Hi all,
I have done the BRK and RTI instructions. Now I can say that implementing the other instructions is fairly straightforward, and I think it will be finished by tonight. My processor still has no way to handle external interrupts.
Two questions:
- After BRK, should I push the B and I flags onto the stack, because they are both 1?
- After RTI, should I pull the B and I flags off the stack, because they may have been turned off during interrupt handling?

If the answer is yes for both questions, good for me, because this is the current state of my processor :mrgreen: .

Thanks a lot.
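
For reference, a minimal sketch of how the documented NMOS 6502 treats the status byte on BRK and RTI. It is a toy model with invented names, not anyone's core. Two details bear on the questions above: the pushed copy has B set, but I is only set after the push, so the stacked I bit is whatever it was before the BRK; and B is not a real flip-flop, so RTI has nothing to restore it into.

```python
# Status register bit masks, bit 0 (C) up to bit 7 (N); bit 5 is unused.
FLAG_C, FLAG_Z, FLAG_I, FLAG_D, FLAG_B, FLAG_U, FLAG_V, FLAG_N = (
    1 << i for i in range(8)
)


class Cpu:
    """Toy model of the stack traffic for BRK and RTI only."""

    def __init__(self):
        self.mem = bytearray(0x10000)
        self.pc = 0
        self.sp = 0xFF
        self.p = 0                      # B and bit 5 are not stored here

    def push(self, value):
        self.mem[0x0100 + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[0x0100 + self.sp]

    def brk(self):
        """self.pc holds the address of the BRK opcode."""
        ret = (self.pc + 2) & 0xFFFF    # BRK skips one padding byte
        self.push(ret >> 8)
        self.push(ret & 0xFF)
        self.push(self.p | FLAG_B | FLAG_U)   # B and bit 5 read as 1 on the stack
        self.p |= FLAG_I                      # I is set after the push
        self.pc = self.mem[0xFFFE] | (self.mem[0xFFFF] << 8)

    def rti(self):
        self.p = self.pull() & ~(FLAG_B | FLAG_U) & 0xFF   # I comes back from the stack
        low = self.pull()
        high = self.pull()
        self.pc = low | (high << 8)     # unlike RTS, no +1 here
```

So on a stock part the handler can tell BRK from IRQ by inspecting bit 4 of the stacked status byte, and the pre-interrupt I state is restored by RTI from that same byte.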


Last edited by manili on Mon Oct 17, 2016 1:25 pm, edited 1 time in total.

 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 17, 2016 1:25 pm 
Joined: Fri Oct 07, 2016 9:44 pm
Posts: 31
BigEd wrote:
In principle you can compare your implementation, after synthesis, for clock speed and gate count, with another implementation such as the large T65 or the small Arlet core. Your core, with the caches, will run with proportionally slower memory than a cacheless core, so you could also compare two (or three) implementations on the assumption that the core speed is constrained only by memory speed. But to show the performance benefit of pipelining on instructions per clock, I think you'll need some simulations.


Thanks a lot Ed.
The list was a complete one. Would you mind giving a link to the source of the most popular open-core 6502?


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 17, 2016 2:04 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
There's a list of cores at http://6502.org/homebuilt#HDL - I couldn't say which are most popular, you might need to find a way to research that. But I think T65 and the Syntiac core (Wendrich/Daly core) are well-regarded, and Arlet's core is popular here at least.

