 Post subject: Re: Pipelined 6502
PostPosted: Sat Oct 08, 2016 4:20 pm 

Joined: Fri Oct 07, 2016 9:44 pm
Posts: 31
Code:
+--------------------------------------------------------------------------------------------------------------------------------+
| 6502 CORE                                                                                                                      |
|  +------------+        +------------+        +------------+        +------------+        +------------+        +------------+  |
|  |            |        |            |        |            |        |            |        |            |        |            |  |
|  |            |        |            |        |            |        |            |        |            |        |            |  |
|  |            |        |            |        |            |        |            |        |            |        |            |  |
|  |            |        |            |        |            |        |            |        |            |        |            |  |
|  |            |        |            |        |            |        |            |        |            |        |            |  |
|  |            |        |            |        |            |        |            |        |            |        |            |  |
|  |            |        |            |        |            |        |            |        |            |        |            |  |
|  |  Prefetch  +-------->   Decode   +--------> Effective  +--------> Read Data  +--------> Execution  +--------> Write Back |  |
|  |            |        |            |        |  Address   |        |            |        |            |        |            |  |
|  |            |        |            |        |            |        |            |        | +--------+ |        |            |  |
|  |            |        |            |        |            |        |            |        | |        | |        |            |  |
|  |            |        |            |        |            |        |            |        | |   ALU  | |        |            |  |
|  |            |        |            |        |            |        |            |        | |        | |        |            |  |
|  |            |        |            |        |            |        |            |        | +--------+ |        |            |  |
|  |            |        |            |        |            |        |            |        |            |        |            |  |
|  +-----^-+----+        +------------+        +-----^-+----+        +-----^-+----+        +------------+        +-----^-+----+  |
|        | |                                         | |                   | |                                         | |       |
|        | |                                         | |                   | |                                         | |       |
|        | |                                         | |                   | |                                         | |       |
|        | |                                         | |                   | |                                         | |       |
|  +-----+-v----+        +------------+        +-----+-v-------------------+-v-----------------------------------------+-v----+  |
|  |            |        |            |        |                                                                              |  |
|  |            |        |            |        |                                                                              |  |
|  |            |        |            |        |                                                                              |  |
|  |            +-------->            <--------+                                                                              |  |
|  |  I Cache   |        |     MMU    |        |                                    D Cache                                   |  |
|  |            <--------+            +-------->                                                                              |  |
|  |            |        |            |        |                                                                              |  |
|  |            |        |            |        |                                                                              |  |
|  +------------+        |            |        +------------------------------------------------------------------------------+  |
|                        |            |                                                                                          |
+--------------------------------------------------------------------------------------------------------------------------------+
                         |            |        +------------------------------------------------------------------------------+
                         |            <--------+            |                                                                 |
                         |  WISHBONE  |        |  WISHBONE  |                   Peripherals/Main Memory                       |
                         |            +-------->            |                                                                 |
                         +------------+        +------------+-----------------------------------------------------------------+



This is a simple diagram of the microarchitecture.
I'll try to find a way to bypass the caches.

Thanks a lot.


 Post subject: Re: Pipelined 6502
PostPosted: Sat Oct 08, 2016 4:22 pm 

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
manili:

Welcome. I will also echo the sentiments expressed above: keep your project as simple as possible.
manili wrote:
1. What do you think of adding an instruction to bypass the D-Cache?
I would not add an instruction for this. It will increase the complexity of your project.
manili wrote:
2. I still didn't understand the problem. Do you mean some instructions write to a part of memory which is the location of another instruction? Why is this important to handle, since I think this is the programmer's fault? But if it's vital to handle this kind of problem, would you give me an assembly code example?
An example of a self-modifying program would be modifying the absolute address operand of a jmp instruction to dynamically change the target address of the instruction. This technique may be used to emulate indirection, computed gotos, switch statements, etc. For example, the following 6502 code snippet emulates the behavior of the 65C02 jmp (abs,X) instruction:
Code:
Offset:     equ $80
JumpTable:  equ FunctionTable
JmpBase_Lo: equ JmpI+1
JmpBase_Hi: equ JmpI+2

JmpAbsXI:   lda Offset      ; load accumulator with Offset located in zero page
            clc
            adc JmpBase_Lo  ; add Offset to JumpTable address in code (lo byte)
            sta JmpBase_Lo  ; store the modified lo byte back into the jmp operand
            bcc JmpI        ; carry fix-up optional if the table never crosses a page
            inc JmpBase_Hi  ; increment hi byte of the jmp operand
JmpI:       jmp (JumpTable) ; jump indirect through the modified pointer
I am sure that you've been able to determine what instructions are available on the 6502, 65C02, and W65C02S processors. I've constructed an instruction table that illustrates the growth of the instruction sets. You can find a copy of that table on GitHub here. Instructions in black are 6502 instructions, those in red are 65C02 instructions, and those in blue are W65C02S instructions. If the WAI and STP instructions are removed, then the remaining instructions are the instruction set of the Rockwell R65C02 processors. For your project, I would recommend focusing your efforts on the 6502-only instructions listed in that table. Furthermore, I would recommend leaving the 6502 instructions supporting indirection out of your project: the instructions in column 1 and column C. The loss of full compatibility is not that critical, although many 6502/65C02 programs do make extensive use of indirection.

Finally, in working on my processor cores to reduce the number of dead memory cycles, I came to the conclusion that the depth of the pipeline that can be effectively implemented without too much complexity is a function of the number of addressable registers. In the case of the 6502/65C02, the majority of the instructions target a single register: the accumulator. Unless your plan is to produce a machine like current x86 machines which are supported with a complex instruction scheduler and large (and hidden) register array (used with register renaming), I would recommend reducing the depth of your pipeline to 3 or less.

If your project can successfully complete Klaus Dormann's 6502 test suite, I would say that your project is a great success. Klaus' test suite has certainly proven itself as extremely useful to me as I've developed my cores.

Good luck with your project. I too will be very interested in following your progress on this forum.

_________________
Michael A.


 Post subject: Re: Pipelined 6502
PostPosted: Sat Oct 08, 2016 6:54 pm 

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
Here are some more simple examples:
Code:
        ; Updating a branch destination with a 1-byte vector
        lda branch_table,x
        sta branch+1
branch: bne *

Code:
      ; Add .X to .A without taking additional memory
      stx addx + 1
addx: adc #0

Code:
      ; If .A contains $4c, it JMPs past the following code
      ; If it contains $2c, it's a BIT abs, which will continue along
      sta skip
skip: jmp end
      ...code...
end:


I've also overwritten instructions to toggle them between DEX and INX based on what direction I need to copy memory for overlapping regions. Some of these modify code further away that wouldn't necessarily reside in the i-cache, but still might if it was recently executed and there isn't much i-cache pressure due to looping.
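
A minimal sketch of that kind of patch (the labels and the direction convention here are invented for illustration): DEX assembles to $CA and INX to $E8, so one opcode byte inside the copy loop gets overwritten before the loop runs.
Code:
      ; pick the copy direction by overwriting a single opcode byte
      ; (assume carry clear = copy forward, carry set = copy backward)
      lda #$E8          ; $E8 = INX, step forward
      bcc set_step
      lda #$CA          ; $CA = DEX, step backward
set_step:
      sta step          ; patch the opcode inside the loop below
copy_loop:
      lda source,x
      sta dest,x
step: inx               ; this single byte is INX or DEX at run time
      bne copy_loop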

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


 Post subject: Re: Pipelined 6502
PostPosted: Sat Oct 08, 2016 9:17 pm 

Joined: Fri Oct 07, 2016 9:44 pm
Posts: 31
MichaelM wrote:
I am sure that you've been able to determine what instructions are available on the 6502, 65C02, and W65C02S processors. I've constructed an instruction table that illustrates the growth of the instruction sets. You can find a copy of that table on GitHub here. Instructions in black are 6502 instructions, those in red are 65C02 instructions, and those in blue are W65C02S instructions. If the WAI and STP instructions are removed, then the remaining instructions are the instruction set of the Rockwell R65C02 processors. For your project, I would recommend focusing your efforts on the 6502-only instructions listed in that table. Furthermore, I would recommend leaving the 6502 instructions supporting indirection out of your project: the instructions in column 1 and column C. The loss of full compatibility is not that critical, although many 6502/65C02 programs do make extensive use of indirection.


Thank you very much for your reply.
- The table was really helpful. I used the "MCS6500 Family Programming Manual" to choose my instructions, and I think it is closer to the original 6502 ISA than the others. Note that I have already implemented the indirect addressing modes and they are working perfectly.

MichaelM wrote:
Finally, in working on my processor cores to reduce the number of dead memory cycles, I came to the conclusion that the depth of the pipeline that can be effectively implemented without too much complexity is a function of the number of addressable registers. In the case of the 6502/65C02, the majority of the instructions target a single register: the accumulator. Unless your plan is to produce a machine like current x86 machines which are supported with a complex instruction scheduler and large (and hidden) register array (used with register renaming), I would recommend reducing the depth of your pipeline to 3 or less.

If your project can successfully complete Klaus Dormann's 6502 test suite, I would say that your project is a great success. Klaus' test suite has certainly proven itself as extremely useful to me as I've developed my cores.

Good luck with your project. I too will be very interested in following your progress on this forum.


- Well, I could combine Effective Address and Read Data into a single stage, but the problem is the critical path: imagine the instruction uses Indexed Indirect addressing; the result would be a disaster! So I think this is the best way to pipeline the processor. Please feel free to share your opinion on this idea.
- Thank you very much for sharing the test suite. I'll do my best to pass it, but is it appropriate for pipelined cores? Does it exercise all kinds of hazards?

Thanks again for your helpful reply.

White Flame wrote:
Here are some more simple examples:
Code:
        ; Updating a branch destination with a 1-byte vector
        lda branch_table,x
        sta branch+1
branch: bne *

Code:
      ; Add .X to .A without taking additional memory
      stx addx + 1
addx: adc #0

Code:
      ; If .A contains $4c, it JMPs past the following code
      ; If it contains $2c, it's a BIT abs, which will continue along
      sta skip
skip: jmp end
      ...code...
end:


I've also overwritten instructions to toggle them between DEX and INX based on what direction I need to copy memory for overlapping regions. Some of these modify code further away that wouldn't necessarily reside in the i-cache, but still might if it was recently executed and there isn't much i-cache pressure due to looping.


Thanks for your reply.
Actually, I think there are three ways to handle this kind of problem:
1. Use a special instruction to tell the processor "DO NOT CACHE THE FOLLOWING INSTRUCTIONS". I mean there would be a region in the code that sets a special flag telling the processor to bypass the cache units.
Code:
BYI ;Bypass I-Cache
    ;Instructions
BYI ;Use I-Cache again

2. We can add a special instruction to flush the I-Cache/D-Cache completely.
3. We can add a special instruction to flush a specific address from the I-Cache/D-Cache.

Any other ideas?

Thanks a lot.

P.S.: As I said before, this is a B.S. project, so I should add these features later.


Last edited by manili on Mon Oct 10, 2016 7:54 am, edited 1 time in total.

 Post subject: Re: Pipelined 6502
PostPosted: Sat Oct 08, 2016 11:29 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
manili wrote:
I mean there would be a region in the code that sets a special flag telling the processor to bypass the cache units. [...]

Any other ideas?

I'm not sure what you mean by "a region in the code that sets a special flag telling the processor to bypass the cache units." Do you mean there will be a portion of the 64K address space which is not cached? That's actually crucial, as I'll explain.

Remember the 65xx family has no IN or OUT instructions. Instead, it's normal for a 65xx computer to reserve a portion of its address space for memory-mapped I/O. This portion of the address space must not be cached, ever. Otherwise the intended communication with the outside world won't occur! :lol: If someone already mentioned this I apologize. But it's imperative that your plan allows a way to prevent caching of the portion of the address space which is used for memory-mapped I/O.
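
To make that concrete, here is the sort of polling loop that breaks if the I/O page is cached (the addresses and the status-bit layout below are invented, purely for illustration):
Code:
IO_STAT:    equ $D011      ; memory-mapped device status register (made-up address)
IO_DATA:    equ $D010      ; memory-mapped device data register   (made-up address)

wait:   lda IO_STAT        ; must be a real bus read every time -- a cached copy
        and #%00000001     ; would never show the "ready" bit changing
        beq wait
        lda #'A'
        sta IO_DATA        ; must reach the device itself, not just a D-cache line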

Fascinating project -- I will follow with interest! :) Best of luck,

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


 Post subject: Re: Pipelined 6502
PostPosted: Sun Oct 09, 2016 2:12 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
BigEd wrote:
(Garth, I think you may have missed that the idea is to run with caches - so the CPU and caches run at some high speed, which is decoupled from the speed of the external memory system, which in turn only sees cache line loads and stores.)

Yes, I did miss the part about the caches. I don't deal with caches, but it seems like the small memory space of the '02 can fit entirely in the cache, meaning it's not really a cache but instead just the memory being brought onboard, onto the same chip; and that seems to be the case with the '02 licensee that WDC says is running it at over 200MHz. (I don't know anything further about it.) Is there any point in doing a single cache? For keeping the two caches (D and I) and addressing the other issues above, would a dual-port RAM work well?

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


 Post subject: Re: Pipelined 6502
PostPosted: Sun Oct 09, 2016 3:30 am 

Joined: Fri Oct 07, 2016 9:44 pm
Posts: 31
Dr Jefyll wrote:
manili wrote:
I mean there would be a region in the code that sets a special flag telling the processor to bypass the cache units. [...]

Any other ideas?

I'm not sure what you mean by "a region in the code that sets a special flag telling the processor to bypass the cache units." Do you mean there will be a portion of the 64K address space which is not cached? That's actually crucial, as I'll explain.

Remember the 65xx family has no IN or OUT instructions. Instead, it's normal for a 65xx computer to reserve a portion of its address space for memory-mapped I/O. This portion of the address space must not be cached, ever. Otherwise the intended communication with the outside world won't occur! :lol: If someone already mentioned this I apologize. But it's imperative that your plan allows a way to prevent caching of the portion of the address space which is used for memory-mapped I/O.

Fascinating project -- I will follow with interest! :) Best of luck,

-- Jeff


Thanks, Jeff.
Actually, my processor already bypasses the caches for the memory-mapped I/O area. What I was talking about was a way to bypass the caches for the RAM area of the address space, so people can use "BYI" (Bypass I-Cache) or "BYD" (Bypass D-Cache) blocks inside their code to tell the processor to start/stop bypassing the cache.
Code:
;Normal flow of the program
BYI
;From here on, any code you write will not be cached in the I-Cache
;Some code
BYI
;End of cache bypassing; the program now continues normally again.
;Please beware of faulty branches (e.g. from inside a BYI block to outside of it).


Thanks a lot for your note Jeff.


Last edited by manili on Sun Oct 09, 2016 11:31 am, edited 2 times in total.

 Post subject: Re: Pipelined 6502
PostPosted: Sun Oct 09, 2016 3:47 am 

Joined: Fri Oct 07, 2016 9:44 pm
Posts: 31
GARTHWILSON wrote:
BigEd wrote:
(Garth, I think you may have missed that the idea is to run with caches - so the CPU and caches run at some high speed, which is decoupled from the speed of the external memory system, which in turn only sees cache line loads and stores.)

Yes, I did miss the part about the caches. I don't deal with caches, but it seems like the small memory space of the '02 can fit entirely in the cache, meaning it's not really a cache but instead just the memory being brought onboard, onto the same chip; and that seems to be the case with the '02 licensee that WDC says is running it at over 200MHz. (I don't know anything further about it.) Is there any point in doing a single cache? For keeping the two caches (D and I) and addressing the other issues above, would a dual-port RAM work well?

Garth thanks for your reply.
Well, there are some reasons to use caches:
1. First things first: this is a B.S. project and I should have something to present :mrgreen: :mrgreen:!
2. Each cycle, the pipeline requests a different number of opcodes/operands, depending on how many opcodes/operands were consumed during the previous decode. Normal memories cannot handle this kind of request.
3. Currently my D-Cache has 4 different groups of ports (Effective Address, Read Data, Write Data, MMU), so I don't think doing the same with the main memory is possible.
4. As I said before, my project is WISHBONE compatible. WISHBONE needs at least 3 clock cycles to read/write data, so it's not a good idea to use the memory directly.
5. I'm currently using one MMU, so we can extend it to support memory protection and virtual addressing, and it will be possible to support a much bigger main memory (I know that each process is still limited to 64KB).

Thanks a lot.


Last edited by manili on Mon Oct 10, 2016 7:55 am, edited 2 times in total.

 Post subject: Re: Pipelined 6502
PostPosted: Sun Oct 09, 2016 6:15 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
manili wrote:
Any other ideas?
P.S.: As I said before, this is a B.S. project, so I should add these features later.

Very good idea - make sure you do the right level of write-up and note some pending things in Further Work. It will show that you know that there is more to do - and there always is more to do!

As for other ideas:
- you can control the caches using an I/O interface instead of extra instructions
- the memory controller can snoop the caches, spotting that the I-cache is reading data which is currently sitting in the D-cache, or that the D-cache is writing data which is currently sitting in the I-cache. If you have write-through caches I think this problem is simpler.

Garth, yes, there are still advantages to caches
- possibly the RAM space is more than 64k
- possibly the resources available for cache are less than 64k
- the caches can provide more than one byte per cycle of bandwidth (and there are several ways to use that)
- the caches can efficiently interface to SDRAM by using burst mode accesses
- the caches can insulate against a RAM system which serves other uses, such as DMA or video, or which needs time out to refresh.

Edit: oops, I see I've skipped a reply already made!


 Post subject: Re: Pipelined 6502
PostPosted: Sun Oct 09, 2016 2:13 pm 

Joined: Fri Oct 07, 2016 9:44 pm
Posts: 31
Is it possible/reasonable to ask users/assemblers to replace RTI with PLP + RTS? I think it's impossible for the pipeline to handle RTI without unreasonable overhead.


 Post subject: Re: Pipelined 6502
PostPosted: Sun Oct 09, 2016 2:17 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Ouch! In very general terms, for your purposes, making a not-quite 6502 is surely OK, provided you note the differences and the reasons for the differences.

In this specific case, it's even a little worse than you think, because RTS and RTI have slightly different approaches to the contents of the stack. RTI pulls the actual return address, but RTS pulls a value which needs to be incremented before use as the PC.
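
A small illustration of that difference, using the familiar trick of jumping via the stack (the hi/lo-byte operator syntax varies by assembler):
Code:
        ; jump to Target via RTS: RTS adds 1, so push Target-1, high byte first
        lda #>(Target-1)
        pha
        lda #<(Target-1)
        pha
        rts

        ; jump to Target via RTI: the pulled address is used as-is,
        ; and a status byte must be on top of the stack
        lda #>Target
        pha
        lda #<Target
        pha
        php
        rti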

You could get away with an RTI which takes a bit longer to sort itself out, I would think, because it's never in a loop and you're expecting to clock pretty fast. Perhaps your fetch and decode can make RTI look a bit like PLP and RTS? You still need to handle the PC discrepancy though.

Edit: although now that I think of it, I think one of Michael's 6502-like cores does handle RTI and RTS the same. It's worth noting that some existing code will certainly need adjusting if you do this: both RTI and RTS are sometimes used to perform a jump.


 Post subject: Re: Pipelined 6502
PostPosted: Sun Oct 09, 2016 2:22 pm 

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
manili:
manili wrote:
- Well, I could combine Effective Address and Read Data into a single stage, but the problem is the critical path: imagine the instruction uses Indexed Indirect addressing; the result would be a disaster! So I think this is the best way to pipeline the processor. Please feel free to share your opinion on this idea.
I think that perhaps you and I are not on the same page with respect to the vocabulary. What is your definition of the "effective address"? I think of the "effective address" as the final address for the operand or next instruction. In other words it is the address after the indirect pointer has been read from memory and after the index register has been added to it. The major problem posed by the post-indexed (by Y) indirect addressing mode is the number of memory accesses that are needed to calculate the "effective address".
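
For example, a single post-indexed indirect load has to make several dependent memory accesses before the data itself can be read (ZpPtr here is just a hypothetical zero page pointer):
Code:
        lda (ZpPtr),y   ; 1: fetch the opcode
                        ; 2: fetch the zero page operand (the address of ZpPtr)
                        ; 3: read the pointer lo byte from ZpPtr
                        ; 4: read the pointer hi byte from ZpPtr+1, add Y to the lo byte
                        ; 5: one more access if lo byte + Y crosses a page boundary
                        ; 6: finally read the data byte at pointer + Y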

Given the complexity of the 6502 addressing modes, I might have broken the pipeline into three stages: (1) fetch/decode; (2) operand fetch/data read; and (3) execute/write-back. For simple instructions, each pipeline stage would be designed to operate using one cycle. For the indirect addressing modes, the operand fetch/data read pipeline stage would stall the pipeline and use as many memory access cycles as required to fetch the required operand. In most cases, given that your prefetch unit/stage has already fetched any direct operands along with the opcode from cache, the operand fetch/data read stage does not need to perform any additional memory cycles to formulate the "effective address" of the operand that will be read.

If you can already handle the pre-indexed (by X) indirect addressing mode, or any of the indexed zero page or indexed absolute addressing modes, I don't see how the "critical path" is so affected by the post-indexed indirect addressing mode. Can you describe the "critical path" as you see it in your implementation?
manili wrote:
- Thank you very much for sharing the test suite. I'll do my best to pass it, but is it appropriate for pipelined cores? Does it exercise all kinds of hazards?
Whether a processor is pipelined or not should have no effect on the execution of valid programs. I think that you'll find that Klaus has used a number of standard 6502 programming techniques that will fully exercise your pipeline. If by pipeline hazards you are referring to data hazards such as the RAW, WAR, and WAW data hazards as defined in Hennessy and Patterson, then I am pretty sure that you will find plenty of those within Klaus' test suite. I also think that his test suite will provide sufficient structural hazards for the memory interface that you described above.
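
For instance, even a short 6502 sequence contains the kinds of dependencies a pipeline has to respect (a sketch, not taken from Klaus' suite):
Code:
        adc #1          ; produces a new A and new flags
        sta Result      ; RAW: needs the A computed by the previous instruction
        lda Result      ; RAW through memory: must see the byte just stored
        sta Result      ; WAR/WAW on the same location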

A final comment. To avoid many of the problems associated with self-modifying code, I would use a combined I/D cache with a write-through policy, and a write buffer with an address bypass. This approach should avoid many of the problems associated with the modified Harvard architecture that a dedicated I-only cache provides. It should also reduce the need for cache snooping or similar techniques that may be necessary with separate I/D caches in order to support common 6502/65C02 programming techniques that rely on self-modifying code.

Edit:

I am kind of partial to the 6502/65C02, and I've spent quite a bit of my free time in the past couple of years working on and off on an enhancement to the 6502/65C02. I've resisted attempting to pipeline my 6502/65C02 cores to a greater degree than they already are because of some of the problems that have been discussed above.

I would suggest that the Intel i8080 would be a much better target for pipelining. It has far fewer addressing modes (absolute, PC-relative, register, register indirect, and implicit), includes register-accumulator operations, and does not support indirection through memory like the 6502/65C02. It would be less challenging to pipeline the 8080, but the learning experience would still be substantial. Since both are accumulator-based architectures, converting your 6502 core into an 8080 core should be fairly straightforward. (Please consider this recommendation carefully before jumping into another briar patch.)

I recall a project in one of my undergraduate classes (a survey of microprocessors) to construct a processor card that could be single stepped. Many of my classmates selected 6800/6502 processors for the project while the rest (me included) selected the 8080A. All of those using the 6800/6502 processors failed to meet the objective of the assignment because of the dynamic nature of the internal registers of those processors. They needed to implement external circuitry and a subroutine to emulate the desired single-step behavior, which those using the 8080A could implement simply by stopping the clock. The moral of the story is to select the approach that matches the objective of the project with the least amount of effort.

_________________
Michael A.


 Post subject: Re: Pipelined 6502
PostPosted: Sun Oct 09, 2016 6:42 pm 

Joined: Tue Nov 10, 2015 5:46 am
Posts: 230
Location: Kent, UK
Hi manili,

Welcome. Your project sounds like a lot of fun.

As this is a project for your B.S., I would think it's fine to not be 100% compatible with the 6502 programming model, especially with regard to self-modifying code, but also w.r.t. RTI vs. PLP+RTS. The important thing is to document these in your project report, explain why you implemented things the way you did, and make it clear that you understand how your CPU's programming model differs from the reference. You don't want your professor/course leader to think you've missed important architectural concepts... rather, you want to convey that you fully understand the consequences of your choices and can defend them.

If the goal of the project is to be 100% compatible with the 6502 programmer's model, then you have some work to do :wink:

As others have pointed out, self-modifying code is a well-established programming pattern for the 6502. It can affect both opcode and operand bytes, and so if you do choose to recognize this in, say, your Execute stage, then the action should be to flush the pipeline. Dropping the split-cache modified Harvard architecture for a single-cache Von Neumann architecture would simplify things w.r.t. writing bytes to memory and then jumping to them (as might happen when loading a program from external storage).
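
For example, the simplest loader pattern already defeats a strict split I/D cache unless something makes the freshly written bytes visible to the instruction side (a sketch; the labels and the destination address are arbitrary):
Code:
        ; copy a routine into RAM, then execute it
        ldx #0
copy:   lda loaded_code,x   ; source bytes, e.g. just read from storage
        sta $0300,x         ; these stores go through the D-cache...
        inx
        cpx #routine_len
        bne copy
        jmp $0300           ; ...but this fetch goes through the I-cache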

If you ignore self-modifying code and implement a combined cache then you should be in pretty good shape for a very usable subset of programs. If the cache successfully decouples the execution pipeline speed from the external memory speed, thus reducing memory pressure and providing more external cycles for I/O, then it has served its purpose. If the pipeline runs at a higher frequency than external memory, and it's not so long as to heavily penalize mispredicted branches (which need to flush the pipeline), then it too has served its purpose.

As others have mentioned, you can likely fit the full 64K RAM inside an FPGA, with a small window for, say, I/O and flash. Given that, in order to demonstrate a more functional cache it might be worthwhile supporting megabytes of external memory, configured via I/O-mapped control registers. That way your cache is actually operating as a cache rather than as a memory. Besides MMU configuration, this would also require cache manipulation operations (invalidate, writeback+invalidate) - again, possibly implemented via I/O registers.
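
Something as simple as this would do, with the control register address and bit assignments being whatever you choose (the ones below are invented):
Code:
CACHE_CTRL: equ $DF80       ; hypothetical I/O-mapped cache control register

        lda #%00000001      ; bit 0: invalidate the I-cache (assumed layout)
        sta CACHE_CTRL
        lda #%00000110      ; bits 1-2: write back, then invalidate the D-cache
        sta CACHE_CTRL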

I'll close by saying that if you're 70-80% implemented and already into verification then you may not want to make large architectural changes. In that case I would reiterate taking time to write a good project report. If your Professor is anything like mine then he won't load the design into an FPGA and play with it, and he won't even look at the RTL - nobody has time for that when grading a whole class worth of projects. Rather, he'll want to read a well structured, well written report that demonstrates that you had goals, understood them, implemented something, learned new things along the way, understand what you built and how it differed (if at all) from the original goal, recognize where (if at all) you went wrong, know how you would do things better next time and have ideas for where things can go next. It's not even important that your project work at all... as long as you demonstrate clear thinking and an understanding of the engineering principles.

Good luck!


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 4:58 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
manili wrote:
Garth thanks for your replay.
Well, there are some reasons to use caches:
1. First things first: this is a B.S. project and I should have something to present :mrgreen: :mrgreen:!

Quite valid.

Allow me to play "devil's advocate" though, so your answers will educate those of us who don't know caches.

Quote:
2. Each cycle, the pipeline requests a different number of opcodes/operands, depending on how many opcodes/operands were consumed during the previous decode. Normal memories cannot handle this kind of request.

Can it do more than one byte per memory cycle (or edge, as mentioned later) per memory port? It seems like with multi-port memory and the MMU (to translate virtual to physical addresses) onboard, a cache would provide no benefit. Or is it that the MMU is not fast enough?

Quote:
3. Currently my D-Cache has 4 different groups of ports (Effective Address, Read Data, Write Data, MMU), so I don't think doing the same with the main memory is possible.
4. As I said before, my project is WISHBONE compatible. WISHBONE needs at least 3 clock cycles to read/write data, so it's not a good idea to use the memory directly.

Samuel Falvo, forum name kc5tja, who does not check in very often, is a professional programmer. He is a strong advocate of Wishbone. But is there any reason to put a small memory out on Wishbone instead of onboard? After all, we're not talking about more than a couple of megabytes, are we?

Quote:
5. I'm currently using one MMU, so we can extend it to support memory protection and virtual addressing, and it will be possible to support a much bigger main memory (I know that each process is still limited to 64KB).

The 6502 was originally designed for embedded control, not desktop computers, and even today mostly goes into embedded-control applications, many of them being realtime, meaning that a program must have immediate, instant access to I/O. Make sure your memory protection does not prohibit this. Going through OS calls is absolutely not acceptable for this kind of application.

Quote:
Is it possible/reasonable to ask users/assemblers to replace RTI with PLP + RTS? I think it's impossible for the pipeline to handle RTI without unreasonable overhead.

In addition to what Ed said, note that if there are multiple interrupts pending, doing PLP before RTS (or even RTI) will allow re-starting the ISR, possibly multiple times without unwinding its way out, using more stack space. Granted, going more than a few levels deep means there's simply not enough processing speed to handle the intended load.

BigEd wrote:
Garth, yes, there are still advantages to caches
- possibly the RAM space is more than 64k

That's partly what the '816 is for, and it also has the VDA and VPA outputs for cache use.

Quote:
- the caches can efficiently interface to SDRAM by using burst mode accesses
- the caches can insulate against a RAM system which serves other uses, such as DMA or video, or which needs time out to refresh.

How does that differ from dual-port RAM?

BTW, "replay" means to "play again." The word you want is "reply," with no "a," meaning "to answer."

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


 Post subject: Re: Pipelined 6502
PostPosted: Mon Oct 10, 2016 5:34 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
(I feel I must respond to a couple of your points, even though your comments are addressed to manili)

(Hmm, an affordable FPGA has only 64k RAM on board - are you thinking of a different meaning of on board here Garth? The essential thing about memory hierarchies, where you have one or more layers of cache close to the CPU, and RAM somewhat further away, is that fast affordable memories are small, large affordable memories are slow. If money is no object - rarely true in engineering - you can just put a huge fast multiport memory next to the CPU, but normally you can't.)

(As for the idea of fast predictable response for embedded computing, yes and no. A CPU running at 100MHz or 200MHz may well have the freedom to take a few more cycles, or a variable number of cycles, to respond to interrupts, and still be a great improvement on a 1MHz or 14MHz CPU which behaves exactly like a 6502. Or it may not - it depends on what the requirements are for any specific use case. A very fast and very cycle-efficient and entirely deterministic CPU is a difficult target to hit - again, there are engineering tradeoffs. What made sense at 1MHz in the early 70s may need adjusting forty years later.)

