6502.org Forum

All times are UTC




[ 14 posts ]
PostPosted: Thu Aug 20, 2015 5:29 pm 

Joined: Fri Jul 10, 2015 9:53 pm
Posts: 8
Disclaimer: If this post is in the wrong category, please move it! I'm new. idk wat im doin ¯\_(ツ)_/¯

Hello! I'm Ben, a 16-year-old student interested in a career in computer engineering. I've recently started a project designing and simulating a processor, and I wanted to make sure I had a strong foundation before moving much further. I'm designing a pipelined CPU that is heavily inspired by the 6502. It differs from the 6502 in several ways: it has fewer addressing modes, each available only for certain instructions; instructions are fixed-length (16 bits); every instruction takes one cycle; it lacks index registers; it has a smaller address space; it has branch predication (think ARM conditional execution); and it adds a few instructions.

I think the instruction set that I've put together is quite reasonable. Each instruction is broken up into three fields: a conditional execution field (3 bits), an opcode (5 bits), and an address/immediate (8 bits). Since the implied addressing mode takes no arguments, the lower byte (where the address or immediate would usually be) can be used as an 8-bit opcode. I find programming it very easy (although I'm biased, having designed the thing :P), and it has a good variety of opcodes. Shifting multiplication is only 4 lines! If you think this project is the worst thing you've ever seen, please tell me! I love to learn, and there's no point in holding back informative, educational criticism.
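A minimal sketch of how an assembler or simulator might pack and unpack the 16-bit word. The post gives the field widths but not their order, so this assumes the condition field sits in bits 15-13, the opcode in bits 12-8 and the address/immediate in bits 7-0; `encode` and `decode` are hypothetical helper names.

```python
# Assumed layout (not stated in the post):
#   bits 15-13: condition, bits 12-8: opcode, bits 7-0: address/immediate
def encode(cond: int, opcode: int, operand: int) -> int:
    """Pack the three fields into one 16-bit instruction word."""
    if not (0 <= cond < 8 and 0 <= opcode < 32 and 0 <= operand < 256):
        raise ValueError("field out of range")
    return (cond << 13) | (opcode << 8) | operand


def decode(word: int) -> tuple[int, int, int]:
    """Split a 16-bit instruction word back into (cond, opcode, operand)."""
    return (word >> 13) & 0b111, (word >> 8) & 0b11111, word & 0xFF


# Unconditional (cond 000) "lda #$2A", using the immediate-mode lda opcode 01000.
word = encode(0b000, 0b01000, 0x2A)
print(f"{word:016b}")  # 0000100000101010
```

Keeping the condition in the top bits means a decoder can route those three bits straight to the predicate logic without touching the rest of the word.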

Questions for the community:
Have I cut too much? Am I going to see a huge loss in performance on general programs due to the lack of some opcodes and addressing modes? Is there any essential functionality (opcodes, addressing modes, etc.) that I should add? Are there things that the 6502 lacked, or things that should've been done differently? For example, in my processor CMP updates the overflow flag, unlike on the 6502.
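To illustrate what "CMP updates the overflow flag" could mean, here is a sketch of an 8-bit compare computed as A - M with the result discarded. N, Z and C follow the 6502 convention (C set means no borrow, i.e. A >= M unsigned); the V rule is the standard signed-overflow test for subtraction and is an assumption about how the proposed CMP would set it.

```python
def cmp_flags(a: int, m: int) -> dict[str, bool]:
    """Flags from an 8-bit compare: compute A - M and discard the result.

    N, Z, C as on the 6502 (C = 1 means no borrow). V is also set on
    signed overflow, as the post proposes; a real 6502 CMP leaves V alone.
    """
    r = (a - m) & 0xFF
    return {
        "N": bool(r & 0x80),
        "Z": r == 0,
        "C": a >= m,
        # Overflow: operands have different signs and the result's sign
        # differs from A's sign.
        "V": bool((a ^ m) & (a ^ r) & 0x80),
    }
```

With V available, a signed "A < M" after such a compare is in general N != V; for example, comparing 0x80 (-128) with 0x01 gives N clear and V set.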

The address space gives me 256 bytes of RAM and 512 bytes of PROM. I think that's adequate for me and the programs I'll be running, and if I really needed more space, I could employ bank switching. Is this silly? Should I get rid of the branch predication field, use a 6-bit opcode, and put the other two bits towards the address to give me 1024 bytes?
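A rough model of the bank-switching option, assuming a separate bank register (hypothetical; the post doesn't say how it would be set) whose value supplies the high address bits while the instruction's 8-bit field supplies the low ones. Four banks give the same 1024 bytes as two extra address bits would.

```python
class BankedRam:
    """Hypothetical bank-switched RAM: the 8-bit address field selects a byte
    within the 256-byte bank chosen by a separate bank register."""

    def __init__(self, banks: int = 4) -> None:
        self.banks = banks
        self.mem = bytearray(256 * banks)
        self.bank = 0  # assumed to be written by some store to an I/O location

    def select(self, bank: int) -> None:
        if not 0 <= bank < self.banks:
            raise ValueError("no such bank")
        self.bank = bank

    def _effective(self, addr: int) -> int:
        # Bank number in the high bits, 8-bit instruction field in the low bits.
        return (self.bank << 8) | (addr & 0xFF)

    def read(self, addr: int) -> int:
        return self.mem[self._effective(addr)]

    def write(self, addr: int, value: int) -> None:
        self.mem[self._effective(addr)] = value & 0xFF
```

The catch is visible in the model: the same 8-bit address names a different byte depending on which bank is selected, so every routine has to know (or save and restore) the current bank.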

I'm very excited about this project! I will be designing it in Logisim first, and if I'm feeling ambitious enough, I'd love to build it out of ICs.

These are the different values for the branch predication field. Every instruction in the ISA is conditional, allowing me to reduce the number of branches and therefore branch mispredicts in the pipeline.
Condition Codes
000 tru true
001 otr overflow true
010 ctr carry true
011 cfl carry false
100 ztr zero true
101 zfl zero false
110 ntr negative true
111 nfl negative false
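The condition table maps directly onto a small predicate function; in hardware it would amount to an 8-to-1 multiplexer over the flag bits, and an instruction whose condition fails would be squashed into a nop. A sketch, with the function name hypothetical:

```python
def condition_holds(cond: int, n: bool, v: bool, z: bool, c: bool) -> bool:
    """Return True if an instruction with this 3-bit condition should execute."""
    tests = {
        0b000: True,    # tru: always
        0b001: v,       # otr: overflow set
        0b010: c,       # ctr: carry set
        0b011: not c,   # cfl: carry clear
        0b100: z,       # ztr: zero set
        0b101: not z,   # zfl: zero clear
        0b110: n,       # ntr: negative set
        0b111: not n,   # nfl: negative clear
    }
    return tests[cond & 0b111]
```

Note that the table has no "overflow false" code, since 000 is spent on "always".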

The opcodes here are divided by their addressing mode. The majority of opcodes function identically to their 6502 counterparts. Did the 6502 do them in a way that could be improved upon? If you need me to elaborate on how I plan to implement any of them, please ask!
There's plenty of room in the implied addressing mode for instructions that don't take any arguments (or maybe 4 bit ones?).

IMPLIED
00000 nop no operation
00001 brk break
00010 wai wait for interrupt
00011 clv clear overflow
00100 sec set carry
00101 clc clear carry
00110 sei set interrupt
00111 cli clear interrupt
01000 lsl logical shift left
01001 lsr logical shift right
01010 rol rotate left
01011 ror rotate right
01100 pha push accumulator
01101 pla pull accumulator
01110 php push flags
01111 plp pull flags
10000 rti return from interrupt
10001 rts return from subroutine
10010 tas transfer accumulator to stack
10011 tsa transfer stack to accumulator

IMMEDIATE
00001 add add
00010 adc add with carry
00011 and bitwise AND
00100 eor bitwise exclusive OR
00101 ior bitwise inclusive OR
00110 cmp compare
00111 cpc compare with carry
01000 lda load accumulator

ABSOLUTE
01001 add add
01010 adc add with carry
01011 and bitwise AND
01100 eor bitwise exclusive OR
01101 ior bitwise inclusive OR
01110 cmp compare
01111 cpc compare with carry
10000 sub subtract
10001 sbc subtract with carry
10010 inc increment
10011 dec decrement
10100 lsl logical shift left
10101 lsr logical shift right
10110 rol rotate left
10111 ror rotate right
11000 jmp jump relative
11001 jmp jump indirect
11010 jsr jump subroutine
11011 xch exchange accumulator with memory address
11100 lda load accumulator
11101 lda load accumulator indirect
11110 sta store accumulator
11111 sta store accumulator indirect
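Because the opcode ranges don't overlap (00000 for implied, 00001-01000 immediate, 01001-11111 absolute/indirect), a disassembler can tell the addressing mode from the opcode alone. A sketch built from the tables above; it assumes the same bit layout as a simple cond/opcode/operand split (condition in bits 15-13), that implied instructions use opcode 00000 with the low byte selecting the operation, and a made-up `mnemonic.cond` syntax for predicated instructions.

```python
CONDS = ["tru", "otr", "ctr", "cfl", "ztr", "zfl", "ntr", "nfl"]
IMPLIED = ["nop", "brk", "wai", "clv", "sec", "clc", "sei", "cli",
           "lsl", "lsr", "rol", "ror", "pha", "pla", "php", "plp",
           "rti", "rts", "tas", "tsa"]
IMMEDIATE = {1: "add", 2: "adc", 3: "and", 4: "eor", 5: "ior",
             6: "cmp", 7: "cpc", 8: "lda"}
ABSOLUTE = {9: "add", 10: "adc", 11: "and", 12: "eor", 13: "ior", 14: "cmp",
            15: "cpc", 16: "sub", 17: "sbc", 18: "inc", 19: "dec", 20: "lsl",
            21: "lsr", 22: "rol", 23: "ror", 24: "jmp", 26: "jsr", 27: "xch",
            28: "lda", 30: "sta"}
INDIRECT = {25: "jmp", 29: "lda", 31: "sta"}


def disassemble(word: int) -> str:
    cond, op, operand = (word >> 13) & 7, (word >> 8) & 31, word & 0xFF
    suffix = "" if cond == 0 else "." + CONDS[cond]  # hypothetical syntax
    if op == 0:  # implied: the low byte selects the operation
        name = IMPLIED[operand] if operand < len(IMPLIED) else "???"
        return name + suffix
    if op in IMMEDIATE:
        return f"{IMMEDIATE[op]}{suffix} #${operand:02X}"
    if op in INDIRECT:
        return f"{INDIRECT[op]}{suffix} (${operand:02X})"
    # Opcode 24 is listed as "jump relative"; its operand is shown raw here.
    return f"{ABSOLUTE[op]}{suffix} ${operand:02X}"
```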

Thank you so much for reading this. Means a lot to me to have a community like this to share ideas with and learn from. Take care!
Ben


PostPosted: Thu Aug 20, 2015 5:46 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
First off, as you probably know, you've most likely got everything you need, in a strict sense, because even a one instruction CPU can be theoretically adequate. So the question is, how practical is your machine - practical to implement, practical to write for, to debug, to pipeline and to run at high clock speed.
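As an illustration of the one-instruction point (the example is not from the post), SUBLEQ is a well-known one-instruction computer: each instruction is three words a, b, c meaning "mem[b] -= mem[a]; if the result is <= 0, jump to c". The halt-on-negative-address convention below is an assumption, one of several in use.

```python
def run_subleq(mem: list[int], pc: int = 0, max_steps: int = 10_000) -> list[int]:
    """Run a SUBLEQ program in place; a jump to a negative address halts."""
    for _ in range(max_steps):
        if pc < 0:
            return mem
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
    raise RuntimeError("step limit reached")


# Y += X using a scratch cell Z (which starts at 0):
#   Z -= X; Y -= Z; Z -= Z (then halt)
X, Y, Z = 9, 10, 11
prog = [X, Z, 3,  Z, Y, 6,  Z, Z, -1,  7, 5, 0]
print(run_subleq(prog)[Y])  # 12
```

It works, but even addition takes three instructions and a scratch cell, which is exactly the practicality question Ed raises.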

I think this kind of thing is full of tradeoffs - from how wide your word is, whether instruction lengths are fixed, how to subdivide opcode space in a sensible way. So, you've made a number of tradeoffs, and for sure some will be advantageous and others the opposite. All instruction sets will have done the same, but of course with different choices.

I've a feeling predication is now felt to be a bad idea - but that might only be in the context of the very high performance expectations we now have. In other words, for a modest machine it might still be a good idea. But that's a very small address space you have there! You'll be able to write simple programs but even say a calculator might run into space trouble, and banking makes things harder to write and harder to debug. So, at a guess, you'd be better off with more address bits and dropping the predication.

I'd suggest you write some sample code - multiplication perhaps, or a simple parser. See what kinds of things are really inconvenient. I suspect the lack of a register to count with will stand out. Also, if you write a few string-handling routines you'll probably see the need for indexing. But these are guesses!

Cheers
Ed


PostPosted: Thu Aug 20, 2015 10:47 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Nice to hear about your project, Ben! :)

lookaside wrote:
Majority of opcodes function identically to their 6502 counterparts.
So arithmetic and logical operations rely on the accumulator for one source operand and memory for the other, is that right? You haven't explicitly listed the register set, but apparently you have A and a Flags register, and presumably also a PC ...

Predicated instructions are somewhat unconventional, which to some extent is reason enough to endorse them. The age-old practice of having every yes/no decision render its result by means of a two-way branch is so widespread as to be almost mind-numbing, IMO. Predication will help keep your thinking fresh -- and the fast multiply you mention is an example of what it can do for performance! (Still, if predication, or any other feature, becomes too much of a burden in terms of associated tradeoffs, don't be afraid to drop it from the design.)
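To make the predicated-multiply idea concrete, here is a Python model of an 8x8 shift-and-add multiply. Ben's four-line routine isn't shown in the thread, so this is an assumed reconstruction of the general technique, not his code: the multiplier's low bit is shifted into carry and the add is conditioned on carry instead of being skipped by a branch.

```python
def mul8(multiplicand: int, multiplier: int) -> int:
    """8x8 -> 16-bit shift-and-add multiply."""
    product = 0
    m = multiplicand & 0xFF
    q = multiplier & 0xFF
    for _ in range(8):
        carry = q & 1      # lsr: low bit of the multiplier goes to carry
        q >>= 1
        if carry:          # stands in for an add predicated on "ctr"
            product = (product + m) & 0xFFFF
        m <<= 1            # lsl: multiplicand doubles each step
    return product
```

On a predicated machine the `if carry:` costs nothing extra in the pipeline: the add is simply squashed when carry is clear, so there is no branch to mispredict.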

Projects like yours give rise to so many related discussions there aren't enough hours in the day to deal with them all. But allow me to refer you to some documents I think you'll find valuable. The first is a wonderfully complete and highly readable roadmap of the decisions you face as you flesh out your new architecture, Creating Embedded Microcontrollers by Ken Chapman of Xilinx. And here are a handful of pdf documents posted by Bruce Jacob of the University of Maryland. I've not been through them all but again the readability is high, despite the fact some of the topic material is non-trivial (eg: out-of-order execution).

I'll throw in two more ideas before I sign off. Firstly, you can shrink the size of your opcodes (and use the bits you save for other things) by defining one or more prefix instructions that cause the following opcode to have an alternative meaning. IOW have an alternative opcode map(s). But of course you'll want to try to keep frequently-used instructions in the main opcode map.
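A sketch of the prefix idea: a one-bit latch set by a prefix opcode makes the next opcode be looked up in an alternative map, then clears itself. The prefix value and both maps (including the `mul`/`div`/`swap` mnemonics) are hypothetical, chosen only to show the mechanism.

```python
# Hypothetical opcodes and maps, for illustration only.
PREFIX = 0x1F
MAIN_MAP = {0x01: "add", 0x02: "sub", 0x03: "lda"}
ALT_MAP = {0x01: "mul", 0x02: "div", 0x03: "swap"}


def decode_stream(opcodes: list[int]) -> list[str]:
    """Decode opcodes, letting a prefix select ALT_MAP for the next one only."""
    out: list[str] = []
    use_alt = False  # the latch set by the prefix
    for op in opcodes:
        if op == PREFIX and not use_alt:
            use_alt = True
            continue
        table = ALT_MAP if use_alt else MAIN_MAP
        out.append(table.get(op, "???"))
        use_alt = False
    return out
```

The cost is an extra fetch for every alternative-map instruction, which is why the frequently used ones belong in the main map.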

Secondly, CPUs with limited resources are better-than-average candidates for the use of self-modifying code. Widely considered heresy, this superstitiously reviled practice can drastically increase the power of a limited instruction set. So don't view it as a religious debate; what it is is simply another tradeoff for you to evaluate. In the simplest case you'll just modify operands, not opcodes, so the technique needn't involve immense complexity. The main downsides are loss of reentrancy and the need for careful documentation.
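A small simulation of operand-only self-modification, using a hypothetical two-byte mini-ISA (opcode byte, operand byte) in a shared 256-byte memory, not Ben's encoding. A copy loop with no index register patches the operand bytes of its own `lda` and `sta` with `inc`.

```python
LDA, STA, INC, DEC, BNZ, HALT = 1, 2, 3, 4, 5, 0  # hypothetical opcodes


def run(mem: bytearray, max_steps: int = 1000) -> None:
    pc, a, zero = 0, 0, False
    for _ in range(max_steps):
        op, x = mem[pc], mem[pc + 1]
        pc += 2
        if op == HALT:
            return
        if op == LDA:
            a = mem[x]
        elif op == STA:
            mem[x] = a
        elif op == INC:
            mem[x] = (mem[x] + 1) & 0xFF
        elif op == DEC:
            mem[x] = (mem[x] - 1) & 0xFF
            zero = mem[x] == 0
        elif op == BNZ:
            if not zero:
                pc = x
        else:
            raise ValueError(f"bad opcode {op}")
    raise RuntimeError("step limit reached")


mem = bytearray(256)
mem[0:14] = bytes([LDA, 0x80,   # 0: load from source (operand at address 1)
                   STA, 0x90,   # 2: store to dest (operand at address 3)
                   INC, 1,      # 4: patch the lda operand
                   INC, 3,      # 6: patch the sta operand
                   DEC, 0x7F,   # 8: count down
                   BNZ, 0,      # 10: loop while count != 0
                   HALT, 0])
mem[0x7F] = 3                   # byte count
mem[0x80:0x83] = b"abc"
run(mem)
print(bytes(mem[0x90:0x93]))    # b'abc'
```

Running it a second time would start copying from 0x83, because the patched operands persist; that is the reentrancy cost mentioned above.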

cheers, and keep us posted,
Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Thu Aug 20, 2015 11:29 pm 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
The other processor I have lots of experience with, having brought many products to market with it, is the PIC16 (not the 16F1's, which came along later with some enhancements, let alone the PIC18, which tried to add processing power by kludging add-ons onto a base that was only suitable for teensy jobs). When I got started with it in the mid-1990s, Microchip was bragging about its efficiency and performance; but see my post on how poorly it does compared to the 65c02.

That's not to say you shouldn't do your own simple processor design. I say go for it. I only offer the above for some perspective on what to expect from the PIC kind of approach. I myself would even like to emulate our 65Org32 with a microcontroller to experiment with it. I know the performance would be lousy doing it that way, but I could experiment to see how practical it is for various kinds of programming, and modify as I find appropriate. I'm really not very interested in merely simulating it (ie, software on a PC), since my greatest use involves realtime non-human I/O for taking data on the workbench, controlling processes on it, etc. It would connect to actual VIAs, UARTs, etc.

Copying a post of mine from six years ago:
Quote:
I just got a book offer in the mail for a book called "Microprocessor Design: A Practical Guide from Design Planning to Manufacturing" by Grant McFarland, published in 2006 by McGraw-Hill. 408 pages, ISBN: 0-07-145951-0. It's $80, with free shipping and handling, and a 30-day money-back guarantee. You can see a little about it at their website http://books.mcgraw-hill.com/getbook.ph ... &template= but I'll put a little more here that's not on the website. The paper that came says on the front:

Master the basics of microprocessor design the easy way with this hands-on step-by-step guide. Proven microprocessor design crash course keeps your career on the fast track. You get a wealth of tested techniques to help you:

  • Plan for processor design flow and calculate design time and product cost
  • Analyze trade-offs in choosing an instruction set
  • Understand the functional areas of a processor and their impact on performance
  • Construct logic equations required to simulate processor behavior
  • Convert logic design equations into a transistor implementation
  • Produce layout drawings required for fabrication
  • Manufacture integrated circuits
  • Choose the most cost-effective packaging
  • Test and de-bug processors before shipping to customers

The web page above gives the name of each chapter, but here are some more details: (I shortened some things to not have to type so much)

  1. The evolution of the microprocessor
    the transistor
    the IC
    the µP
    Moore's law

  2. computer components
    bus standards
    chipsets
    processor bus
    main memory
    video adapters (graphics cards)
    storage devices
    expansion cards
    peripheral bus
    motherboards
    BIOS
    memory hierarchy

  3. design planning
    processor roadmaps
    design types and design time
    product cost

  4. computer architecture
    instructions
    instruction encoding

  5. microarchitecture
    pipelining
    designing for performance
    measuring performance
    microarchitectural concepts
    life of an instruction

  6. logic design
    overview
    objectives
    intro to hardware description language
    logic minimization

  7. circuit design
    MOSFET behavior
    CMOS logic gates
    sequentials
    circuit checks

  8. layout
    creating layout
    layout density
    layout quality

  9. semiconductor manufacturing
    wafer fab
    layering
    photolithography
    etch
    example CMOS process flow

  10. µP packaging
    package hierarchy
    package design choices
    example assembly flow

  11. silicon debug and test
    design-for-test circuits
    post-silicon validation
    silicon debug
    silicon test

Hopefully it will inspire someone. Doing it in programmable logic would eliminate steps 7 through 10.

I like what Jeff is saying about self-modifying code. Additionally, it's easier to do that if entire memory units (whether bytes, words, etc.) are modified without having instruction bits being mixed in the same memory unit.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Fri Aug 21, 2015 5:20 am 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
I like your enthusiasm, Ben. One thing that should never be overlooked in a general-purpose design is the possibility of running out of address space for unforeseen future circumstances ... it has bitten many beautifully elegant designs throughout the history of electronic computing, shortening their useful lifespans and cluttering their assembly language programs with awkward stop-gap measures before they were finally relegated to history [*cough* pdp-11 *cough*]. If your design is not intended to be general-purpose, then this warning can be safely ignored.

Happy designing!

Mike B.


PostPosted: Fri Aug 21, 2015 7:15 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
I should add in my encouragement, having neglected to do that in my first response!

I read about efforts by hobbyists to build computers in the 1960s - they had no choice but to make their own CPUs. It was said that those who chose smaller instruction sets were more likely to get as far as a working computer, so bear that in mind. Jeff is quite right about self-modifying code - before index registers became regarded as essential, modifying some code to add an offset to an absolute address was normal.

In industry, we normally pipeline for performance reasons, the same reason that we add more complex instructions and more registers. But in hobby land, we might well be pipelining in order to learn how to do pipelining, and that's a perfectly good reason.

I suspect you've decided on a fixed-length and fixed-cycle-count instructions to keep your control logic simple. That's a good idea.

A small and regular instruction set will be easier to implement, easier to write test cases for, easier to write an assembler and an emulator (and a disassembler, most likely.) That's a very good motivation - because a finished project is much more satisfactory and you learn more from it. Worry about performance later. Likewise, worry about adding complexity later. There are no decisions you can't revisit after your first design is implemented!

Best of luck
Ed


PostPosted: Fri Aug 21, 2015 11:05 am 

Joined: Mon Aug 05, 2013 10:43 pm
Posts: 258
Location: Southampton, UK
Can't really add anything useful, since these other guys are far more knowledgeable. But this sounds like an amazing project, and I wish you the very best of luck.

One question. How are you going to implement this? I'm assuming VHDL or Verilog in an FPGA, or are you going to use real hardware?

_________________
8 bit fun and games: https://www.aslak.net/


PostPosted: Fri Aug 21, 2015 11:11 am 

Joined: Tue Sep 03, 2002 12:58 pm
Posts: 336
The two main things that will hurt are the memory size and the number of registers. Both are hard to change without making an entirely different processor. But if you're making it just for the fun of making it, neither really matters.

How do you distinguish the implied instructions from the rest? brk and add immediate have the same code. All you have to do is use 00000 to cover all implied instructions, and use the address/immediate field to distinguish them.

You say that you want one cycle per instruction, but there's a lot in the instruction set that will make achieving this very difficult: all those read-modify-write instructions and indirect addressing. If you want to keep that aim, I think you'd have to switch to a register-based design, with memory access restricted to simple loads and stores.

Making every instruction conditional is a good idea when your pipeline is simple (if you can afford the opcode space). It's a bad idea when you want to go super-scalar or out-of-order, but I don't expect you'll want to do either on this one.

Branch instructions hurt the pipeline. MIPS used delayed branches to get around this - a terrible idea from the perspective of making faster versions in the future (it basically exposes details of your pipeline to software, making it hard to change), but great for a one-off. Instead of stalling the pipeline when you see a branch, you allow the instructions that are already in progress to complete. With the MIPS pipeline, that meant one instruction after the branch was always executed.
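A toy trace of the delayed-branch idea with one delay slot: when a jump is taken, the instruction right after it still executes before control reaches the target. The instruction tuples are a made-up representation, and jumps placed inside a delay slot are not given any special treatment.

```python
from typing import Optional


def trace(program: list[tuple[str, int]], delay_slot: bool) -> list[int]:
    """Return the addresses executed. ("jmp", t) jumps to t, ("op", 0) is an
    ordinary instruction, ("halt", 0) stops."""
    executed: list[int] = []
    pc = 0
    pending: Optional[int] = None  # branch target waiting out the delay slot
    while True:
        kind, target = program[pc]
        executed.append(pc)
        if kind == "halt":
            return executed
        next_pc = pc + 1
        if pending is not None:    # this instruction was the delay slot
            next_pc, pending = pending, None
        if kind == "jmp":
            if delay_slot:
                pending = target
            else:
                next_pc = target
        pc = next_pc


prog = [("jmp", 3), ("op", 0), ("op", 0), ("halt", 0)]
print(trace(prog, delay_slot=False))  # [0, 3]
print(trace(prog, delay_slot=True))   # [0, 1, 3]
```

Address 1 executes only in the delayed version; a compiler or programmer must put something useful (or a nop) there, which is how the pipeline detail leaks into software.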


PostPosted: Fri Aug 21, 2015 12:05 pm 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
For comparison/inspiration, here's a CPU design I did a while ago: http://ladybug.xs4all.nl/arlet/fpga/x18/cpu.v

To keep things as simple as possible, there are only 7 different instruction formats (all 18 bits to match FPGA RAM). Most instructions are single cycle, 16 bit, 16 registers, internal stack (only 16 deep, but can be easily changed).

My project was originally inspired by the J1 Forth CPU here: http://www.excamera.com/sphinx/fpga-j1.html, but I decided that registers were better than a stack :)


PostPosted: Fri Aug 21, 2015 5:33 pm 

Joined: Fri Jul 10, 2015 9:53 pm
Posts: 8
Really hope formatting works out here. First time quoting :3
Dr Jefyll wrote:
lookaside wrote:
Majority of opcodes function identically to their 6502 counterparts.
So arithmetic and logical operations rely on the accumulator for one source operand and memory for the other, is that right? You haven't explicitly listed the register set, but apparently you have A and a Flags register, and presumably also a PC ...

Yes! Should have mentioned those. The registers are the Accumulator, Flags, and Program Counter. All logical/arithmetic operations are in the form A = A op ram[address] or A = A op immediate.

Dr Jefyll wrote:
Predicated instructions are somewhat unconventional, which to some extent is reason enough to endorse them. The age-old practice of having every yes/no decision render its result by means of a two-way branch is so widespread as to be almost mind-numbing, IMO. Predication will help keep your thinking fresh -- and the fast multiply you mention is an example of what it can do for performance! (Still, if predication, or any other feature, becomes too much of a burden in terms of associated tradeoffs, don't be afraid to drop it from the design.)

Projects like yours give rise to so many related discussions there aren't enough hours in the day to deal with them all. But allow me to refer you to some documents I think you'll find valuable. The first is a wonderfully complete and highly readable roadmap of the decisions you face as you flesh out your new architecture, Creating Embedded Microcontrollers by Ken Chapman of Xilinx. And here are a handful of pdf documents posted by Bruce Jacob of the University of Maryland. I've not been through them all but again the readability is high, despite the fact some of the topic material is non-trivial (eg: out-of-order execution).

Branch predication is something new and exciting that I haven't played with in any of the CPUs I've designed so far, so I thought it would be interesting to try out. From my experience writing assembly with it so far, code is shorter and the penalties from branch mispredicts are greatly reduced. Thank you for the links. I've actually been through all of Bruce Jacob's site and RISC-16 was the first CPU I ever implemented and wrote assembly for. From that first project stemmed a lot of my inspiration to adapt the processor and try new things. The thread you linked discussing improvements on the 6502 is exactly what I had in mind as a resource to look over.

Dr Jefyll wrote:
I'll throw in two more ideas before I sign off. Firstly, you can shrink the size of your opcodes (and use the bits you save for other things) by defining one or more prefix instructions that cause the following opcode to have an alternative meaning. IOW have an alternative opcode map(s). But of course you'll want to try to keep frequently-used instructions in the main opcode map.

That's an interesting solution to the small opcode problem. My instruction set currently has 3 formats that are all quite simple, one of which only requires the addition of a single multiplexer. Adding the feature you described would likely only involve the addition of another instruction format and a latch. I don't think I'm that desperate for more opcode space, but it's definitely something to try in a future processor!

Dr Jefyll wrote:
Secondly, CPUs with limited resources are better-than-average candidates for the use of self-modifying code. Widely considered heresy, this superstitiously reviled practice can drastically increase the power of a limited instruction set. So don't view it as a religious debate; what it is is simply another tradeoff for you to evaluate. In the simplest case you'll just modify operands, not opcodes, so the technique needn't involve immense complexity. The main downsides are loss of reentrancy and the need for careful documentation.

Thanks for bringing that up and getting me thinking about it.


Aslak3 wrote:
One question. How are you going to implement this? I'm assuming VHDL or Verilog in an FPGA, or are you going to use real hardware?

I'm going to design and simulate the processor in a program called Logisim, my favorite simulator out of the dozens I've tried. Although I'm familiar with them, I don't have any development experience with any of the three you mentioned. I am ambitious, though, and eager to learn, so if I can find some strong resources, I'd love to try them out.


John West wrote:
How do you distinguish the implied instructions from the rest? brk and add immediate have the same code. All you have to do is use 00000 to cover all implied instructions, and use the address/immediate field to distinguish them.

That's exactly what I do. Should've made that clearer.

John West wrote:
You say that you want one cycle per instruction, but there's a lot in the instruction set that will make achieving this very difficult: all those read-modify-write instructions and indirect addressing. If you want to keep that aim, I think you'd have to switch to a register-based design, with memory access restricted to simple loads and stores.

I lied :3. Load accumulator indirect does take two cycles :( All other complicated instructions I've been able to implement in a very clean way with careful construction and layout of the pipeline stages.

Arlet wrote:
For comparison/inspiration, here's a CPU design I did a while ago: http://ladybug.xs4all.nl/arlet/fpga/x18/cpu.v

To keep things as simple as possible, there are only 7 different instruction formats (all 18 bits to match FPGA RAM). Most instructions are single cycle, 16 bit, 16 registers, internal stack (only 16 deep, but can be easily changed).

My project was originally inspired by the J1 Forth CPU here: http://www.excamera.com/sphinx/fpga-j1.html, but I decided that registers were better than a stack :)

Thanks for the link. Having no experience in Verilog yet, it's wonderful to have a project as an example to follow and learn from.



Something I should have mentioned is that above all else, this project is for me to learn. This is the 9th (?) CPU I've put together and with every one I've made it a goal to try new things so I could become familiar with a broad range of components and tactics in processor design and how they influence performance. This is my first accumulator CPU, and I hoped to experiment with branch predication as well as branch prediction. Above all else I'm just trying to learn how everything in a CPU interacts and is connected together, so things like address size aren't as important to me. Having the most optimally performing CPU is secondary. Thanks everyone for all the advice, resources, and encouragement.


PostPosted: Fri Aug 21, 2015 7:51 pm 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
Another small point: index registers were a great invention back in the day, for simplifying string and array accesses without resorting to self-modifying code or excessive indirection. Having one index register is great, but having two is even better, as you may discover when coding some of the things that microprocessors are expected to do all the time, like copying, appending and comparing.

Here's an example, showing the superiority of two index registers in a common case:

http://anycpu.org/forum/viewtopic.php?f ... p=461#p461

Mike B.


PostPosted: Sat Aug 22, 2015 1:41 am 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8504
Location: Midwestern USA
barrym95838 wrote:
Here's an example, showing the superiority of two index registers in a common case:

http://anycpu.org/forum/viewtopic.php?f ... p=461#p461

Mike B.

Having a second index register was a significant selling point of the 6502 back in the day.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Sat Aug 22, 2015 7:51 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
On the subject of the flags register, you can do without it if you always test the content of the accumulator for N and Z. A carry bit may be useful, and if you don't have an interrupt mechanism, it doesn't even need an easy way to be saved. Even if you do, the interrupt routine can save and restore it if it needs to. Overflow is subtle and you might even be better off without it. This is of course favoring simplicity over convenience.
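A sketch of that flagless arrangement: N and Z are derived on demand from the 8-bit accumulator, and carry is the only stored status bit. The condition names borrow the 6502 branch suffixes (cs, cc, eq, ne, mi, pl) purely for readability.

```python
def cond_from_acc(cond: str, acc: int, carry: bool) -> bool:
    """Evaluate a condition with no flags register: N and Z come straight
    from the accumulator; only carry is a stored bit."""
    n = bool(acc & 0x80)
    z = (acc & 0xFF) == 0
    return {
        "always": True,
        "cs": carry, "cc": not carry,
        "eq": z, "ne": not z,
        "mi": n, "pl": not n,
    }[cond]
```

One consequence worth noting: a compare that leaves A untouched no longer affects N or Z, so comparisons would have to be done as subtractions that overwrite the accumulator.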


PostPosted: Sat Aug 22, 2015 12:18 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
On the subject of instruction formats and self-modifying code, I recommend the first part of this amusing talk by Guy Steele in which he deconstructs a one-card program he wrote 40 years earlier for the IBM 1130:
http://www.infoq.com/presentations/Thin ... rogramming
You might want to let the video play in one window while you browse the slide deck at
https://web.archive.org/web/20130925020 ... rallel.pdf




