 Post subject: 65Sim16
PostPosted: Wed Apr 10, 2013 4:57 am 
Joined: Sun Nov 08, 2009 1:56 am
Posts: 411
Location: Minnesota
Lately I've begun playing with a CPU simulator found here: http://www.cs.colby.edu/djskrien/CPUSim/

You can create microcode fragments that can be used to build higher-level instructions. There's a simple built-in assembler to let you test those instructions, plus a debugger that lets you see why your microcode is not behaving as you expect. The hardest thing for me to get used to is the simulator's insistence on numbering bits in a register left-to-right from zero to however many bits it has. I'm used to bit zero being on the right, not the left.
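
A small sketch of the index conversion that remark implies, assuming the simulator's leftmost bit (its bit 0) is the most significant bit. The function names are made up for illustration and are not part of CPUSim.

```python
def msb0_to_lsb0(index, width=16):
    """Convert a left-to-right bit index (0 = leftmost) to the usual
    right-to-left index (0 = least significant bit)."""
    if not 0 <= index < width:
        raise ValueError("bit index out of range")
    return width - 1 - index


def get_bit_msb0(value, index, width=16):
    """Return the bit that left-to-right numbering calls `index`."""
    return (value >> msb0_to_lsb0(index, width)) & 1
```

With 16-bit registers, the simulator's bit 0 is then the conventional bit 15 (the sign bit), and its bit 15 is the conventional bit 0.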

I've been toying with a model of a 16-bit address space CPU with all 16-bit registers. Sort of like the 65Org32, except shrunk down considerably in memory space.

Some observations so far:

- TSX, LDA absolute,X can get any value off the stack wherever it is in memory
- TSX, LDA (indirect,X) can get any value pointed to by a stack entry wherever it is in memory, i.e., (indirect,X) actually becomes useful
- once you have the microcode for (indirect,X) and (indirect),Y, it's not that hard to cobble together (indirect,X),Y, and with TSX, it's possible to use any stack entry as a pointer for (indirect),Y. This is like the 65816's (indirect,S),Y with an extra TSX but also re-use of existing microcode
- there's no need for zero page or zero page,X addressing, so I replaced those with (indirect) (i.e., non-indexed indirect) and (indirect,X),Y. Still only eight address modes
- I haven't implemented it yet, but I'm wondering if LEA (load effective program counter relative address in accumulator) might be more useful than the 65816's PER (push effective program counter relative address on stack). The other two 65816 instructions - PEA and PEI - seem at this point like shortcuts to avoid using the accumulator to load an immediate address and the contents of a memory address, respectively (although perhaps the fact that the pushes are always 16 bits and the registers are not always 16 bits might have something to do with those decisions - something that would not matter in the case of the 65Sim16)
- I haven't figured out a way to do BCD arithmetic. I'm not certain it's possible using this simulator, but I'm not certain it's impossible, either
- there doesn't seem to be a way to get the simulator to simulate external interrupts, but I think I see a way to implement BRK, at least
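
The point about building (indirect,X),Y out of the two existing indirect modes can be sketched roughly as below. This is only a model under assumptions: memory is a Python list of 16-bit words, and the function names are invented here, not taken from the simulator's microcode.

```python
MASK = 0xFFFF  # every register and memory cell is 16 bits wide


def ea_indirect_x(mem, operand, x):
    """(indirect,X): the pointer stored at operand + X is the address."""
    return mem[(operand + x) & MASK]


def ea_indirect_y(mem, operand, y):
    """(indirect),Y: the pointer stored at operand, plus Y, is the address."""
    return (mem[operand & MASK] + y) & MASK


def ea_indirect_x_y(mem, operand, x, y):
    """(indirect,X),Y: re-use the (indirect),Y step on the pre-indexed operand."""
    return ea_indirect_y(mem, (operand + x) & MASK, y)
```

After a TSX, X holds the stack pointer, so `ea_indirect_x_y(mem, 1, x, y)` treats the top stack entry as a base pointer, assuming the 6502 convention that the stack pointer addresses the next free slot.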

I think I've learned enough so far that I can scrap it and start over with a clearer idea of where I'm headed. But I do have a question regarding design. How many internal (i.e., non-exposed) registers should there be?

The simulator is fairly lenient about letting you do whatever you want with any register at any time. Heck, they can all be ALU, index and memory access registers at the same time, if you want and are willing to write all the necessary microcode. That doesn't seem realistic to me.

So far I'm using one register as basically an ALU and memory access register, mainly because so many instructions set the N and Z condition flags. I wrote the microcode to do that for one register and then decided it was easier to route any instruction that set flags through that register than to duplicate the microcode for every possible path.
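
As a rough illustration of routing every flag-setting operation through one place, here is a minimal sketch. It assumes N mirrors bit 15 and Z means all sixteen bits are clear, which is the 6502 rule widened to 16 bits; the names are hypothetical.

```python
def nz_flags(value):
    """Return (N, Z) for a 16-bit result."""
    value &= 0xFFFF
    return bool(value & 0x8000), value == 0


def load_a(cpu, value):
    """Any load, transfer or ALU result that lands in A goes through here,
    so the flag logic exists exactly once."""
    cpu["a"] = value & 0xFFFF
    cpu["n"], cpu["z"] = nz_flags(value)
```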

This time I'm imagining perhaps five internal registers: memory address, memory data, ALU, shift and flagsetter. A separate flagsetter seems reasonable because memory reads and register transfers are not ALU or shift operations.

This is easy enough to get the simulator to do, but is it any more realistic than using one register (so far the memory data register) as memory read/write, ALU, shift and flagsetter?


 Post subject: Re: 65Sim16
PostPosted: Wed Apr 10, 2013 6:00 am 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
For the simulator, I think it's up to you. Whatever you find most convenient.

If you want to run your design on an FPGA, it's probably best to document the ISA (Instruction Set Architecture), and start from scratch from there, since the best implementation won't necessarily have anything in common with a software simulation design.


 Post subject: Re: 65Sim16
PostPosted: Wed Apr 10, 2013 7:03 am 
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8541
Location: Southern California
Neat! 65Org32, here we come.

Quote:
- TSX, LDA (indirect,X) can get any value pointed to by a stack entry wherever it is in memory, i.e., (indirect,X) actually becomes useful

(ind,X) is used all the time in Forth (and I suspect in other high-level languages too) on the 6502, in the data stack in ZP where X is used as the stack pointer.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


 Post subject: Re: 65Sim16
PostPosted: Wed Apr 10, 2013 11:10 am 
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
teamtempest wrote:
Lately I've begun playing with a CPU simulator found here: http://www.cs.colby.edu/djskrien/CPUSim/
...
I've been toying with a model of a 16-bit address space CPU with all 16-bit registers. Sort of like the 65Org32, except shrunk down considerably in memory space.

...

Great stuff!

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


 Post subject: Re: 65Sim16
PostPosted: Wed Apr 10, 2013 7:42 pm 
Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
Now that's cool, I'll have to bookmark it. For the fun of it and to try something completely different, you might want to try to build a stack machine (http://en.wikipedia.org/wiki/Stack_machine). If I ever build my own CPU (ha), that would be what I would try.


 Post subject: Re: 65Sim16
PostPosted: Thu Apr 11, 2013 11:43 pm 
Joined: Sun Nov 08, 2009 1:56 am
Posts: 411
Location: Minnesota
Arlet wrote:
For the simulator, I think it's up to you. Whatever you find most convenient.

If you want to run your design on an FPGA, it's probably best to document the ISA (Instruction Set Architecture), and start from scratch from there, since the best implementation won't necessarily have anything in common with a software simulation design.


So a hardware implementation is not mostly a matter of designing circuits to implement a complete microcode instruction set? Ah well, if not, at least some hints can be gained by playing around with this.

F'rinstance, whatever circuitry accounts for the sign extension in 65xx relative branch instructions can be dropped, since the full 16-bit offset is available in the "byte" read anyway. I suppose I'm not the first to notice that if a conditional branch isn't taken, there's no need to actually read the offset value, i.e., it's possible to just load the memory address register and advance the program counter, then check the condition flag before actually performing the read. Maybe it's just simpler hardware to have all memory access instructions work the same way?
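
To make the two branch remarks concrete, here is a hedged sketch. It assumes the offset is relative to the address just past the offset word, as on the 6502; the thread does not say which base the 65Sim16 uses, and the function names are invented.

```python
def sign_extend8(byte):
    """What an 8-bit relative branch needs: widen a two's-complement
    byte to 16 bits."""
    byte &= 0xFF
    return (byte | 0xFF00) if (byte & 0x80) else byte


def branch16(pc, mem, taken):
    """`pc` points at the 16-bit offset word. Return the next PC."""
    next_pc = (pc + 1) & 0xFFFF          # step past the offset word
    if not taken:
        return next_pc                   # the offset is never read
    return (next_pc + mem[pc]) & 0xFFFF  # full-width offset, no extension
```

With a 16-bit offset the wrap-around add does all the work, so `sign_extend8` has no counterpart in `branch16`.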

Some instructions go away altogether, like ASL zp and ASL zp,x. I'm not sure why there isn't anything like ASL (zp),Y, since at least at the microcode level the necessary instructions are all there anyway. Something I've occasionally wished for in the past!


 Post subject: Re: 65Sim16
PostPosted: Fri Apr 12, 2013 2:16 am 
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
teamtempest wrote:
...I'm not sure why there isn't anything like ASL (zp),Y, since at least at the microcode level the necessary instructions are all there anyway. Something I've occasionally wished for in the past!

Interesting TT, I would like to try to implement your idea within the 65Org16.b core. I think column 3 would be a good fit. Rows 1,3,5,7 for ASL, ROL, LSR, ROR indirect indexed Y...
The way I see it, the opcode decodings for FPGA would be in an odd column next to the indirect indexed Y opcodes in column 1 for ORA, AND, EOR and ADC indirect indexed Y. As Arlet was alluding to, the bits for opcodes executing a common function should all be even or odd. This helps the speed (i.e., optimization) of the instruction register decoding.

EDIT: So my question is, for my own derivative of Arlet's core: For example when doing an ASL ($xxxx),y shifting bits out (left) through the LSB, would it set the carry? (hopefully not) until it reached the last bit in the MSB. I will find out. His state machine does not have to be modified... But as you've said many issues like these would be automatically fixed in the 65Org32!
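
For reference, this is how the 6502's ASL rule looks when widened to a single 16-bit word: the carry takes whatever leaves the top bit, so nothing reaches the carry until a 1 is shifted out of bit 15. It is a sketch of the arithmetic only and says nothing about what the 65Org16.b core actually does.

```python
def asl16(value):
    """Shift a 16-bit word left by one; return (result, carry_out)."""
    value &= 0xFFFF
    carry = (value >> 15) & 1        # bit 15 falls into the carry
    result = (value << 1) & 0xFFFF   # bit 0 is filled with zero
    return result, carry
```

A bit crossing from the low octet into the high octet, as in `asl16(0x0080)`, leaves the carry clear.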

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


 Post subject: Re: 65Sim16
PostPosted: Fri Apr 12, 2013 6:17 am 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
teamtempest wrote:
So a hardware implementation is not mostly a matter of designing circuits to implement a complete microcode instruction set? Ah well, if not, at least some hints can be gained by playing around with this.

In my 6502, I don't have traditional microcode. There's no actual code, and no micro program counter, or micro program instructions. There's a state machine that's similar in function though, going through a couple of states which are different for each major group of instructions.
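
A toy model of that idea, for readers who have only seen the microcode style: one state per step, with the path through the states chosen by the instruction's group. The state names and the two-instruction subset are invented for this sketch, the opcode values are borrowed from the 6502 for familiarity, and none of it is taken from the actual core.

```python
from enum import Enum, auto


class State(Enum):
    FETCH = auto()    # read the opcode word at PC and pick a path by group
    OPERAND = auto()  # read the address word that follows the opcode
    READ = auto()     # read the data word at that address into A


OP_LDA_ABS = 0xAD  # "absolute" group: continues to OPERAND, then READ
OP_INX = 0xE8      # "implied" group: finishes inside FETCH


def run_one(cpu, mem):
    """Execute one instruction; return the number of states visited."""
    state, visited = State.FETCH, 0
    while True:
        visited += 1
        if state is State.FETCH:
            opcode = mem[cpu["pc"]] & 0xFF
            cpu["pc"] = (cpu["pc"] + 1) & 0xFFFF
            if opcode == OP_INX:
                cpu["x"] = (cpu["x"] + 1) & 0xFFFF
                return visited
            if opcode != OP_LDA_ABS:
                raise ValueError("opcode not covered by this sketch")
            state = State.OPERAND
        elif state is State.OPERAND:
            cpu["addr"] = mem[cpu["pc"]]
            cpu["pc"] = (cpu["pc"] + 1) & 0xFFFF
            state = State.READ
        else:  # State.READ
            cpu["a"] = mem[cpu["addr"]]
            return visited
```

There is no micro program counter here; the "program" is just the set of transitions between states.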

Quote:
Some instructions go away altogether, like ASL zp and ASL zp,x. I'm not sure why there isn't anything like ASL (zp),Y, since at least at the microcode level the necessary instructions are all there anyway. Something I've occasionally wished for in the past!

Probably to save opcode space.


 Post subject: Re: 65Sim16
PostPosted: Fri Apr 12, 2013 6:37 am 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
By the way, are you going to use 16 bit instructions as well? And do you want to have mixed 8/16 bit access, or 16 bit only?


 Post subject: Re: 65Sim16
PostPosted: Fri Apr 12, 2013 11:59 pm 
Joined: Sun Nov 08, 2009 1:56 am
Posts: 411
Location: Minnesota
Arlet wrote:
By the way, are you going to use 16 bit instructions as well? And do you want to have mixed 8/16 bit access, or 16 bit only?


I'm still debating about those issues. Obviously if I decide to create more than 256 basic instructions or combinations of instructions and address modes I'll need more than eight bits to specify them. OTOH originally I implemented a "quick" address mode that stuffed a signed 8-bit value into eight of the 16 bits. So it was possible to load values from -128 to 127 into a register using only one "byte".

That in turn would allow me to drop three other single byte "clear register" instructions - CLA, CLX and CLY - since it would be possible to just do something like "LDA #0" instead (presumably assemblers would pick up on the opportunity to use a "quick" instruction, just as they do with zero page now).

Plus I could use most of the same mechanism (mainly the sign-extension microcode) to implement "quick" or "short" one-"byte" relative branch instructions.

Remember, this design contemplates only a 16-bit address space, so I figured I could justify the space savings!

But a "quick" mode would be a ninth address mode if I also implemented indirect absolute and pre-X and post-Y indexed indirect. Right now I think I'd rather have those, so at present I've taken out all the "quick" instructions. Plus for consistency it might be good to have "quick" instructions for all the relatives of "LDA", but I have a hard time imagining how useful something like "ORA #$20" would be, knowing that the top eight bits would always be either set or clear, depending on the operand value.
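
A sketch of how such a "quick" word might be packed and unpacked. The field layout (opcode in the low eight bits, signed value in the high eight) is an assumption made for illustration; the post only says the value occupies eight of the sixteen bits.

```python
def encode_quick(opcode, value):
    """Pack an 8-bit opcode and a signed 8-bit constant into one word."""
    if not -128 <= value <= 127:
        raise ValueError("quick operand must fit in a signed byte")
    return ((value & 0xFF) << 8) | (opcode & 0xFF)


def decode_quick(word):
    """Return (opcode, operand), with the operand sign-extended to 16 bits."""
    opcode = word & 0xFF
    raw = (word >> 8) & 0xFF
    operand = (raw | 0xFF00) if (raw & 0x80) else raw
    return opcode, operand
```

The decoded operand shows the "ORA #$20" concern directly: its top eight bits are all copies of bit 7 of the constant, so they are either all set or all clear.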

So "CLA" and company are back in. Then today I've been mulling over something like "STC #value,addr". I'm not sure I like the syntax, but essentially it would be "STore Constant", a more-capable version of the 65c02's "STZ" (which, apologies to all, I've never really liked). Assembled it would be two "bytes", with a signed 8-bit value and opcode in the first "byte" and an address in the second.

What else could I do with that syntax? Maybe "ROR #value, addr", since multi-bit shifts are going to be useful. Although this leads to "ROR #value,addr,X" - maybe it should be "RO8R addr,X" (or whatever shift is desired) instead. "ROFL addr" for a 15-bit rotate left, anyone? Whatever the syntax, the point is it's possible to store the shift value directly into the assembled opcode "byte" - only four bits are needed for all possibilities - so "ASCL" would be a one "byte" opcode shifting the accumulator left 12 bits. So at least some opcodes will be more than eight bits.
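
The arithmetic behind those multi-bit shifts can be sketched as below, with the count limited to four bits as described. Two caveats: the rotate here is a plain 16-bit rotate that ignores the carry flag (the 6502's ROL and ROR rotate through carry), and the function names are invented.

```python
def asl16_by(value, count):
    """Shift a 16-bit word left by a 4-bit count (0..15)."""
    return (value << (count & 0xF)) & 0xFFFF


def rol16_by(value, count):
    """Rotate a 16-bit word left by a 4-bit count, without the carry."""
    count &= 0xF
    value &= 0xFFFF
    return ((value << count) | (value >> (16 - count))) & 0xFFFF
```

"ASCL" would correspond to `asl16_by(a, 0xC)` and "ROFL" to `rol16_by(word, 0xF)`, which gives the same result as a one-bit rotate right.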

On the access size issue, for the time being I've been thinking only of full 16-bit access. That makes me wonder how easy handling standard ASCII codes in, say, a word-processor would be (ignore eight bits, or use them for format information only? Or pack two codes in one byte so as to maximize document size?), but 16-bit Unicode ought to be quite easy. Maybe a "real machine" would be designed to be a Unicode machine right from the start.
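
For the "pack two codes in one byte" option, a minimal sketch. It assumes the first character goes in the high octet and that an odd-length string is padded with a NUL; neither choice comes from the thread.

```python
def pack_ascii(text):
    """Pack ASCII text two characters per 16-bit word."""
    data = text.encode("ascii")
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of words
    return [(data[i] << 8) | data[i + 1] for i in range(0, len(data), 2)]


def unpack_ascii(words):
    """Inverse of pack_ascii; trailing NUL padding is dropped."""
    out = bytearray()
    for word in words:
        out.append((word >> 8) & 0xFF)
        out.append(word & 0xFF)
    return out.rstrip(b"\x00").decode("ascii")
```

The cost is visible in the code: every character access needs a shift or a mask, where the one-code-per-word layout needs neither.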

Are you thinking of peripheral access? I've only distantly considered that, as in, I haven't yet tried to figure out how practical that would be. I suppose I could dodge by saying the simulator doesn't really have any I/O to speak of beyond reading and writing single characters, but to the extent that I have considered it, I imagined 16-bit devices only. The thought of program chosen memory access size mostly makes me wonder how I'd implement it. Separate instructions? What would that do to the number of opcodes? A flag bit in the status register? What happens when an interrupt hits?


 Post subject: Re: 65Sim16
PostPosted: Sat Apr 13, 2013 5:39 am 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
I would recommend 8 bit opcodes. It is good for code density (important with only 16 bit memory space), but also means that you'll be restricted in the design. That sounds like a disadvantage, but the design actually becomes easier the fewer choices you have. If you have 8 bit opcodes, that also implies byte-wise memory access (otherwise it gets very complicated), with only a 16 bit address space. On the other hand, if you have 16 bit registers, you need to be able to load them quickly with 16 bit values, but you can still do that with two 8 bit accesses.

You could do like the 65816, and have a 8/16 bit mode based on a flag. This avoids having to duplicate all the opcodes for 8 or 16 bit operation. If you do go with 8 bit opcode, I recommend first figuring out which opcodes are essential, and then see how much room is left over. I agree multi-bit shifts are useful, but I don't think you need the full range of addressing modes. A ROR #value would be nice for immediate shifts of the accumulator, and maybe another instruction to shift by X register, so you can do variable shifts.

Edit: actually, just thinking about it some more.... and I think I've changed my mind about the 8 bit opcodes. Memory is cheap, and if the memory is 16 bit wide, you can actually get 64k * 16 instead of 64k * 8, which means the code (or data) density argument is irrelevant. 16 bit memory does offer twice the bandwidth, and allows you to get rid of all the zero page addressing modes.


 Post subject: Re: 65Sim16
PostPosted: Sat Apr 13, 2013 11:44 am 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Given the fact that X and Y will be 16 bit, it makes sense to get rid of (zp), Y and (zp, X) addressing. Instead, how about the following addressing modes:

1) imm8 : 8 bit immediate operand as part of the opcode ("quick")
2) imm16: 16 bit immediate operand following opcode
3) zp: absolute, using 8 bit quick address.
4) abs : absolute, using 16 bit address following opcode
5) abs, X: EA = X + 16 bit offset
6) abs, Y: EA = Y + 16 bit offset
7) abs8, X: EA = X + 8 bit offset
8) abs8, Y: EA = Y + 8 bit offset

The abs8, X (really zp, X) could also be written as (X, #offset) to indicate the intended use as X as a base pointer to some data structure, and a small offset. Note that in the case of #2, #4, #5 and #6 you have 8 unused bits in the opcode. These could be defined as extra opcodes, or for instance indicate an autoincrement/decrement, or specify additional index registers (or a combination). Without (zp), Y, it would be useful to have a few more index registers as replacement.
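
The effective-address rules for modes 3 through 8 can be written out as a small table-like function. This is a sketch under assumptions: `quick` is the 8-bit field inside the opcode word and is treated as unsigned, `ext` is the 16-bit word following the opcode, and the mode strings are labels chosen here.

```python
MASK = 0xFFFF


def effective_address(mode, quick=0, ext=0, x=0, y=0):
    """Return the effective address for the non-immediate modes."""
    quick &= 0xFF  # assumed unsigned; the post does not say
    if mode == "zp":
        return quick
    if mode == "abs":
        return ext & MASK
    if mode == "abs,X":
        return (x + ext) & MASK
    if mode == "abs,Y":
        return (y + ext) & MASK
    if mode == "abs8,X":
        return (x + quick) & MASK
    if mode == "abs8,Y":
        return (y + quick) & MASK
    raise ValueError("mode has no effective address in this sketch")
```

Only the "abs" family needs the extra operand word; the "zp" and "abs8" modes are resolved from the opcode word alone.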


 Post subject: Re: 65Sim16
PostPosted: Sat Apr 13, 2013 7:54 pm 
Joined: Sun Nov 08, 2009 1:56 am
Posts: 411
Location: Minnesota
Quote:
Given the fact that X and Y will be 16 bit, it makes sense to get rid of (zp), Y and (zp, X) addressing. Instead, how about the following addressing modes:

1) imm8 : 8 bit immediate operand as part of the opcode ("quick")
2) imm16: 16 bit immediate operand following opcode
3) zp: absolute, using 8 bit quick address.
4) abs : absolute, using 16 bit address following opcode
5) abs, X: EA = X + 16 bit offset
6) abs, Y: EA = Y + 16 bit offset
7) abs8, X: EA = X + 8 bit offset
8) abs8, Y: EA = Y + 8 bit offset


No indirect modes at all? Wouldn't that mean, for instance, that it would be impossible to write a single "move memory" routine without resorting to self-modifying code? I'm not a big fan of that sort of thing except in very limited, well-controlled circumstances.

64K 16-bit "bytes" amounts to 128K octets. As you say, code density is a consideration but possibly not the real constraint. In practice I've been modelling 8-bit opcodes in 16-bit bytes. For the most part when I think about the other eight bits I imagine some sort of constant operand value, but that's not set in stone.

The two-"byte" address modes I've implemented in this iteration are:

1) imm16 - 16-bit immediate
2) absolute - 16-bit absolute address
3) absolute,X - absolute address + 16-bit X offset
4) absolute,Y - absolute address + 16-bit Y offset
5) (indirect) - absolute indirect
6) (indirect,X) - absolute pre-indexed X indirect
7) (indirect),Y - absolute post-indexed Y indirect
8) (indirect,X),Y - absolute pre-indexed X indirect followed by post-indexed Y-indirect

Since there is no zero page all of the indirect modes accept any 16-bit memory address.

The stack pointer is also 16-bits, so TSX makes the following possible:

1) absolute,X - get any 16-bit value off the stack
2) (indirect,X) - get a 16-bit address off the stack and fetch the value at that address
3) (indirect,X),Y - tricky: get a 16-bit address off the stack (pre-index) and use it as a base address (post-indexed). Equivalent to LDA absolute_1,X; STA absolute_2; LDA (absolute_2),Y

And of course JMP (indirect,X) has been with us for some time.
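
The claimed equivalence for (indirect,X),Y can be checked with a short model. Memory is a Python list of 16-bit words, `scratch` stands in for absolute_2, and the function names are invented for this sketch.

```python
MASK = 0xFFFF


def lda_indirect_x_y(mem, base, x, y):
    """LDA (base,X),Y in one step."""
    pointer = mem[(base + x) & MASK]   # pre-index: fetch the pointer
    return mem[(pointer + y) & MASK]   # post-index: add Y, then read


def lda_three_step(mem, base, scratch, x, y):
    """LDA absolute_1,X; STA absolute_2; LDA (absolute_2),Y."""
    a = mem[(base + x) & MASK]                # LDA absolute_1,X
    mem[scratch] = a                          # STA absolute_2
    return mem[(mem[scratch] + y) & MASK]     # LDA (absolute_2),Y
```

Both return the same word; the single-instruction form simply avoids the scratch location and leaves the accumulator free until the final load.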

I do like the "quick" modes but stalled over the thought of more than eight address modes. But come to think of it, there are already more than eight. I discarded zero page and zero page,X but substituted two new modes for them. There is still implied mode, such as for register transfers, so there always have been at least nine modes. Hmm. Since implied mode is always one byte, maybe the "quick" modes can be salvaged the same way, since they would also be one "byte".


 Post subject: Re: 65Sim16
PostPosted: Sat Apr 13, 2013 7:57 pm 
Joined: Sun Nov 08, 2009 1:56 am
Posts: 411
Location: Minnesota
Gah! And of course branches are relative...so a ninth two-byte address mode...


 Post subject: Re: 65Sim16
PostPosted: Sat Apr 13, 2013 8:50 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
teamtempest wrote:
No indirect modes at all? Wouldn't that mean, for instance, that it would be impossible to write a single "move memory" routine without resorting to self-modifying code? I'm not a big fan of that sort of thing except in very limited, well-controlled circumstances.

You can still do indirection through the X and Y registers. Load the source in X, load the destination in Y, and LDA 0,X; STA 0, Y. The advantage is that you avoid the double memory access through the zero page. And with the 8 bit quick offset, you don't need any extra operand word either. This means that the instruction could be done in only 2 cycles.
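
A model of the move routine that description suggests, with the source address in X and the destination in Y. The loop structure and the INX/INY steps are assumptions added here; the post only gives the LDA 0,X and STA 0,Y pair.

```python
MASK = 0xFFFF


def move_words(mem, src, dst, count):
    """Copy `count` words from src to dst, lowest address first."""
    x, y = src & MASK, dst & MASK
    for _ in range(count):
        a = mem[x]           # LDA 0,X
        mem[y] = a           # STA 0,Y
        x = (x + 1) & MASK   # INX
        y = (y + 1) & MASK   # INY
```

No instruction in the loop is modified at run time, which answers the self-modifying-code worry. Because it copies forward, it is only safe for overlapping regions when the destination is below the source.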

