
All times are UTC




 Post subject: Some 65Org16 questions
PostPosted: Wed Jun 27, 2012 4:29 am 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Elsewhere, Miles J. joined the forum ...

Miles J. wrote:
Hello,

sorry for interfering. Hope you don't mind. While looking for some information about the 6502, I stumbled across this interesting forum and found your threads about the 65org16 and other related topics. I was truly amazed by the knowledge and experience of the people here and hence their ideas, and I dug through the depths of the threads to gather as much information as possible. And finally at the weekend I sat down and wrote a little experimental 65org16 emulator and assembler to fool around with the opcodes (*) and to get a feeling for what programs and programming would be like. As expected a few problems arose, and now I wonder if it would be possible to ask you some questions concerning the 65org16/65org32. Well, actually, I would have plenty of questions, so I didn't even know where to post this since the questions include things like emulation as well as programming or ISA design. Feel free, please, to move this post around to another thread if you like. Oh, and this is my first post here. Have mercy on me, please...

Miles

(* right now only the old 65c02 instructions extended to 16 bit without the additional registers)



Miles J. wrote:
Dr Jefyll wrote:
Hi Miles! -- and welcome! Nice to hear about your emulator and assembler. Did you have an actual question? I'm sure someone will be glad to try to answer.

cheers,

Jeff

Thanks! I've got a lot of questions, indeed, but I really don't want to cause any trouble. So, here are just a few:
1) A simple question just to make sure: is the behaviour of the stack instructions PHA, PLA etc and JSR, RTS the same as on the 6502 (PHA = write value then decrement S, PLA = increment S then read value, JSR = increment PC, write high word, decrement S, write low word, decrement S)?
2) How many cycles do the instructions use (relative in comparison to each other)? Just a rough estimate, so that I can adjust the emulator and watch out for bottlenecks in the program.
3) Is the stack placed at $0001 0000 .. $0001 ffff? The IRQ vectors are found at $ffff fffa etc? Is there an additional COP vector ($ffff fff4) or something similar that can be used for system calls?
4) Does a dev. board exist with an attached video generator? If so, how is it organized and what is the resolution? (So I can perhaps emulate some kind of display output.)
5) Would it be possible to add the instruction 'JSR (zp, x)'? This may seem very odd at first, but there's a reason. When I write larger programs I use object oriented programming a lot. In x86 assembly you would likely use something like 'mov eax, [esi + offset]' for loading an object attribute value. On a 68000 there is 'MOVE.L offset(A5), D0'. Unfortunately an addressing mode like 'LDA (zp), #offset' does not exist, and addressing attributes using 'LDY #offset : LDA (zp), y' would result in very long code. So I decided to place the objects into the zero page, load the X register with the address of the object and use the zp value as the constant offset of the attribute. This will give me 'LDA offset, x' for loading the attribute value of an object pointed to by X. So far so good. The problem is: how do you call an object method? The x86 has 'call [esi + offset]' for this. The 68000 at least got 'JSR offset(A5)' in combination with an additional ' JMP abs.L'. All members of the 6502 family lack a short and quick way to perform this call, although it is really very useful and may speed up programs drastically. Now even a zp as the offset would be enough, as the objects reside in zero page anyway. Hence 'JSR (zp, x)' would be fine. What do you think: would it be possible/make sense to add such an instruction?
6) Are there still plans to include instructions for multiplication and division? (I personally would prefer a simple unsigned 'MUL' that multiplies A * Y with the result in Y (low) and A (high) but that's a matter of taste really.)
Now I could go on endlessly, but I'd rather not get on your nerves if I haven't done so already. Thanks for your patience.

Miles


PostPosted: Wed Jun 27, 2012 4:38 am 
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3349
Location: Ontario, Canada
OK, Miles, I'm not exactly an authority on this project, but I can clarify the first few points at least.
Quote:
1) A simple question just to make sure: is the behaviour of the stack instructions PHA, PLA etc and JSR, RTS the same as on the 6502
The same, yes. After all, the predecessor to this project is a 6502 (implemented in FPGA). The only major change was to stretch every "byte" to be 16 bits instead of 8. Byte-wide registers such as A become 16 bit, and the 16 bit PC becomes 32 bit. But the behavior is the same as the FPGA 6502 before the byte stretch -- and that behavior replicates the 6502.
Quote:
2) How many cycles do the instructions use (relative in comparison to each other)? Just a rough estimate, so that I can adjust the emulator and watch out for bottlenecks in the program.
I don't know if the FPGA 6502 has exactly the same cycle counts as an actual 6502. But if a rough estimate is all you need then just use actual NMOS 6502 cycle counts.
Quote:
3) Is the stack placed at $0001 0000 .. $0001 ffff? The IRQ vectors are found at $ffff fffa etc?
Yes I think you get the idea. As with question (1): the behavior is the same; it's just that the bytes are wider.
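
For anyone writing an emulator, the stack behaviour and layout confirmed above could be sketched roughly as follows. This is only a minimal Python sketch, not taken from any 65Org16 source: the class name `Org16`, the dict-based memory and the method names are all hypothetical, and it assumes a stack page at $0001 0000 and a three-word JSR (opcode plus a two-word address).

```python
# Hypothetical emulator fragment: 65Org16 stack operations, modelled on the 6502.
STACK_BASE = 0x0001_0000   # stack page as discussed above (assumed)
WORD_MASK = 0xFFFF         # every "byte" is 16 bits wide

class Org16:
    def __init__(self):
        self.mem = {}      # sparse memory: 32-bit address -> 16-bit word
        self.s = 0xFFFF    # 16-bit stack pointer
        self.pc = 0        # 32-bit program counter

    def push(self, value):
        # Write first, then decrement S (same order as PHA on the 6502).
        self.mem[STACK_BASE | self.s] = value & WORD_MASK
        self.s = (self.s - 1) & WORD_MASK

    def pull(self):
        # Increment S first, then read (same order as PLA on the 6502).
        self.s = (self.s + 1) & WORD_MASK
        return self.mem.get(STACK_BASE | self.s, 0)

    def jsr(self, target):
        # pc points at the JSR opcode; the instruction is assumed to be
        # three words long. Like the 6502, push the address of its LAST
        # word, high word first.
        ret = (self.pc + 2) & 0xFFFF_FFFF
        self.push(ret >> 16)
        self.push(ret & WORD_MASK)
        self.pc = target & 0xFFFF_FFFF

    def rts(self):
        lo = self.pull()
        hi = self.pull()
        # The pulled address is one short of the next instruction.
        self.pc = (((hi << 16) | lo) + 1) & 0xFFFF_FFFF
```

The only difference from an 8-bit emulator is the width of the masks; the order of the writes and the S updates is unchanged.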

Quote:
Is there an additional COP vector ($ffff fff4) or something similar that can be used for system calls?
You need to be aware there are a few variants to the 65Org16 project, and they explore different ways the Instruction Set could be improved. There's certainly a lot of potential there, since opcodes are now 16 bits -- 8 of which are new and aren't tied to any pre-existing function!

As for system calls, we can have 256 different BRK opcodes (not to mention the fact that the "signature" byte following the BRK opcode is now 16 bit...) :D

Quote:
I personally would prefer a simple unsigned 'MUL' that multiplies A * Y [...] Now I could go on endlessly
As you may have noticed, there are a bunch of different threads pertaining to this project and its variants -- and in different Forum categories too. I'm afraid there's rather a lot to take in! But (as with any forum member) if your remarks are pertinent and thoughtful then by all means please share them with us!

-- Jeff
ps-
JSR (zp, x) and that whole line of thought has a distinctly Forth-ish flavor to it. Although your reading list is already rather long, perhaps you should be browsing the Forth section of the forum, too...

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Wed Jun 27, 2012 5:00 am 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Hi Miles - welcome!
Always good to see a new face on the forum, and especially good to have someone turn up to the 65Org16 party.

There's a summary of threads here which I try to keep up to date - it doesn't address all your questions, but Jeff has done a fine job to get us going.

The 65Org16, for me, is an idea, and we have a particular implementation family which is based on Arlet's 6502 core. That core is, I think, cycle-count accurate with the NMOS 6502 (but swaps some memory accesses around.) There's nothing preventing anyone extending another core to 16 bits though, if they had a mind to.

I prefer to keep "65Org16" to refer to a core which keeps to the NMOS instruction set, because there are so many ways we could extend it, and there's no reason to suppose that they will all be in compatible ways. So, 65Org16.b is one exploration of extensions, which EEye has in a public repository. There is also a not-quite-working branch of my repo which added an unsigned multiply instruction and a long-distance shift. 65Org16.c was named but not quite defined.

One counterpoint to Jeff's observation: I wouldn't quite say we have 256 BRK instructions. The instruction decoding is loose in the original verilog project, but the assemblers will always place a zero high byte. BRK is, properly, $0000. We do have a full 16-bit "byte" of operand to follow. (EEye's project has got exact instruction decoding, so his core only recognises $0000 as BRK. I think that's the right definition of the instruction for the architecture, but I think it's OK for an implementation to be loose or not.)

Miles J. wrote:
4) Does a dev. board exist with an attached video generator? If so, how is it organized and what is the resolution? (So I can perhaps emulate some kind of display output.)

I've been running on OHO FPGA modules, which have no video, but EEyE has designed and built a dev board, and put one in Arlet's hands. That board has external SDRAM, video, and other peripherals.

Cheers
Ed


PostPosted: Wed Jun 27, 2012 5:34 am 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
A little more on the specific enhancement ideas:
Miles J. wrote:
5) Would it be possible to add the instruction 'JSR (zp, x)'? [for object methods ... other CPUs have something...] All members of the 6502 family lack a short and quick way to perform this call, although it is really very useful and may speed up programs drastically. Now even a zp as the offset would be enough, as the objects reside in zero page anyway. Hence 'JSR (zp, x)' would be fine. What do you think: would it be possible/make sense to add such an instruction?

The general answer would be "yes" because anything is possible! (Seriously - FPGAs are big and HDLs are pretty much like software.) However, that's a bit flippant because there's the question of how difficult it is, and whether anyone will do it. So far both EEye and I have stopped short of adjusting the shape of the state machine (I know I have, and I think he has) - one reason why we have the 65Org16 at all is that it's a very minimal change to an existing core. I wouldn't ever have built a core from scratch. Every time I look at the state machine I run away again. So, adding the necessary states to handle JSR (zp, x) isn't something I'm likely to tackle. A more confident practitioner wouldn't find it too challenging. (It's possible that a microcoded core such as MichaelM's would be more amenable to my kind of brain.)
Quote:
6) Are there still plans to include instructions for multiplication and division? (I personally would prefer a simple unsigned 'MUL' that multiplies A * Y with the result in Y (low) and A (high) but that's a matter of taste really.)
A simple MUL can be dropped in, I think. Division seems to be difficult, in the sense that it's a multi-cycle operation and there's no drop-in hardware for it, so that's less likely unless someone pops up. A division-step instruction might be more likely.
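
For emulator experiments in the meantime, the MUL semantics Miles proposes (unsigned A * Y, low word to Y, high word to A) might be modelled like this. It is only a sketch of that proposal: the function name `mul_unsigned` is hypothetical, no opcode has been assigned, and flag behaviour is deliberately left out because nothing has been decided about it here.

```python
def mul_unsigned(a, y):
    """Hypothetical MUL: unsigned A * Y on 16-bit registers.

    Returns (new_a, new_y) = (high word, low word) of the 32-bit product.
    """
    product = (a & 0xFFFF) * (y & 0xFFFF)   # fits in 32 bits
    return (product >> 16) & 0xFFFF, product & 0xFFFF
```

For example, `mul_unsigned(0x0100, 0x0100)` gives `(0x0001, 0x0000)`, i.e. the product $0001 0000 split across A and Y.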

Cheers
Ed


PostPosted: Wed Jun 27, 2012 5:46 am 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
BigEd wrote:
That core is, I think, cycle-count accurate with the NMOS 6502 (but swaps some memory accesses around.)

Correct. The cycle count is identical to the NMOS 6502, but the FPGA version has been modified to use a single, positive-edge clock signal, according to modern design practices. Also, the memory access takes a full cycle, allowing the use of synchronous FPGA block RAMs. This means that there are some differences in the way the bus is accessed between the FPGA and the NMOS 6502.


PostPosted: Wed Jun 27, 2012 11:19 am 
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Miles J. wrote:
...3) Is the stack placed at $0001 0000 .. $0001 ffff? The IRQ vectors are found at $ffff fffa etc? Is there an additional COP vector ($ffff fff4) or something similar that can be used for system calls?...Miles

The .b core I've worked on has relocatable stack and zero pages. It's accomplished by 4 primary opcodes that transfer a value to/from an accumulator to the stack/zeropage pointer. Each of these 2 64K pages can be anywhere in the 4GB address space.

Miles J. wrote:
...4) Does a dev. board exist with an attached video generator? If so, how is it organized and what is the resolution? (So I can perhaps emulate some kind of display output.)...Miles

Right now the current Devboard has a display IC that takes YUV-style input and outputs composite or S-Video. I don't like this, however. For the next board I make, hopefully with Arlet's help again, I am planning a simpler 16-bit interface to a 24-bit video DAC. Arlet's current video memory controller has the pixels go left to right, then down one line, like a CRT raster beam, and I don't think this behavior will change when I build the new board. Hope this helps with your emulator efforts!
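
For an emulator display, that raster ordering boils down to a simple offset calculation. The sketch below is an illustration only: the function name `pixel_offset`, the 320x200 default size and the one-pixel-per-16-bit-word packing are assumptions, not details of the actual video controller.

```python
def pixel_offset(x, y, width=320, height=200):
    """Word offset of pixel (x, y) in a left-to-right, top-to-bottom framebuffer.

    Assumes one pixel per 16-bit word; width and height are placeholder values.
    """
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("pixel outside the display")
    return y * width + x
```

With these assumed dimensions the whole frame is 64000 words, so every offset fits in a single 16-bit index register.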

Miles J. wrote:
...6) Are there still plans to include instructions for multiplication and division? (I personally would prefer a simple unsigned 'MUL' that multiplies A * Y with the result in Y (low) and A (high) but that's a matter of taste really.)
Now I could go on endlessly, but I'd rather not get on your nerves if I haven't done so already. Thanks for your patience. Miles

I would say yes, in a future version by me. Or maybe sooner by anyone else!

BTW, Welcome to 6502.org. :D

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Wed Jun 27, 2012 11:59 pm 
Joined: Tue Jun 26, 2012 6:18 pm
Posts: 26
Thanks for the answers and the links! You're very kind. I promise to read them thoroughly ASAP.
BigEd wrote:
A simple MUL can be dropped in, I think. Division seems to be difficult, in the sense that it's a multi-cycle operation and there's no drop-in hardware for it, so that's less likely unless someone pops up. A division-step instruction might be more likely.
Okay, no problem. For now I will use a little subroutine for division when needed. MUL would be more useful anyway, I guess (e.g. for calculating the pixel address inside a window with arbitrary size).
BigEd wrote:
So, adding the necessary states to handle JSR (zp, x) isn't something I'm likely to tackle.
Understandable. So no chances then for the 65816 instructions 'JMP (abs,x)' and 'JSR (abs,x)'? :) Any hopes for 'BRA' and 'BSR'? :) 'PSH/PLL zp'? :) Just kidding. Don't get me wrong, please, I admire your work and I can see the problem. Oh dear, I wish I was able to do a little bit of FPGA programming myself, but I'm too dumb for this. No experience at all. Wouldn't even know how and where to start. :( My apologies for bugging you.
Arlet wrote:
The cycle count is identical to the NMOS 6502
Done that. How about the additional clock cycle on page boundary crossing (e.g. 'LDA abs, x' and also 'Bcc')? Still the same?
ElEctric_EyE wrote:
The next board I make ..
Wish I had one of those. Good luck with it!

Finally (?) there is one question I'd like to ask regarding the extended register set:
Can I assume for now that registers A, X, Y, and S correspond to r0, r1, r2, and r15 respectively?

I'd also like to add that I do not expect any specification or detail mentioned here or in other threads to be set in stone. This is a work in progress, and the good thing about FPGAs and emulators is that you can change them again and again. :)
So, for the next couple of days, I will be working on a new version of the 65org16 emulator, this time written in x86 assembly. (I've noticed the old one being too slow.) A resolution of 320x200x16 will function as a test output. In addition I've also started converting some basic standard kernel routines to 65org16, but I don't know yet how far I can get.

Thanks again for your help.

Miles

BigEd wrote:
Please consider posting something in the Introduce Yourself thread
Uhm, well, I'm usually rather shy. Let's see...


PostPosted: Thu Jun 28, 2012 2:38 am 
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
Miles J. wrote:
BigEd wrote:
A simple MUL can be dropped in, I think. Division seems to be difficult, in the sense that it's a multi-cycle operation and there's no drop-in hardware for it, so that's less likely unless someone pops up. A division-step instruction might be more likely.
Okay, no problem. For now I will use a little subroutine for division when needed. MUL would be more useful anyway, I guess (e.g. for calculating the pixel address inside a window with arbitrary size).

Don't forget you can use the big math tables at http://wilsonminesco.com/16bitMathTables/index.html, if you can spare the I/O for them, or load them into RAM. One of the math tables I provide is for inverting, so to divide, you can multiply by the inverse. The input number is 16 bits, and the inverse is 32 bits. You don't have to use all 32, but it lets you get 16-bit resolution and accuracy across the entire range.
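
To illustrate the divide-by-multiplying idea, here is a rough Python sketch. The scaling used (each entry is 2^32 / n rounded up) is purely an assumption chosen to make the arithmetic exact for 16-bit inputs and may well differ from how the actual tables are scaled; `inverse_entry` stands in for a table lookup and both function names are hypothetical.

```python
def inverse_entry(n):
    """Stand-in for a table lookup: ceil(2**32 / n), which fits in 32 bits for n >= 2."""
    if not 2 <= n <= 0xFFFF:
        raise ValueError("this sketch only tabulates divisors 2..65535")
    return -(-(1 << 32) // n)   # ceiling division

def divide_by_inverse(a, n):
    """Unsigned 16-bit a // n using one multiply and a shift instead of a divide."""
    if n == 1:
        return a & 0xFFFF        # 1/1 would not fit in a 32-bit entry
    return ((a & 0xFFFF) * inverse_entry(n)) >> 32
```

Rounding the entry up rather than down matters: with a truncated inverse, exact quotients such as 6 / 3 would come out one too small.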

For fast multiplication, you can speed it up with the multiplication table which goes to 255x255, or perhaps better, the table of squares which has 16-bit input and 32-bit output, and consider that:

(a+b)² = a² + b² + 2ab

so if you solve for a*b, the multiplication becomes:

ab = ( (a+b)² - a² - b² ) / 2

meaning it is reduced to an addition, three squarings (from the table), two subtractions, and a right shift.

These two particular tables are unsigned.
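
The identity above can be tried out in a few lines of Python. This is only a sketch: the table is generated in memory rather than loaded from the real table files, the name `mul_via_squares` is hypothetical, and it assumes a + b still fits the table's 16-bit input (otherwise the first lookup would fall off the end of the table).

```python
# Table of squares: 16-bit input, 32-bit output (built here rather than loaded).
SQUARES = [n * n for n in range(0x10000)]

def mul_via_squares(a, b):
    """a*b = ((a+b)^2 - a^2 - b^2) / 2, using only lookups, add, subtract and shift."""
    total = a + b
    if a < 0 or b < 0 or total > 0xFFFF:
        raise ValueError("a + b must fit the table's 16-bit input")
    return (SQUARES[total] - SQUARES[a] - SQUARES[b]) >> 1
```

The right shift is exact because (a+b)² - a² - b² is always 2ab, an even number.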

There are other tables there for trig, log, and square-root functions. In some cases, using the big look-up tables makes it nearly a thousand times as fast as having to actually calculate them, and all 16 bits of the answer will be correct.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Thu Jun 28, 2012 3:56 am 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Miles J. wrote:
Thanks for the answers and the links!
You're welcome. I've wandered away from this topic a bit. A bit of fresh interest might help bring me back!
Quote:
BigEd wrote:
So, adding the necessary states to handle JSR (zp, x) isn't something I'm likely to tackle.
Understandable. So no chances then for the 65816 instructions 'JMP (abs,x)' and 'JSR (abs,x)'? :) Any hopes for 'BRA' and 'BSR'? :) 'PSH/PLL zp'? :) Just kidding. Don't get me wrong, please, I admire your work and I can see the problem. Oh dear, I wish I was able to do a little bit of FPGA programming myself, but I'm too dumb for this. No experience at all. Wouldn't even know how and where to start. :( My apologies for bugging you.
Not at all! I had thought I might be able to mark the birthday of the first implementation by tackling the 65Org32 and finally cracking open the state machine, but that anniversary has passed. I may yet be the first to try it though - it isn't rocket science, it just needs a clear head and a methodical approach.

Anything with the same sequence and cycle count as an existing instruction is likely to be straightforward - so because JMP (abs, x) is the same cycle count as JMP (abs) and because the destination address already passes through the ALU, adding X is probably straightforward. Whereas JSR (abs,X) takes the '816 all of 8 cycles so you can tell at once there will be state machine changes needed.

(Part of the point here is that Arlet's core is small and neat because it follows the same architecture as the original 6502, and therefore the easy enhancements made to the 65c02 will probably also be easy for Arlet's core, and therefore for the 65Org16. PHX, PHY and so on should be very easy, also BRA. On the other hand, to modify the core you have to think like a 6502 designer.)

Cheers
Ed


PostPosted: Thu Jun 28, 2012 5:42 am 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
BigEd wrote:
Anything with the same sequence and cycle count as an existing instruction is likely to be straightforward - so because JMP (abs, x) is the same cycle count as JMP (abs) and because the destination address already passes through the ALU, adding X is probably straightforward.

The JMP (abs,X) takes 6 cycles, while JMP (abs) takes 5 cycles. Your idea of adding the X is correct, but you'll have to add an extra state to add in the carry.


Last edited by Arlet on Thu Jun 28, 2012 5:57 am, edited 2 times in total.

PostPosted: Thu Jun 28, 2012 5:44 am 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Miles J. wrote:
Done that. How about the additional clock cycle on page boundary crossing (e.g. 'LDA abs, x' and also 'Bcc')? Still the same?

Yes, although I have experimented with a version that avoids some of the extra cycles. That version hasn't been published yet, though.
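
In emulator terms, keeping the NMOS page-crossing penalties might look like the sketch below. The base counts (4 for LDA abs,X; 2 for a branch, 3 if taken) are the usual NMOS 6502 figures; the assumption that a 65Org16 "page" is 65536 words, and the function names themselves, are mine rather than anything from the core.

```python
PAGE_MASK = 0xFFFF  # a 65Org16 "page" is assumed to be 65536 words

def lda_abs_x_cycles(base, x):
    """Cycle count for LDA abs,X: 4, plus 1 if base + X leaves the page."""
    crossed = ((base & PAGE_MASK) + (x & PAGE_MASK)) > PAGE_MASK
    return 4 + (1 if crossed else 0)

def branch_cycles(pc_next, offset, taken):
    """Cycle count for Bcc: 2 if not taken, 3 if taken, 4 if the target is on another page.

    pc_next is the address of the instruction after the branch; offset is signed.
    """
    if not taken:
        return 2
    target = (pc_next + offset) & 0xFFFF_FFFF
    return 4 if (target >> 16) != (pc_next >> 16) else 3
```

With pages this large the penalty should trigger far less often than on an 8-bit 6502, so it is unlikely to dominate a rough timing estimate.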


PostPosted: Thu Jun 28, 2012 7:44 am 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Arlet wrote:
BigEd wrote:
Anything with the same sequence and cycle count as an existing instruction is likely to be straightforward - so because JMP (abs, x) is the same cycle count as JMP (abs) and because the destination address already passes through the ALU, adding X is probably straightforward.

The JMP (abs,X) takes 6 cycles, while JMP (abs) takes 5 cycles. Your idea of adding the X is correct, but you'll have to add an extra state to add in the carry.

Drat! (and thanks for the correction) (but it's fine on the 65Org32!)


PostPosted: Thu Jun 28, 2012 8:33 pm 
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
BigEd wrote:
... (but it's fine on the 65Org32!)

So you've tested JMP(abs,x) in simulation successfully for the 65Org32?

PostPosted: Thu Jun 28, 2012 8:41 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
No, I'm making a wild guess, and I may well be wrong again... but it's because there's only one byte in an address on the 65Org32, so no need for a carry.
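
The carry argument can be shown with a toy model. This Python sketch is not a description of any core's state machine: it only contrasts adding X to an address held as two ALU-width halves (as on the 6502 and 65Org16) with adding X to an address that is a single "byte" (as guessed for the 65Org32). Both function names are hypothetical.

```python
def indexed_two_halves(addr, x, width):
    """Add x to an address stored as two `width`-bit halves.

    Returns (result, carry): carry is 1 when the low-half addition overflowed
    and the high half needed a second pass through the ALU.
    """
    mask = (1 << width) - 1
    low = (addr & mask) + (x & mask)
    carry = low >> width
    high = ((addr >> width) + carry) & mask
    return (high << width) | (low & mask), carry

def indexed_single(addr, x, width):
    """Add x to an address that is one `width`-bit 'byte': one addition, no carry step."""
    mask = (1 << width) - 1
    return (addr + x) & mask
```

For instance, `indexed_two_halves(0x12FF, 0x01, 8)` returns `(0x1300, 1)`, the 6502 case that needs the extra state, whereas `indexed_single(0x0001FFFF, 1, 32)` produces the same kind of result in one addition.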


PostPosted: Thu Jun 28, 2012 8:52 pm 
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Ah, ok. I thought you might be doing some clandestine testing! I might have a hack at it myself though. If I have any luck, I'll update on the .c core thread. Not doing much here at work, and can't design boards or work on the devboard here, so what the heck...
