6502.org Forum
PostPosted: Thu Aug 18, 2011 5:51 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Elsewhere:
teamtempest wrote:
BTW, I want that list to include at least "BSR" and "BRA". I don't care if "BSR" takes an extra cycle compared to "JSR", what I like is easy relocatability.


Just to recap, the 65Org16 is a cpu core, defined as being very like a 6502 but with 16-bit bytes so it has a 32-bit address space. We presently have an implementation in verilog based on Arlet's 6502 core which has a 6502 instruction set, working in simulation and on a Spartan 3 FPGA. Electric_Eye is working on a dev board. There is no emulator(*). There are two assemblers, by teamtempest and BitWise.

The verilog code exists firstly as a fork on github of Arlet's core. So, to make variations on a theme, there are at least three ways to go:
- make another fork, which anyone is free to do
- make a branch within a fork
- add some `defines within the verilog for optional features

Electric_Eye has in fact taken a fork for the dev board work.

The 65Org16 is already in a branch, to distinguish it from the plain 6502 core.

We already have some `defines, so that the verilog can optionally do BCD, and can produce either 6502 or 65Org16. Maybe one day also 65Org32.

If I were to add BRA (easy) and BSR (slightly less easy) I'd probably start with a branch, and I'd do it using a `define so the extra features would be optional. If that code was tidy enough I'd then merge the branch into my main branch.

I'm happy with these two as suggestions for enhancements: I've added them as tickets on github.

In general, I'm comfortable with adding easy-looking 65C02 instructions (PHX and PHY seem like good candidates too) - I'm also comfortable with there being several forks from active developers, because I'm aware I'm not very active. I've shown that 65Org16 is possible, and the original spec is already complete. There are so many ways it could be extended, and I don't think it's possible to keep a very tight rein on all possible directions.

Note that the core is LGPL licensed - you can take it and use it privately as you like, but if you redistribute then you must also redistribute the source.

Cheers
Ed

Edit: (*) there is now an emulator.
viewtopic.php?t=1907
viewtopic.php?t=1982


Last edited by BigEd on Fri Mar 30, 2012 6:20 pm, edited 1 time in total.

PostPosted: Thu Aug 18, 2011 6:02 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
BigEd wrote:
If I were to add BRA (easy) and BSR (slightly less easy) I'd probably start with a branch, and I'd do it using a `define so the extra features would be optional. If that code was tidy enough I'd then merge the branch into my main branch.

I'm happy with these two as suggestions for enhancements: I've added them as tickets on github.

In general, I'm comfortable with adding easy-looking 65C02 instructions (PHX and PHY seem like good candidates too) - I'm also comfortable with there being several forks from active developers, because I'm aware I'm not very active.


The absolutely ideal development pattern would go like this:
    - we completely agree on an idea
    - we entirely agree on how to code in verilog
    - someone forks my master project, codes up a feature
    - I pull their code into my master fork


(It's possible that some other more active fork takes over my fork as the better master, but with the same ideal story.)

If the various forks diverge a lot, then we can't merge our different efforts. It might already be difficult for me to accept updates from Arlet, and unpalatable for him to accept code from me.

I'm not sure how the verilog will look with lots of conditional code marked out with `defines. It might be unmaintainable. (It's already the case that EEye tidied up his fork by removing the BCD entirely.)

Cheers
Ed


PostPosted: Thu Aug 18, 2011 5:45 pm 

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
BigEd wrote:
... We presently have an implementation in verilog based on Arlet's 6502 core which has a 6502 instruction set, working in simulation and on a Spartan 3 FPGA...

Are you still using your GODIL board? Are those parallel flash chips on either side of the Spartan 3E?

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Thu Aug 18, 2011 6:59 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
ElEctric_EyE wrote:
BigEd wrote:
... We presently have an implementation in verilog based on Arlet's 6502 core which has a 6502 instruction set, working in simulation and on a Spartan 3 FPGA...

Are you still using your GODIL board? Are those parallel flash chips on either side of the Spartan 3E?

Hi EEye,
no (not for this particular project), and no! In fact the chips on the GODIL topside are 24-way bidir level converters. For this project I'm using an OHO GOP module, which is a 24-pin 5v module, in this case with a Spartan 3 and an 8-bit wide RAM. There's a range of GOP modules.

Edit: add GOP photo
[Image: photo of the GOP module]

The GOP has less I/O than the GODIL but has the advantage of the 512k on-board RAM. It is only 8-bit wide, so for 65Org16 or 65Org32 we (I) will need to sort out at least a buffer if not a cache - it's also only 55ns access. I think having a reusable cache design for a 6502-like core would be very good generally!

Both the GODIL (40 or 50pin) and GOP (20 or 24pin) have onboard SPI flash but no parallel flash. One could make an interface on the FPGA to offer a parallel port to read the flash (slowly).

Cheers
Ed


Last edited by BigEd on Sat Aug 20, 2011 7:39 am, edited 1 time in total.

PostPosted: Sat Aug 20, 2011 5:50 am 

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1043
Location: near Heidelberg, Germany
Feel free to have a look at my 65k specs for inspirations for new instructions. I'd be honored if you take some ideas from it.

André


PostPosted: Sun Aug 21, 2011 12:44 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Cheers, that's a good idea. I like your encoding table here, both for the ideas and the presentation.

In the case of 65Org16 we have a 16 bit opcode, so we might consider the top octet as a free prefix or quick operand. Using it in a simple way rather than a dense and efficient way appeals to me.

There are many possible directions to go in - and I encourage forks on github - but for me a couple of motivations are attractive:
    - convenience for the assembly language programmer
    - easier target for a high level language

Also, I suppose, point accelerations, such as multiply, or parity, or a CRC operation.

I don't expect to write a lot of verilog though, so I'm unlikely to make something complex.

Also, I'm inclined to keep some 'master' version of the CPU as a pure NMOS-like instruction set. That will keep it small and fast, and with least baggage if someone wants to take it in a different direction.

Cheers
Ed


PostPosted: Sat Aug 27, 2011 12:30 am 

Joined: Sun Nov 08, 2009 1:56 am
Posts: 411
Location: Minnesota
Clearly more ideas are needed! I know nothing about hardware, so I have no idea how hard any of these would be, but I keep thinking "16 bits=65536 potential instructions...."

1) true SKIP byte and SKIP word instructions. I.e., they really do nothing but increment the PC (by two and three bytes, respectively). No side effects to worry about and we get to keep using the "multiple hidden entry points" idiom. Call them NOP2 and NOP3 if you like, or, hmm, make NOP take an argument octet, store it in the top octet and increment the PC by that count when executed

1a) the difference with BRA would be that this would fit in one byte (memory's not really an issue per se, but if branches can only go 32K one way or the other, it's still nice to use it wisely)

1b) no, I have no idea what a large skip value would be useful for, but they're sort of "free" if values of one, two and three are implemented this way anyway

2) JSR (abs), JMP (abs,x) and JSR (abs,x)

3) CLA, CLX, CLY and SEA, SEX (unfortunate, that) and SEY - to set the contents of a register to all zero or all one bits. I'd rather have these than, say, STZ because I do these all the time, and if I also want to store the value in memory I can do that in more ways than STZ allows

4) Of course CLA and SEA are just examples of 'special' constants. Maybe QLA ('Quick Load A'), as in QLA #3 (or whatever). Top octet holds the 8-bit value to load and sign-extend to 16-bits. If these were available CLA and SEA would not be needed, or more likely I would have an assembler treat them as built-in synonyms for QLA #$00 and QLA #$FF

These are some of what I think of as the easier ones, anyway. I can think of harder!
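
To make item 4 concrete, here is a small Python model of how a core might split such an instruction word. It is only a sketch under assumptions: the opcode is taken to sit in the low octet and the quick operand in the top octet, the names sign_extend8 and decode_quick are made up for illustration, and $CB is an arbitrary placeholder opcode, not an assigned one.

```python
def sign_extend8(value):
    """Sign-extend an 8-bit value to a 16-bit value."""
    value &= 0xFF
    return (value | 0xFF00) if (value & 0x80) else value


def decode_quick(word):
    """Split a 16-bit instruction word into (opcode, sign-extended operand).

    Assumed layout: low octet = opcode, top octet = quick operand.
    """
    opcode = word & 0xFF
    operand = sign_extend8(word >> 8)
    return opcode, operand


if __name__ == "__main__":
    # $CB is a placeholder opcode value for a hypothetical QLA.
    for word in (0x03CB, 0xFFCB, 0x00CB):
        opcode, operand = decode_quick(word)
        print(f"word ${word:04X}: opcode ${opcode:02X}, operand ${operand:04X}")
```

With this layout, a top octet of $00 behaves like CLA and a top octet of $FF behaves like SEA, which is why the two could be assembler synonyms.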


PostPosted: Sat Aug 27, 2011 2:51 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
Quote:
1) true SKIP byte and SKIP word instructions. Ie. they really do nothing but increment the PC

Is the objective just to save a word?  The branch instructions are already the minimum 2-clock time if not taken, and usually 3 if taken.  The skip-type instructions are something that I find makes the PIC microcontrollers much harder to program though, with the backwards logic, "If this is true, then don't do such-and-such."

The 6800 had a CLA, but it's not any faster than LDA #0 which is already the minimum 2-clock time.  I like STZ and wish there were an STF also for setting memory bytes to FF, usually for flag variables which will be branched on later.  What I do sometimes is to DECrement the variable if I know I won't be doing it so many times that it will clear the high bit or zero the whole byte, then branch on whether it's zero or not.

For extra instructions, I would start with just making it an extension of the 65816.  There are a lot of instructions and addressing modes there that have already been thought out well and evaluated against how much silicon real estate they take and how they affect maximum clock speed, and won the "contest" to make it into the '816.  BBS, BBR, SMB, and RMB could be added too, and be useful for more than just 256 addresses.

viewtopic.php?t=44 may be of value too.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Sat Aug 27, 2011 5:26 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
I suspect that adding in the 65c02 extras will be easiest to code. (The '816 is not a good superset for me - too complex, a huge amount of work. Anyone else is welcome to try it.)

As Garth says, the cycle counts are what they are, even if one can get an advantage in density. Much of the benefit of starting from a 6502 base is inheriting an existing sequencer and not having to modify or extend it.

So, for example, for density purposes it's tempting to put small operands into the high byte of the opcode - this would be a particular choice for a fork - so small-distance branches become single byte, loading and addition of small constants become single byte. But:
    - cycle counts might well be the same
    - the sequencing changes might be more effort than expected
That is, a no-op operand fetch cycle would probably be the least-effort way to code this up. Even with the operand available at fetch time, we still have
    - instruction fetch
    - decode (and optionally operand fetch)
    - execute 1
    - execute 2 (optional)
and so on. That is, the decode cycle presently hides the operand fetch cycle, where there is one.

I'm glad you brought these up though - because assembler support is one of the costs of proposing additional instructions! Especially when it comes to packing more information into the opcodes in interesting ways. How expensive would quick operands be to support?

The usual sequence of development assumes that there's one definition of the instruction set, so you thrash that out first, with an eye to the encoding. Then you finalise the encoding, and then you can code up the hardware and then the software.

In this case, we can have several forks and therefore several definitions of the instruction set. One might like to dedicate two bits to predication and three bits to extend the register set while another dedicates 7 or 8 bits to quick operands. A third fork might do the extra work of trying to do both - and it might be much more work.

Each fork would need an assembler. This is where a macro-driven approach might be a win, if it makes it easy for anyone to extend. Bitwise, for example, might well not find the motivation to code up a new variation that had only a single user.

And that's the downside of this picture: fragmentation. Anyone has their own variant of the CPU. But so far, we have nothing to worry about - no forks. (In itself, a single-use fork isn't a bad idea: a chess machine fork for example.)

My own position is likely to be that I'll add very little indeed beyond the NMOS 6502, for a couple of reasons:
- we need a single well-defined core at the centre of the fork cloud.
- there are other interesting things to do, like looking at caches or 65Org32, or software.
- it's far too easy to suggest new instructions or tweaks to the architecture - it's enjoyable and has some educational value, but much better to write some code and see an implementation.
So, by not extending my own fork much beyond NMOS 6502, I mean to encourage other contributors to fire up their editors on cpu.v!

Cheers
Ed


PostPosted: Sat Aug 27, 2011 12:39 pm 

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
You can count on me to contribute to this thread when I start putting some of my simple ideas into action on the 65Org16.b!

Some of the simpler ones I plan to start with are PHX, PLX, PHY and PLY. These are anticipated to be the easiest. But WHEN? That is the question...

Also, I would like to experiment with trimming some other opcodes out, just to see the net effect on core speed. This I could probably start on now in my spare time.

PostPosted: Mon Aug 29, 2011 5:58 am 

Joined: Sun Nov 08, 2009 1:56 am
Posts: 411
Location: Minnesota
Quote:
How expensive would quick operands be to support?



Huh. I thought I'd replied to this, but I don't see it anywhere. So a recap:

If we use immediate syntax, QLA might look like this:

Code:
QLA #$80


Arbitrarily choosing $CB as the opcode for QLA simply for example purposes, a macro to implement it might look like this:

Code:
.macro QLA ?expr
.if "?expr" ~ /^#/
]expr$ ="((" mid$("?expr",2) ")<<8)+$CB"
.ubyte val(]expr$)
.else
.error "Bad address mode"
.endif
.endm


This makes sure there is a leading "#" character on the supplied expression, strips it off and creates another expression that has the effect of evaluating the supplied expression, shifting it to the high octet and putting the opcode in the low octet. That is all done by string manipulation with a string result, so "val()" is used to evaluate that string.

The ".ubyte" pseudo op makes sure the final result is in the range $0000-$ffff, which in turn makes sure the supplied expression is in the range $00-$ff (if it's larger the shifted result will be larger than $ffff). If you don't care about that, use ".byte" and no error will occur.
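
For anyone who doesn't read the macro dialect, the same packing and range check can be modelled in Python. This is only a rough sketch of what the macro does, not part of any assembler: assemble_qla is a hypothetical name, it accepts only plain literals rather than full expressions, and $CB is the same arbitrary example opcode.

```python
QLA_OPCODE = 0xCB  # arbitrary example opcode, not an assigned encoding


def assemble_qla(operand_text):
    """Pack 'QLA #value' into one 16-bit word: value in the top octet."""
    # Like the macro, insist on immediate syntax.
    if not operand_text.startswith("#"):
        raise ValueError("Bad address mode")
    # Accept decimal literals or $-prefixed hex literals only (a simplification).
    value = int(operand_text[1:].replace("$", "0x"), 0)
    word = (value << 8) + QLA_OPCODE
    # Same effect as the unsigned range check: the packed word must fit in
    # 16 bits, so the value itself must be in $00-$ff.
    if not 0 <= word <= 0xFFFF:
        raise ValueError("operand out of range")
    return word


if __name__ == "__main__":
    print(f"${assemble_qla('#$80'):04X}")
    print(f"${assemble_qla('#3'):04X}")
```

Because the check is unsigned, an operand such as #-1 is rejected here; allowing it would need the sign-extended convention discussed elsewhere in the thread.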

A fancier LDA instruction macro could check any immediate operand. If its value could be determined to be in the range $00-$ff, it could automatically use QLA instead. The relevant sub-portion of that macro might look like this:

Code:
]expr$ = mid$("?expr", 2 )
.if forward(]expr$) || val(]expr$) > $ff
.byte $A9
.else
]expr$ = "((" ]expr$ ")<<8)+$CB"
.endif
.ubyte val(]expr$)


If the expression can't be evaluated (i.e., it contains a forward reference) or if it can but the result is larger than $ff, store a "load accumulator immediate" opcode. Otherwise adjust the expression to pack the quick opcode and value in one byte. Evaluate and store the expression, and either way make sure the result is in the range $0000-$ffff.
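
The selection logic can be sketched in Python as well. This is an illustration only: assemble_lda_immediate and its is_forward flag are hypothetical stand-ins for the assembler's own forward-reference test, $A9 is the standard 6502 "LDA immediate" opcode, and $CB is still just the arbitrary example opcode for QLA.

```python
LDA_IMMEDIATE = 0xA9  # standard 6502 opcode for LDA #imm
QLA_OPCODE = 0xCB     # arbitrary example opcode for the quick form


def assemble_lda_immediate(value, is_forward=False):
    """Return the 16-bit words emitted for 'LDA #value'.

    is_forward models an operand the assembler cannot evaluate yet, so the
    long form has to be reserved even if the value later turns out small.
    """
    if is_forward or value > 0xFF:
        words = [LDA_IMMEDIATE, value]             # opcode word + operand word
    else:
        words = [(value << 8) + QLA_OPCODE]        # one packed word
    # Either way the last word stored must fit in $0000-$ffff.
    if not 0 <= words[-1] <= 0xFFFF:
        raise ValueError("operand out of range")
    return words


if __name__ == "__main__":
    for value, fwd in ((0x12, False), (0x1234, False), (0x12, True)):
        words = assemble_lda_immediate(value, fwd)
        print(" ".join(f"${w:04X}" for w in words))
```

Falling back to the long form for forward references is the same trade-off an assembler makes when it cannot yet prove an address is in zero page.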

And these are probably better than what I attempted to post the first time anyway. In any case, I guess the answer to "how hard?" is "not very" at the assembler level.


PostPosted: Mon Aug 29, 2011 8:49 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
teamtempest wrote:
A fancier LDA instruction macro could check any immediate operand. If its value could be determined to be in the range $00-$ff, it could automatically use QLA instead.
This would be ideal in my view: much like an absolute address can be promoted to a zero-page reference.

Quote:
In any case, I guess the answer to "how hard?" is "not very" at the assembler level.
Great! Thanks for investigating.

One cheesy idea I had for encoding is to use the same opcode, but if the high byte is non-zero then it's an immediate operand. (A test for non-zero over 8 bits is probably not expensive in FPGA - there's one way to find out.)

Of course, that cheesy idea displaces a lot of other clever ideas for packing the opcode space. But it's an idea for an experiment. (One could back off a bit and only allow 4 or 6 bits for the immediate, leaving 2 or 4 for other purposes, like register selection or predication.)
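
As a thought experiment, that decode rule can be written down in a few lines of Python. This is a sketch of the idea only, not of anything in the Verilog: decode_immediate is a hypothetical name, it assumes the opcode sits in the low octet, and it only makes sense for opcodes that already take an immediate operand.

```python
def decode_immediate(word, next_word):
    """Decode an immediate-mode instruction under the 'same opcode' scheme.

    Returns (opcode, operand, length_in_words). A non-zero top octet is
    taken as the operand itself; a zero top octet means the operand is in
    the following word, exactly as in the baseline encoding.
    """
    opcode = word & 0xFF
    quick = word >> 8
    if quick != 0:
        return opcode, quick, 1
    return opcode, next_word, 2


if __name__ == "__main__":
    # $A9 is LDA immediate; the second word is only used in the long form.
    print(decode_immediate(0x05A9, 0x1234))
    print(decode_immediate(0x00A9, 0x1234))
```

One consequence visible in the model: an operand of zero can only be expressed in the long two-word form.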

Cheers
Ed


PostPosted: Mon Aug 29, 2011 9:18 am 

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
BigEd wrote:
teamtempest wrote:
A fancier LDA instruction macro could check any immediate operand. If its value could be determined to be in the range $00-$ff, it could automatically use QLA instead.
This would be ideal in my view: much like an absolute address can be promoted to a zero-page reference.

Quote:
In any case, I guess the answer to "how hard?" is "not very" at the assembler level.
Great! Thanks for investigating.

One cheesy idea I had for encoding is to use the same opcode, but if the high byte is non-zero then it's an immediate operand. (A test for non-zero over 8 bits is probably not expensive in FPGA - there's one way to find out.)

Of course, that cheesy idea displaces a lot of other clever ideas for packing the opcode space. But it's an idea for an experiment. (One could back off a bit and only allow 4 or 6 bits for the immediate, leaving 2 or 4 for other purposes, like register selection or predication.)

Cheers
Ed

The 6502 has plenty of spare opcodes that can be used for instructions involving other registers, the 65C02 less so. I'd keep the quick parameter as 8-bits (a useful size).

I was tempted to say it should be a signed value -128 thru 127 and sign extended on use, but on second thought I'm not sure that it's so useful on a 6502-based processor. It makes more sense on devices that have more register indirect addressing modes.

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


PostPosted: Mon Aug 29, 2011 10:16 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Good point about signedness - it's a choice that should be made, one way or the other.

Also, I realise that allowing only non-zero quick immediates makes great sense for ADC and a little less sense for LDA. Doesn't hurt for AND or ORA.

Which really shows that a bit more thought is needed.

But to step back a bit:

What these quick immediates are doing is helping code density. (They might in theory help performance but only if someone does the hard work to squeeze out the cycles. Whereas moving from Spartan3 to Spartan6, or figuring out caching, or implementing an on-FPGA block of zero-page, would also help performance in a more general way.)

Whereas, PHX and so on are really convenient for register-preserving subroutines. They save on allocating temporary space or retrieving A from deep in the stack.
Code:
   PHA
   TXA
   PHA
   TSX
   LDA $10002,X

versus
Code:
   PHX

(or even worse to preserve Y)

And BSR, BRA open up position independent code, and stack-relative addressing opens up ... something useful to someone. (That's probably a bigger change to the core though.)

I think these various types of changes are different in quality. The base advance of 65Org16 is the gaining of 32-bit address space (or for a small memory system gaining flat 16-bit zero page.)

The next advances that I find most attractive are the ones which give assembly programmer convenience (so we can write a monitor and applications) or help with high level languages (although that's an odd one because we might as well have used an existing CPU - but this is more interesting!)

Each to his own, of course: if I implement extensions beyond NMOS I'll make them optional, so that different extensions don't collide with one another, and don't have to be compatible. The baseline spec is fixed, and finished. Everything further is an extension, and I'd love to see forks implementing them. (65Org16C would be an obvious and well-defined extension, as would 65Org16CE)

Cheers
Ed


PostPosted: Mon Aug 29, 2011 10:37 pm 

Joined: Sun Nov 08, 2009 1:56 am
Posts: 411
Location: Minnesota
Quote:
Good point about signedness - it's a choice that should be made, one way or the other.


After posting my brilliant macros I realized I'd made them unsigned, when something I actually would want to do on occasion is:

Code:
QLY #-1


usually as initialization before entering a loop. "QLA #-1" would likely be preparatory to setting some memory location to $FFFF. So I think I'd vote for sign-extension, and I'd have to rewrite the macros accordingly (just replacing "ubyte" with "byte" would be simplest, and would result in simply dropping any excess bits, whatever their value).

Quote:
Also, I realise that allowing only non-zero quick immediates makes great sense for ADC and a little less sense for LDA. Doesn't hurt for AND or ORA.


In the original 6502 instruction set (and all variants, AFAIK), it's possible to code branch instructions with a zero offset, eg., "BPL 0". Useless except perhaps in some esoteric exercise in timing, but possible. I imagine that was the simplest way to implement the instructions, so there it is.

And don't forget that ADC also involves the carry flag. I've seen "ADC #0" deliberately used for that reason.

Mmm, sign-extended quicks could also be used to make ADC perform subtraction, couldn't they? Making "quick subtracts" redundant. If they weren't sign-extended there could be separate quick ADC and SBC instructions, both with a greater operand magnitude.

And another thing: quick adds of 1 or -1 effectively give us INA and DEA, at least when we know the carry flag is clear. If it's set...quick add zero gives us INA, and quick add -2 gives us DEA, I think. That might be useful often enough to forego actual INA and DEA instructions.
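
The carry arithmetic is easy to check with a quick Python model. This is a sketch under assumptions: a 16-bit accumulator, binary (non-BCD) mode, an 8-bit quick operand that is sign-extended before the add, and a made-up function name quick_adc.

```python
MASK16 = 0xFFFF


def sign_extend8(value):
    """Sign-extend an 8-bit value to a 16-bit value."""
    value &= 0xFF
    return (value | 0xFF00) if (value & 0x80) else value


def quick_adc(a, quick, carry_in):
    """Model a 16-bit binary ADC with a sign-extended 8-bit quick operand.

    Returns (new accumulator, carry out).
    """
    total = (a & MASK16) + sign_extend8(quick) + carry_in
    return total & MASK16, total >> 16


if __name__ == "__main__":
    a = 5
    print(quick_adc(a, 1, 0))      # carry clear, add 1
    print(quick_adc(a, 0xFF, 0))   # carry clear, add -1
    print(quick_adc(a, 0, 1))      # carry set, add 0
    print(quick_adc(a, 0xFE, 1))   # carry set, add -2
```

In the model, A=5 becomes 6 for the two increment cases and 4 for the two decrement cases, so the INA/DEA equivalence holds for the result. The carry out differs, though: the add of a negative quick leaves carry set (as long as there is no borrow), which a real DEA would not touch.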

Are quick AND, OR and EOR worth it? Sign-extended or not, the high octet has a set value that can't be changed. I'm having trouble envisioning where that would be useful enough to justify the expense. Hmm, BIT also falls in this category. Maybe others as well.

What about a quick NOP? If the signed value is added to the program counter, we get a one-byte BRA (and the true SKP instruction I mentioned earlier). The original NOP behavior remains the same with a high octet value of zero. Getting it to always go in two cycles might be a challenge. I wouldn't know where to begin with that, except to point out that there would be no need to fetch a second operand byte (does that help?).
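
A Python sketch of the program counter update, to pin down what "added to the program counter" could mean. The details are assumptions, not a spec: the offset is taken relative to the address following the instruction (as with the existing branches), the PC is 32 bits wide and counts 16-bit words, and quick_nop_next_pc is a made-up name.

```python
PC_MASK = 0xFFFFFFFF  # 32-bit address space


def quick_nop_next_pc(pc, word):
    """Return the next PC after a 'quick NOP' at address pc.

    The top octet of the instruction word is a signed offset added to the
    address of the following instruction; an offset of zero is a plain NOP.
    """
    top = word >> 8
    offset = top - 0x100 if top & 0x80 else top
    return (pc + 1 + offset) & PC_MASK


if __name__ == "__main__":
    # $EA is the usual NOP opcode, used here purely for illustration.
    for word in (0x00EA, 0x01EA, 0x02EA, 0xFEEA):
        print(f"${word:04X}: next PC = ${quick_nop_next_pc(0x1000, word):08X}")
```

Offsets of 1 and 2 give the skip-one-word and skip-two-words behaviour, and negative offsets give a short backward branch.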

So my initial "quick" list would have sign-extended operands and opcodes for:

QLA or LAQ
QLX or LXQ
QLY or LYQ
QAD or ADQ
QBR or BRQ or SKP or NOP

More as I think of them...

Quote:
Whereas, PHX and so on are really convenient for register-preserving subroutines


I'm all for these. Extremely handy!

Quote:
stack-relative addressing opens up ... something useful to someone. (That's probably a bigger change to the core though.)


I've had some vague thoughts about this sort of thing. It might help with position independence of programs larger than the 32K range of a branch instruction (other approaches: "chained" BRA instructions to reach beyond that range or Real Operating System Loaders).

I've never seen code like this (on a 65C02 or above):

Code:
TSX
JMP ($101,X)


probably because for that to be of any use a correct address has to be loaded onto the stack first. Which finally got me to thinking what the 65816's PEA, PEI and PER instructions might actually be useful for. Yeah, have the cpu do the full-address calculations for you at run time...but that's about as far as I've gotten in puzzling it out.

