
All times are UTC




PostPosted: Tue May 07, 2024 5:51 pm 

Joined: Sat Oct 28, 2023 7:57 pm
Posts: 22
Location: Missouri
Hey all, I was thinking about the programming model for the 6502, and had an (almost certainly overly-naive) idea for an approach to making a 16-bit variant with minimal changes to the actual function of the chip.

Basically, take the 65C02 and make the following changes:
  • Address Bus and Program Counter expand from 16->24 bits
  • The data bus, ALU, and all registers expand from 8->16 bits
  • Opcodes remain unchanged, with bits 9-16 being set to "0", with the following exceptions:
    • in ZP Address mode bits 9-16 form the ZP address
    • in Absolute Address mode, bits 9-16 form the LSB of the address, with the rest loaded from the next 16-bit word

Perks:
  • I would think it would be relatively simple to implement on top of the design for a standard 6502
  • 64KB stack space, instead of 256 bytes.
  • Similar mental programming model to 6502
  • 16MB address space without circuitry hassle of the 65816
  • Many instructions should run with one fewer clock cycle, I believe, due to compressed memory lookup

I'm sure there are issues with this approach that I'm not aware of or haven't thought of, but I'd love to hear them to improve my understanding of the 6502 (and maybe make this concept something worth implementing someday, once my FPGA skills have improved some).


PostPosted: Tue May 07, 2024 6:45 pm 

Joined: Mon Aug 30, 2021 11:52 am
Posts: 287
Location: South Africa
WCMiller wrote:
Perks:
  • 16MB address space without circuitry hassle of the 65816
I'm not sure how far you want to take this or how seriously you're thinking about implementing this in an FPGA but... here goes!

As you're using a 16bit data bus I'd say use the '816 as a base rather than the '02 but do away with the bank / data multiplexing. And then do away with the need for switching between 16bit and 8bit index / memory widths. 16bit opcodes have more than enough options to encode the width in the opcode. And the '816 has so many useful instructions that even the 65C02 doesn't. Specifically I want the movable Direct Page (previously Zero Page) that it provides. It is so, so very useful that I really struggle to go back to the '02.

More so, with the 16bit opcode availability why not let any register provide the Direct Page offset? And as I'm just throwing out wild ideas how about letting any register be used as stack pointers (for programmatic pushes and pops)?

And then why only allow arithmetic on the accumulator? And whilst we're at it I'd really like two's complement arithmetic (with a few caveats).

But I think you see where this is going. Because I have no real restrictions other than "Wheeee! this would be cool!" it's very easy for me to suggest things that are well outside of what you are thinking of or would want to do.

So to bring it back to earth. A fully 16bit address bus and ALU does already sound very cool. Possibly my only suggestion would be to extend the internal registers to 32bits and treat the entire 24bits of address space as a single non-segmented flat memory. That honestly just makes programming much easier.


PostPosted: Tue May 07, 2024 7:01 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10977
Location: England
Just for ref, we've visited ideas like this many times in the past, and while there's always an interesting new idea, it's worth looking over what's happened in the past, in my view:
Index of threads for improved 6502 and derived architectures

That's not a definitive index, of course - it's quite old now, for one thing.


PostPosted: Tue May 07, 2024 7:06 pm 

Joined: Tue Sep 03, 2002 12:58 pm
Posts: 336
This sounds like a similar starting point to my 65020 design, but heading in a different direction with different goals. Mine was a reaction to the 65816, trying to imagine how the 6502 could have made the leap out of the 8 bit era without turning so ugly.

With registers extending to 16 bits, you either accept that it's not going to be compatible, or you need some way of requesting 8 bit operations. The 65816's solution was mode bits, mine was bits in the top half of the opcode.
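As a rough illustration of the "bits in the top half of the opcode" approach, here is a minimal Python sketch. The layout is invented for this example (low byte holds the classic opcode, bit 8 selects the operand width); it is not the 65020's actual encoding, and `WIDTH_BIT`, `decode` and `lda_immediate` are hypothetical names.

```python
# Hypothetical layout: low byte = classic 6502 opcode, bit 8 = width flag.
WIDTH_BIT = 0x0100  # assumption: set means "8-bit operation"


def decode(word):
    """Split a 16-bit instruction word into (base opcode, operand width in bits)."""
    base_opcode = word & 0x00FF
    width = 8 if word & WIDTH_BIT else 16
    return base_opcode, width


def lda_immediate(word, operand):
    """Model LDA #imm: return the value that ends up in A at the selected width."""
    _, width = decode(word)
    mask = 0xFF if width == 8 else 0xFFFF
    return operand & mask
```

With this kind of encoding there is no processor state to set before an 8-bit operation; the width travels with each instruction instead of living in a mode bit.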

With zero 'page' being 64K, there's no need for a direct page register. Code does indeed end up a little bit smaller (if you're counting memory locations rather than bits) and faster.

I went a lot further than you're planning, with more (and wider) registers, and operations between registers without going to memory. The result no longer feels like a 6502, but it gives me the same kind of joy that the 6502 did.

There will be a lot of details that you'll need to work out, but I think you'll find it worth the effort.


PostPosted: Tue May 07, 2024 7:28 pm 

Joined: Sat Oct 28, 2023 7:57 pm
Posts: 22
Location: Missouri
@AndrewP Right now, this is more in the "hey, this might be fun" stage than any actual plans (I'll need to do things like learn programming FPGAs, for one!), but eventually I think it'd be fun to make this an actual thing (or at least an emulated thing). I actually had some thoughts for an 816 variant of this idea (with a 32 bit ALU/Registers/Address Bus), as well as a balls-to-the-wall 6502-derivative with all sorts of things added. With this one, however, I'm trying to keep the changes as minimal as possible and as similar to the 6502 as I can, conceptually. I do like the idea of adding a register that provides for a movable direct page; I'll have to keep that in mind.

@BigEd I'd actually seen that thread a while back but forgot about it. It'll definitely make for good reading!

@John West I figured one easy way I could make things compatible is to set it to ignore the upper 8 bits of each word (with some complications for things like the carry bits). As I said, the opcodes would be identical to a regular 6502 (except if I add any, which would also be 8 bits long). The zero page would still be 256 bytes, so a direct page register might be nice.
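The carry complication can be sketched in a few lines of Python. This is only an illustration of the idea under the assumption that a compatibility add masks both operands to 8 bits and takes the carry out of bit 8; the function names are made up for the example.

```python
def adc16(a, b, carry_in):
    """Native 16-bit add with carry: returns (result, carry_out)."""
    total = (a & 0xFFFF) + (b & 0xFFFF) + carry_in
    return total & 0xFFFF, total >> 16


def adc8_compat(a, b, carry_in):
    """8-bit-compatible add: ignore the upper byte of each operand and
    take the carry from bit 8 instead of bit 16."""
    total = (a & 0xFF) + (b & 0xFF) + carry_in
    return total & 0xFF, total >> 8
```

For example, `$00FF + $0001` gives `$0100` with carry clear in the native add, but `$00` with carry set in the compatible one, which is what unmodified 6502 code would expect.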

Just to be clear, as I fear it wasn't, the least significant 8 bits of an address are loaded with the opcode, so, for example
Code:
.org $800000
stx $123456 ; store-x in absolute address mode

would be assembled to (I believe)
Code:
$800000 $8E $56
$800001 $34 $12

and
Code:
.org $800000
stx $12 ; store-x in zero-page address mode

would be assembled to
Code:
$800000 $86 $12


That's why it only has a 24-bit address bus; 16 from an argument and 8 that hitch along after the opcode.
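The packing described above can be sketched in Python. This assumes the first byte listed at each address in the dumps is the low byte of the 16-bit word (so the opcode sits in bits 0-7 and the address byte in bits 8-15); the function names are hypothetical.

```python
def encode_absolute(opcode, address):
    """Pack an 8-bit opcode and a 24-bit address into two 16-bit words."""
    assert 0 <= opcode <= 0xFF and 0 <= address <= 0xFFFFFF
    first = ((address & 0xFF) << 8) | opcode  # low address byte rides with the opcode
    second = address >> 8                     # upper 16 bits of the address
    return [first, second]


def encode_zero_page(opcode, zp_address):
    """Pack an 8-bit opcode and an 8-bit zero-page address into one word."""
    assert 0 <= opcode <= 0xFF and 0 <= zp_address <= 0xFF
    return [(zp_address << 8) | opcode]


def decode_absolute(words):
    """Recover (opcode, 24-bit address) from the two-word form."""
    opcode = words[0] & 0xFF
    address = (words[1] << 8) | (words[0] >> 8)
    return opcode, address
```

Under that assumption, `encode_absolute(0x8E, 0x123456)` yields the words `$568E` and `$1234`, matching the `stx $123456` example, and `encode_zero_page(0x86, 0x12)` yields the single word `$1286`.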


PostPosted: Wed May 08, 2024 9:00 am 

Joined: Mon Aug 30, 2021 11:52 am
Posts: 287
Location: South Africa
WCMiller wrote:
That's why it only has a 24-bit address bus; 16 from an argument and 8 that hitch along after the opcode.
Sorry, I missed that. Kinda. I think I saw Zero Page address included in (16bit) op-code and thought: "Trying to add the Direct Page register to the zero page address before the cycle ends so it can be used as an actual address on the address bus is way too complicated for me". And stopped thinking. Which was wrong because I was still assuming a movable Direct Page and never read the bit where absolute addressing would also have 8bits of the address included in the op-code.

That makes for much more efficient memory access than I had realised. i.e. a direct Zero Page 16bit ADC could be done in two cycles compared to the 4 or 5 it takes the '816. Nice!
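A back-of-the-envelope way to see where the two-cycle figure comes from is to count bus accesses. The sketch below is a simplification that assumes one bus access per cycle and ignores internal cycles, page-crossing penalties and the '816's extra cycle for a non-page-aligned Direct Page; the function names are made up for the example.

```python
def cycles_classic_8bit_bus(data_bytes, address_bytes):
    """One opcode fetch, then one bus cycle per address byte and per data byte."""
    return 1 + address_bytes + data_bytes


def cycles_packed_16bit_bus(extra_address_words):
    """Opcode and low address byte arrive in one word; data is one 16-bit read."""
    return 1 + extra_address_words + 1


# ADC with a zero/direct page operand
adc_zp_6502 = cycles_classic_8bit_bus(data_bytes=1, address_bytes=1)       # 8-bit data
adc_dp_816_16bit = cycles_classic_8bit_bus(data_bytes=2, address_bytes=1)  # 16-bit data
adc_zp_packed = cycles_packed_16bit_bus(extra_address_words=0)
```

Under these assumptions the counts come out as 3, 4 and 2 respectively, and an absolute-mode access in the packed design needs only one extra word fetch.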

Like John, I'm still wondering how to deal with 8bit data / operations without resorting to instructions to set the memory width state. But with that said (on the '816) I rarely change to 8bit memory unless I'm dealing with data that has to be processed 8bits at a time. Think pixels or text. And I basically never switch the Index registers to 8bit so possibly that mode could just be entirely ignored.

I do want to harp on about why a movable Direct Page is so useful for a bit more.

I think it's fairly typical to view Zero Page as processor registers that just happen to live outside the processor. And that means, like any processor, when doing a function call the state of those registers may need to be saved. On the 6502 with its fairly small call depths I think that's generally done by giving each function its own section of Zero Page and assuming no recursion and that two functions won't stomp on each other's state.

But as programs get bigger, think a full 24bits of memory, it's going to become harder and harder to ensure each function plays nicely only in its own tiny piece of Zero Page.

An obvious solution is to save the Zero Page 'registers' onto the Stack; and then restore them when the function completes. But that's a lot of pushing and popping.

A far nicer solution is to have a movable Direct Page and slide it down memory as functions are called. This implicitly saves the state of the calling function because its Direct Page address is no longer accessible because the Direct Page has been moved on. And implicitly restores the state of the calling function when its Direct Page is put back where it was when the called function returns. As a bit of an '816 aside: because the '816 has limited Stack addressing I'll push function arguments onto the stack and then set Direct Page to point to the Stack and address those parameters from the Direct Page instead. Very useful and it allows the stack to continue to be used as a stack.
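The sliding Direct Page can be modelled with a toy Python class. This is a sketch of the idea only, not of any real CPU: the `Machine` class, the fixed frame size of `$100` and the starting base of `$8000` are all assumptions chosen for the example.

```python
class Machine:
    def __init__(self, size=0x10000, dp=0x8000):
        self.mem = [0] * size
        self.dp = dp  # direct page base register

    def store_dp(self, offset, value):
        self.mem[self.dp + offset] = value

    def load_dp(self, offset):
        return self.mem[self.dp + offset]

    def enter(self, frame_size=0x100):
        self.dp -= frame_size  # slide the direct page down for the callee

    def leave(self, frame_size=0x100):
        self.dp += frame_size  # the caller's direct page comes back into view


m = Machine()
m.store_dp(0x10, 42)            # caller's "register" at dp+$10
m.enter()
m.store_dp(0x10, 7)             # callee uses the same offset, different memory
callee_value = m.load_dp(0x10)
m.leave()
caller_value = m.load_dp(0x10)  # caller's value was never touched
```

Nothing is pushed or popped for the caller's "registers"; moving the base is the save and the restore.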


PostPosted: Wed May 08, 2024 9:40 am 

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 962
Location: Potsdam, DE
Does this let you use a stack frame? So you would define variables on your (much larger) stack so that input variables remain at a fixed offset above the current stack pointer, and local variables appear below the stack pointer?

With a 6502 it's messy: transfer the stack pointer to X, and then offset from 102,x 104,x etc and then some arithmetic to sort out any variables you've eaten when you return - it really needs the calling routine to tidy up the stack and it's slow... I suppose it could work the same way but with a 16 bit X register?

8086 has a base pointer which you set on entry to your subroutine, and can also return and eat n stack entries in one instruction... that's very useful.

Neil


PostPosted: Wed May 08, 2024 9:45 am 

Joined: Tue Sep 03, 2002 12:58 pm
Posts: 336
I missed that too. So zero page will only be 256 bytes, and that does make the ability to move it useful. But packing the address into the opcode will make a big difference to the speed.

It's an interesting idea, and I think one worth pursuing. My recommendation is to start with a software simulator, particularly if you've never worked with FPGAs before. It's a lot easier to experiment with changes to the instruction set there.


PostPosted: Wed May 08, 2024 11:23 am 

Joined: Mon Aug 30, 2021 11:52 am
Posts: 287
Location: South Africa
barnacle wrote:
Does this let you use a stack frame? So you would define variables on your (much larger) stack so that input variables remain at a fixed offset above the current stack pointer, and local variables appear below the stack pointer?
Yup, that's exactly using the Direct Page as a stack frame. Which (if my memory works - but it's been a while) is very similar to how the 16bit x86 BP register was used.

I guess it's convention but I would set up everything to appear above the stack pointer in memory (i.e. inside the stack). So my Direct Page calculation would be:

on entering a function from a JSR / JSL
transfer the Stack Pointer address to A
subtract the amount of memory I need for local variables
transfer A back to the Stack Pointer

and then

push the Direct Page (so I know what it was previously)
transfer A to the Direct Page

The assembly would look something like this:
Code:
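FunctionThatDoesStuff:
;preamble
   TSC                                  ; A = Stack Pointer
   SEC
   SBC   #_Local_Variable_Space_needed  ; reserve room for local variables
   TCS                                  ; Stack Pointer = A
   PHD                                  ; save the caller's Direct Page
   TCD                                  ; Direct Page = A (base of the locals)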


Having all local variables and arguments inside the stack means the stack can still be used. Specifically for registers that can only be accessed via the stack: PHB, PLB, PHP, PLP and PHK
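To see where the registers end up, the preamble can be traced with a small Python model. It assumes standard 65816 stack behaviour (a push writes at S and then decrements S, and 16-bit values are pushed high byte first); the `Cpu` class and the starting values are invented for the example.

```python
class Cpu:
    def __init__(self, s, d):
        self.a = 0
        self.s = s      # stack pointer (points at the next free byte)
        self.d = d      # direct page register
        self.mem = {}

    def push8(self, value):
        self.mem[self.s] = value & 0xFF
        self.s = (self.s - 1) & 0xFFFF

    def preamble(self, local_bytes):
        self.a = self.s                            # TSC
        self.a = (self.a - local_bytes) & 0xFFFF   # SEC / SBC #local_bytes
        self.s = self.a                            # TCS
        self.push8(self.d >> 8)                    # PHD (high byte first)
        self.push8(self.d)
        self.d = self.a                            # TCD


cpu = Cpu(s=0x1FF0, d=0x0200)
cpu.preamble(8)
first_local = cpu.d + 1   # locals occupy D+1 .. D+8 in this model
last_local = cpu.d + 8
```

In this trace the Direct Page lands on `$1FE8`, the eight local bytes sit at `$1FE9`-`$1FF0`, and the saved Direct Page lies just below them, so further pushes never overlap the locals.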

For completeness cleaning up and returning from the function is a bit more complicated because the return address for RTS / RTL needs to be the last thing popped off the stack (but unfortunately it sits after any arguments that were pushed)
Assuming a long return:

load A with return address (low and high bytes)
store A over the (first+1) bytes of argument
load A with return address (high and bank bytes)
store A over the first bytes of argument

then

pull the Direct Page
transfer the Stack Pointer address to A
add the amount of memory needed for local variables less 3 bytes
transfer A back to the Stack Pointer
return long

Again the assembly would look like
Code:
;postamble
   LDA   <_Local_Variable_Space_needed+2            ;RTL hi, RTL lo
   STA   <_Arguments-1
   LDA   <_Local_Variable_Space_needed+1            ;RTL b, rtl hi
   STA   <_Arguments-2
   PLD
   TSC
   CLC
   ADC   #_Arguments-3
   TCS
   RTL


PostPosted: Wed May 08, 2024 1:00 pm 

Joined: Mon Jan 19, 2004 12:49 pm
Posts: 962
Location: Potsdam, DE
Hmm, I guess your calling routine must leave space on the stack for a return value before pushing the input parameters. Then it makes the call, pushing the return address. Any local variables are held on the stack below the return value, and on return, the return value is placed in the reserved place; the return address can either be moved to the immediate entry below the return value, or the calling routine can adjust the stack to jump up over the input parameters.
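One reading of that convention, with the caller cleaning up the input parameters, can be sketched in Python using a list as the stack (the end of the list is the top). The slot layout and the `call_add` name are assumptions made for the illustration.

```python
def call_add(x, y):
    stack = []

    # caller
    stack.append(None)              # reserve a slot for the return value
    stack.append(x)                 # input parameters
    stack.append(y)
    stack.append("return-address")  # pushed by the call itself

    # callee
    base = len(stack) - 1           # index of the return address
    stack.append(0)                 # one local variable, below the return address
    stack[-1] = stack[base - 2] + stack[base - 1]  # local = x + y
    stack[base - 3] = stack[-1]     # write the result into the reserved slot
    stack.pop()                     # release the local
    stack.pop()                     # return: pops the return address

    # caller again
    del stack[-2:]                  # jump up over the input parameters
    result = stack.pop()            # collect the return value
    return result, stack
```

After the call the stack is back to its starting depth and the result comes out of the slot reserved before the parameters were pushed.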

Keeping everything local below the return means the same approach can be used for any calls from the routine, and means that interrupts also work properly - that just looks like a normal stack.

Perhaps the call routine could automatically load a base pointer register? Or a TSPBP instruction.

Neil


PostPosted: Thu May 09, 2024 12:30 am 

Joined: Sat Oct 28, 2023 7:57 pm
Posts: 22
Location: Missouri
@AndrewP, @John West, thanks for the... I guess validation of the "interesting-ness" of the opcode+memory mixing idea; it was the only part of this that I think actually had the potential to actually be "clever" and worth investigating and I'm encouraged that you two agreed! I'm really intrigued by the way the conversation is developing (mainly because I'm having a bit of trouble following all the nuances of how the conversation is unfolding, and one thing I wanted to do was push my understanding to help find areas of ignorance), but I don't know how productive I'll be contributing. It's interesting reading, though, trying to suss out what everything means and the implications as to why they're being discussed.


PostPosted: Thu May 09, 2024 3:05 am 

Joined: Wed Jan 03, 2007 3:53 pm
Posts: 64
Location: Sunny So Cal
Quote:
Yup, that's exactly using the Direct Page as a stack frame.


You might want to look at the TMS 9900's Workspace Pointer for a similar notion, which can be anywhere in RAM.

_________________
Machine room: http://www.floodgap.com/etc/machines.html


PostPosted: Fri May 10, 2024 3:23 pm 

Joined: Sun Nov 08, 2009 1:56 am
Posts: 410
Location: Minnesota
Just spitballing here, and not accounting for the direct page register yet, I'm curious why on the 65816 the return address needs to be worried about. It has to be accounted for, but the processor doesn't care where it comes from:

Code:
   pea   #retfnc-1    ; where to come back to
   psh   arg0         ; argument(s)
   psh   arg1
   ...                ; (continue)
   psh   argN
   psh   #argsiz      ; size of arguments pushed to function
   psh   #locsiz      ; local space used by function
   jsr    function    ; call it
retfnc   ...          ; (continue)

   tsx                ; pushed arguments + sizes + return address
   txa
   clc
   adc   3,s          ; add local space
   tax                ; reset stack pointer
   txs

   adc   4,s          ; total space used by args + locals
   pha                ; save it

   ...                ; (do function)

   tsx                ; get the stack pointer
   txa
   clc
   adc   1,s          ; add space to recover
   adc   #ignore      ; ignore the return address and the sizes
   tax
   txs                ; now points to just below 'retfnc' address on stack
   rts


This could probably even be adapted to the 6502, as long as there's enough stack space. I believe it's interrupt safe. It makes local space acquisition and the release of all stack memory used the responsibility of the callee rather than the caller, which should reduce overall memory use.

Edit: oops. Forgot that resetting the stack pointer makes that 'adc 4, s' invalid. Better move it up to just after 'adc 3,s' gets transferred to the X-register but before X gets transferred to S.


PostPosted: Fri May 10, 2024 5:34 pm 

Joined: Mon Aug 30, 2021 11:52 am
Posts: 287
Location: South Africa
WCMiller wrote:
@AndrewP, @John West, thanks for the... I guess validation of the "interesting-ness" of the opcode+memory mixing idea; it was the only part of this that I think actually had the potential to actually be "clever" and worth investigating and I'm encouraged that you two agreed!
Thanks, I was definitely intrigued by it. I'm not a hardware guy so maybe it already exists but I haven't ever seen anything that mixes part of the address in with the op-code. Could be very fast* :D

It was mentioned higher up but writing an emulator in software first would be the path that I would take too. That gives a lot of room to experiment. Like, what if instead of 8bits of address you reduced it to 6bits? Sure that's only 4MB of addressable space and 64 Zero Page locations but ... maybe that's enough? It may well be that 1024 op-codes makes it worth it.
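The trade-off is simple arithmetic on how the 16-bit instruction word is split, and can be tabulated with a few lines of Python. The `split` helper is hypothetical, and it counts addressable locations rather than bytes.

```python
def split(address_bits_in_opcode, word_bits=16):
    """Return what a given opcode/address split of the instruction word buys."""
    opcode_bits = word_bits - address_bits_in_opcode
    return {
        "opcodes": 2 ** opcode_bits,
        "zero_page_locations": 2 ** address_bits_in_opcode,
        "addressable_locations": 2 ** (word_bits + address_bits_in_opcode),
    }


original = split(8)  # 8 address bits packed with an 8-bit opcode
reduced = split(6)   # 6 address bits packed with a 10-bit opcode
```

With 8 address bits the result is 256 opcodes, 256 Zero Page locations and 16M addressable locations; with 6 it is 1024 opcodes, 64 Zero Page locations and 4M addressable locations.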

But the flip side is that starting with the (very well) understood 6502 and extending to a 16bit bus exactly as you first described would be the quickest path to getting something testable. Ultimately - of course - it's your time; easy for me to throw ideas out when I don't have to do the work.

teamtempest wrote:
Just spitballing here, and not accounting for the direct page register yet, I'm curious why on the 65816 the return address needs to be worried about. It has to be accounted for, but the processor doesn't care where it comes from
As is absolutely the norm here we've taken this thread waaayyy down the garden path 8) I really must post up a dedicated thread with pictures and such; and there's lots of good comments here I want to respond to.

* A part of 'could be fast' is that I see 50MHz clock speeds as a hard'ish limit in the discrete component hobby world. That's typically the best an SRAM can do: assuming 10ns of CPU stuff and then 10ns of waiting for the SRAM to read. The fewer cycles you need the more work you can do in any given time.
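The arithmetic behind that limit, and why cycle count matters once the clock is capped, can be written out in Python. The 10ns figures are the ones assumed above; the function names and the cycle counts used in the example are illustrative.

```python
def max_clock_mhz(cpu_ns, sram_ns):
    """Highest clock rate if each cycle needs CPU time plus one SRAM access."""
    return 1000 / (cpu_ns + sram_ns)


def instruction_time_ns(cycles, cpu_ns, sram_ns):
    """Wall-clock time for an instruction at that maximum clock rate."""
    return cycles * (cpu_ns + sram_ns)


clock = max_clock_mhz(10, 10)                   # MHz
two_cycle = instruction_time_ns(2, 10, 10)      # ns
four_cycle = instruction_time_ns(4, 10, 10)     # ns
```

At a 20ns cycle the clock tops out at 50MHz, so a two-cycle instruction takes 40ns where a four-cycle one takes 80ns.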


PostPosted: Mon May 13, 2024 2:00 am 

Joined: Sat Oct 28, 2023 7:57 pm
Posts: 22
Location: Missouri
I definitely think writing an emulator would be the best first step. I might modify an existing 6502 one to get a minimum viable product (the TypeScript one listed in the emulation channel might be a good start for me, as it ties into my actual programming skill set).

