
All times are UTC




Posted: Mon Jul 27, 2009 8:29 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
GARTHWILSON wrote:
.
Quote:
If it turns out that 65816's B register is never used, this is low value. I wasn't thinking of this small onchip stack as being at all related to the SP stack.

You've mentioned this before. When I see "SP", I think of "stack pointer," but you seem to mean something different.

I did mean stack pointer. I was trying to say that a B (and maybe C) register which operates like an eval stack isn't also meant to be shadowing what's happening on the PHA/PLA stack. But I've gone off this auto-push idea a bit.

GARTHWILSON wrote:
Quote:
I don't see the 65816 as helping the implementation of 65Org32 - it's interesting and it's useful, but lots of other processors are also good for ideas. We do have some 6502 implementations to use as starting points: if we're very far removed from 6502 then we have to start afresh

The 65816 has a lot of things the 6502 should have had ... It shouldn't be ignored.

Indeed, it is a source of ideas. But there is no open-source Verilog or VHDL for it, so it isn't any kind of starting point for an implementation.

I fired up Xilinx's free tools the other day and was able to read in Rob Finch's Verilog 6502 fairly easily. It synthesised OK to gates, with a few warnings one might want to fix up, but it didn't place and route - probably something wrong with my installation. It only took a couple of hours to get that far. So if we can start with a good quality 6502, there's enormously less work than if starting from scratch.

Quote:
The one with the 32-bit data bus (65Org32) will need 84, so not a big difference there.

There might be some use in a version which only brought out 24 or 28 address bits?

Quote:
for use with 8-bit I/O devices, we might still want an 8-bit LDA that not only ANDs what is read with $000000FF but also transfers bit 7 (instead of bit 31) to the N flag, and an 8-bit BIT instruction that additionally transfers bit 6 (instead of bit 30) to the V flag.

I hadn't thought of these flags. Masking is an AND with $FF, and sign-extension is a compare followed, if bit 7 is set, by subtracting $100. If an LDA is followed by those ops, does that sort out the flags? It costs almost nothing, because the LDA has to slow down to 1 or 2MHz whereas the other processing could happen at full speed.
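The mask-then-sign-extend sequence can be modelled in a few lines. This is only a sketch in Python, not 65Org32 code: the helper name and the 32-bit accumulator width are assumptions made for illustration. It tracks only the N flag. After sign extension, bit 30 is a copy of bit 7, so this does not give the bit-6-to-V behaviour wanted for an 8-bit BIT.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit accumulator, as proposed for the 65Org32

def load_byte_sign_extended(bus_value):
    """Model an LDA from an 8-bit device followed by mask and sign-extend.

    Returns (accumulator, n_flag). Hypothetical helper, for illustration only.
    """
    a = bus_value & 0xFF           # AND #$000000FF: keep only the low 8 bits
    if a >= 0x80:                  # compare: is bit 7 set?
        a = (a - 0x100) & MASK32   # subtract $100, wrapping to 32 bits
    n_flag = (a >> 31) & 1         # N now reflects the original bit 7
    return a, n_flag
```

With this model, a bus value whose low byte is $80 ends up as $FFFFFF80 with N set, and $7F stays $0000007F with N clear.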


Posted: Mon Jul 27, 2009 8:36 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
Just realised: both 65Org16 and 65Org32 are intended to have loads of RAM and not to conserve it.

So using lookup tables for some purposes starts to make sense, even tables as large as those indexed by 16 bits. That could make for a rapid multiply, without any hardware support.

This idea was inspired by Rob Finch's previous thread on enhancing the 6502.
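One well-known way to get a multiply out of a table is the quarter-square method. The post doesn't say which scheme was in mind, so treat this Python sketch as an assumption: the table holds floor(n*n/4) for every 16-bit index n, and the names are made up for illustration.

```python
TABLE_BITS = 16  # assumed: one table entry per 16-bit index

# QUARTER_SQUARES[n] = floor(n*n / 4)
QUARTER_SQUARES = [(n * n) // 4 for n in range(1 << TABLE_BITS)]

def table_multiply(a, b):
    """Multiply using a*b = floor((a+b)^2 / 4) - floor((a-b)^2 / 4).

    a+b and a-b always have the same parity, so the two floors cancel
    exactly. The sum a+b must fit within the table index.
    """
    if a < 0 or b < 0 or a + b >= len(QUARTER_SQUARES):
        raise ValueError("operands out of range for the table")
    return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[abs(a - b)]
```

The cost per multiply is one add, one subtract, two table reads and a final subtract, which is the sort of trade that only makes sense when RAM is plentiful.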


Posted: Mon Jul 27, 2009 9:12 pm
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8510
Location: Southern California
Ed, I have formed the .hex files for a couple of megabytes of look-up tables [edit, June 2012, now available here with supporting info] to put in ROM on an '816 computer I never made. They give 16-bit trig, log, square-root, squaring, and bit-reversing functions far faster than the processor could calculate them. There are also tables for multiplication and an inversion table (16-bit input, 32-bit results) so that instead of dividing, you could multiply by the reciprocal. I should give them to Mike to put on this website. Most tables are 128KB each, and at least a couple of them are 256KB each. Extending them all for 32-bit input and results is out of the question, but tables could still shorten the process of calculating various functions even if the tables don't take you directly to the completed answer.
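To illustrate the multiply-by-reciprocal idea, here is a rough Python sketch. The layout of the actual .hex tables isn't described here, so everything below is an assumption: each entry is 2^32 divided by the 16-bit divisor and rounded up, which fits in 32 bits for divisors of 2 or more, and the names are hypothetical.

```python
SHIFT = 32  # assumed scaling: entries are reciprocals times 2**32

# RECIPROCALS[d] = ceil(2**32 / d) for d >= 2. Entries 0 and 1 are unused
# placeholders (1/0 is undefined and 2**32 / 1 does not fit in 32 bits).
RECIPROCALS = [0, 0] + [-(-(1 << SHIFT) // d) for d in range(2, 1 << 16)]

def table_divide(n, d):
    """Return n // d for 16-bit n and d using one table read and one multiply."""
    if not 0 <= n < (1 << 16) or not 1 <= d < (1 << 16):
        raise ValueError("n must be 0..65535 and d must be 1..65535")
    if d == 1:
        return n  # special case: the reciprocal of 1 is not stored
    return (n * RECIPROCALS[d]) >> SHIFT  # keep the high half of the product
```

Rounding the reciprocal up rather than down is a design choice: with 16-bit operands the extra error stays below 2^-16, which is too small to push the quotient past the next integer.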

Most users of the 65Org32 probably don't need 32 bits of address bus brought out, but I personally definitely want more than 24.

Quote:
But there is no open-source Verilog or VHDL for it, so it isn't any kind of starting point for an implementation.

Ah, I see where you're coming from. Is there any available for the 65c02 though, or just the NMOS one? I don't remember ever coming across the CMOS HDL code.

Quote:
because the LDA has to slow down to 1 or 2MHz whereas the other processing could happen at full speed.

If something is that slow, I prefer to interface to it through a 65c22 or something that can handle the normal bus speed, maybe 16MHz. The Saronix RTC58321 four-bit real-time clock I used one time was only good for a couple hundred kHz IIRC, and the character LCD modules are only good for about a MHz, so they go on a 65c22.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Mon Jul 27, 2009 9:57 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
GARTHWILSON wrote:
big tables: 256KB

That's the idea - but only 64KB if you have big bytes! For multiply I'd imagine coding a long multiply on 16-bit digits using a table for low and high result. For square root and transcendentals one might use polynomials (so everything depends on multiply). For divide and square root I think the favoured approach is an iterative converging algorithm which uses a table to get a starting point.

As for the exact flavour of the 6502s out there, I'm not sure. The 2002 one from Bird Computer (Rob Finch) says it is an original. So, if it was important to have the 65c02 enhancements, they'd have to be written (and tested!). But the T65 one may be what you'd want: my notes say it includes a 65816. If so, and if it's good enough quality, and fits on the device OK, it does mean the '816 is a reasonable starting point after all.

You're quite right that if you've got a fast peripheral then my argument that sign-extending in software costs no time fails. I still wouldn't worry about this though: using a macro makes it easy, code density is no concern, and if I/O performance matters then hardware can do it. (One might allow for an input pin which lets the peripheral signal a byte-wide access, and the memory interface can do the sign extension. No change to the CPU core itself.)
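As an illustration of the "table for a starting point, then iterate" approach, here is a Python sketch of Newton-Raphson reciprocal refinement, which needs only multiplies and subtracts. It uses floating point for readability, where a real implementation would be fixed point. The 16-entry seed table and the function name are assumptions.

```python
SEED_BITS = 4  # assumed table size: 16 seeds

# One seed per sub-interval of [1, 2): the reciprocal of the interval midpoint
SEEDS = [1.0 / (1.0 + (i + 0.5) / (1 << SEED_BITS)) for i in range(1 << SEED_BITS)]

def reciprocal(d, iterations=3):
    """Approximate 1/d for 1 <= d < 2 from a table seed plus Newton-Raphson."""
    if not 1.0 <= d < 2.0:
        raise ValueError("d must be normalised into [1, 2)")
    index = int((d - 1.0) * (1 << SEED_BITS))  # top fraction bits pick the seed
    x = SEEDS[index]
    for _ in range(iterations):
        x = x * (2.0 - d * x)  # each step squares the relative error
    return x
```

Because the error is squared on every pass, a bigger table buys fewer iterations and vice versa, so the table size can be tuned to the RAM available.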

Fast peripherals are also handy if they remove all need for changing clock speeds.

You had an earlier idea about bootstrapping by different means to avoid lots of ROM: I like that. The 816 project I'm involved in hopes to use a serial EEPROM for this.


Posted: Thu Jul 30, 2009 5:25 pm
Joined: Thu Jul 26, 2007 4:46 pm
Posts: 105
I should point out that doing bytewide/shortwide access on a 32-bit bus requires only a single multiplexer, which has 7 possible values. You're inevitably going to have quite a lot of muxes in your design anyway.
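One reading of the seven values is four byte positions, two aligned 16-bit positions and one full word. The Python sketch below models that selection. The little-endian lane numbering and the names are assumptions, not details from the design being described.

```python
# (size_in_bytes, byte_offset) for every aligned access on a 32-bit bus:
# four byte lanes, two halfword lanes, one full word
SELECTIONS = [(size, offset) for size in (1, 2, 4) for offset in range(0, 4, size)]

def select_lanes(word, size, offset):
    """Pick `size` bytes starting at byte `offset` out of a 32-bit bus word.

    Lane 0 is assumed to be the least significant byte.
    """
    if (size, offset) not in SELECTIONS:
        raise ValueError("unsupported or unaligned access")
    mask = (1 << (8 * size)) - 1
    return (word >> (8 * offset)) & mask
```

`len(SELECTIONS)` comes out as 7, matching the count in the post.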

I should also point out that, despite what it may sound like, pipelining isn't difficult. The most complex part is the interlocks - of course, you can choose not to implement these, and instead have the assembler raise an error if you try to violate the required delays.

As for branches, I have 21-bit PC-relative branches (±1MB - I can't ignore the lower address bits as I have a "dense mode" in which instructions each take 16 bits but only have two operands - and it's enabled by the LSB of the address). If you need to branch further than that, I imagine the linker will replace it by an indexed jump (something like JMP register, index where the register points to a table). Indexed jumps will of course be slower due to pipeline delays.

Of course indexed jumps have the issue of overflowing the index - but I have a feeling that programs very rarely require 32k far jump locations, as generating that many probably involves equally many functions, unless you have over 250k instructions per function!
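To make the range concrete: a signed 21-bit field covers -2^20 to 2^20 - 1, which is the ±1MB quoted. The Python sketch below shows the range check a linker might do before falling back to an indexed jump. The function names are hypothetical, and it assumes the field counts the smallest addressable units with no scaling.

```python
BRANCH_BITS = 21
LIMIT = 1 << (BRANCH_BITS - 1)  # 2**20 = 1,048,576

def encode_branch(offset):
    """Return the 21-bit two's-complement field, or None if out of range."""
    if not -LIMIT <= offset < LIMIT:
        return None  # caller would substitute an indexed jump here
    return offset & ((1 << BRANCH_BITS) - 1)

def decode_branch(field):
    """Sign-extend a 21-bit field back to a signed offset."""
    return field - (1 << BRANCH_BITS) if field & LIMIT else field
```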


Posted: Thu Jul 30, 2009 6:01 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
I agree that the wiring for unaligned or bytewide accesses isn't a major problem, but you need to assign opcodes, design and debug the decoder (which means writing lots of test cases) and then extend the assembler.

It's all possible. But a 6502 takes 1500-2000 lines of HDL, and the further you go, the more to design, test, debug and support. Which is why I advocate a series of small steps. If you have the hardware design skills to produce lots of tested code quickly, that's great - but I don't.

I wouldn't go so far as to say pipelining is easy: it's easy on paper - even easier on a whiteboard - but remember you need to get from an idea to a documented and debugged implementation. I'd imagine it would double the amount of work. edit: and why not get something working first, then get it working fast?

I see there are 6000 lines of test code in Mike's py65.

Sorry if that seems very negative. The start of this thread was about ideas for improving the 6502, but it became a thread about implementing an improved 6502, which is hard. On the positive side, several forum members have in fact done it (at least Ruud, John West, Rob Finch, Sprow: apologies to others I've missed)


Last edited by BigEd on Thu Jul 30, 2009 7:03 pm, edited 1 time in total.

Posted: Thu Jul 30, 2009 6:23 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
To be a bit more positive: if you speak Python, how about adding support for your proposed modes/operations/registers into py65 and seeing how it works out? Or if you speak C, add it to run6502 or lib65816. Or if you speak HDL, well...!

edit: must resolve to be more positive and encouraging! I had a quick look to see how hard this stuff is supposed to be. This 3rd year undergraduate course www.ee.ryerson.ca/~courses/coe608 has 40 hours of instruction and estimates 24 hours of lab time. About a quarter of the instruction is on pipelining-type stuff (technical term!). So if your brain is half the age of mine, you could have this done before the clocks change.


Posted: Fri Jul 31, 2009 7:05 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8510
Location: Southern California
The next step really does seem to be finding the HDL designers. I'm just a little surprised that we haven't heard from any yet. I know of several on these forums who have been involved in processor design at a hobby level or higher. Maybe I/we need to make more of an effort to contact them and alert them to what we would like to do. They won't be pros, but we're not looking for a very complex processor. After one or more can get involved, we can find out from them what can be done with the resources available (which includes their level of capability), and see if it's feasible to have things like the branch-distance operand integrated with the op code in a single 32-bit byte. (Yes, I agree that 21 bits is usually way more than enough for relative branch distances. The exception would be if you have a big block of data that you need to skip over to get to the target code on the other side.) I'm hoping that having an input clock that's four or eight times the bus frequency will give more capability without deep pipelining.

I don't intend to wire-wrap this thing, but rather lay out a multilayer board and get it made, and offer it to others at my cost, as Daryl has done. I don't mind sending the files to others who may want to make small changes and get their own version made. If/when we get to where it's time to start the testing and verification, I would want to design a computer around it that could be used for that and possibly still be useful beyond that-- if that's even the appropriate approach. Without ever having done it before, I suspect that a board suitable for verification would have test points for connecting an oscilloscope and maybe a logic analyzer, external input or clock circuit for single-cycling or running at a variable speed, and probably other such things.



Posted: Sun Aug 02, 2009 7:58 pm
Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1041
Location: near Heidelberg, Germany
As a side note that came to my mind when I read about the pipelining:

The original 6502 (and even the 65c02 and, IIRC, everything up to the 65816) cannot "roll back" an opcode in case of an error (say, a bus error such as non-existent memory, or a write to a read-only page).
So, for example, a write to read-only memory cannot trigger an interrupt that rolls back the current opcode, lets the handler fix the read-only problem, and then restarts the opcode so that it succeeds.

Implementing support for this would require implementing a set of shadow registers, that are updated with the new values when the opcode "finishes" (whatever that means in the context).
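As a toy model of the shadow-register idea, the Python sketch below runs each instruction against a copy of the registers and commits the copy only if no bus error was raised. The class and function names are invented for illustration, and real hardware would of course do this with registers and enables rather than dictionary copies.

```python
class BusError(Exception):
    """Raised by an instruction model when the bus access fails."""

class Cpu:
    def __init__(self):
        self.regs = {"A": 0, "X": 0, "Y": 0, "PC": 0}

    def step(self, instruction):
        """Run one instruction; commit its register updates only on success."""
        shadow = dict(self.regs)   # the instruction works on the shadow copy
        try:
            instruction(shadow)
        except BusError:
            return False           # architectural registers are untouched
        self.regs = shadow         # the opcode "finished": commit the shadow
        return True

def store_to_rom(regs):
    """Hypothetical opcode that updates registers, then hits a bus error."""
    regs["X"] += 1
    regs["PC"] += 3
    raise BusError("write to read-only page")

def increment_x(regs):
    """Hypothetical opcode that completes normally."""
    regs["X"] += 1
    regs["PC"] += 1
```

After a failed `step`, the same instruction can simply be presented again once the handler has fixed the cause, because PC and the other registers still hold their pre-instruction values.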

In my CS/A design I actually used a second (auxiliary) 6502 to take over the bus in these occasions while the main 6502 was halted using RDY.

So I'd like to see such a feature in the new CPU if possible.

André

P.S.: bus error condition inputs wouldn't be bad either :-)


Posted: Tue Aug 04, 2009 9:05 pm
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8510
Location: Southern California
André, not to doubt the validity of what you're saying, but can you elaborate on the usefulness of the rollback and the bus-error input, and what problems they would solve that can't be solved with good programming? If an error in my code makes the processor try to write to ROM, there's no hardware damage, but I'll fix the bug to make it work right anyway.



Posted: Tue Aug 04, 2009 9:43 pm
Joined: Thu Jul 26, 2007 4:46 pm
Posts: 105
It's more a matter for systems with an MMU. For example, a page may be mapped copy-on-write; the bus error input would be asserted to roll back the instruction and cause a branch to the bus error exception handler, which would then copy the page, update one of the page tables to point to the new page, and mark it writable in both processes' address spaces. The processor then returns from the exception handler, which restarts the instruction.

In the case of a pipelined architecture, errors cause the instruction, plus any in lower pipeline stages, to be nullified (i.e. they will not cause any processor state changes) and emit a signal which indicates to the control unit that an exception has occurred. In the case of multiple stages firing exceptions simultaneously, the highest stage (i.e. the first instruction in sequence) has priority.
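The copy-on-write sequence described above can be sketched in Python as follows. This is a toy model with two processes and a single shared page. The page size, table layout and names are all assumptions, and the "restart" is simply a second call to the store.

```python
PAGE = 256  # assumed page size in bytes

class PageFault(Exception):
    """Stands in for the bus error input being asserted."""

frames = [bytearray(PAGE)]                 # physical page frames
tables = {"parent": {0: [0, False]},       # page -> [frame, writable]
          "child": {0: [0, False]}}        # both map frame 0 copy-on-write

def store(proc, addr, value):
    frame, writable = tables[proc][addr // PAGE]
    if not writable:
        raise PageFault(proc, addr // PAGE)  # instruction has no effect
    frames[frame][addr % PAGE] = value

def handle_fault(proc, page):
    old = tables[proc][page][0]
    frames.append(bytearray(frames[old]))          # copy the page
    tables[proc][page] = [len(frames) - 1, True]   # repoint the faulting process
    for other, table in tables.items():            # the other mapping is now private
        if other != proc and table[page][0] == old:
            table[page][1] = True

def store_with_restart(proc, addr, value):
    try:
        store(proc, addr, value)
    except PageFault as fault:
        handle_fault(*fault.args)   # exception handler runs
        store(proc, addr, value)    # the instruction is restarted and succeeds
```

The point of the rollback requirement shows up in `store`: the fault is raised before any state changes, so running the same store a second time is safe.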


Posted: Tue Aug 04, 2009 10:16 pm
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8510
Location: Southern California
Well, I may need some more education, but it sounds like one of the reasons some of the more complex processors are simply horrid in interrupt performance compared to the 6502.



Posted: Tue Aug 04, 2009 10:21 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
Does the 65816's ABORTB pin do this job? It says that the instruction completes but without any register updates, then the machine takes an interrupt. So the interrupt handler can fix up the problem and restart the instruction.

I'm not sure how hard that is to implement but I see the timing on the 816 demands that it must be valid before the rising edge of phi2 - that must be a clue.

The nice thing about an FPGA implementation is that the memory decode can be on-chip and therefore more easily fast enough to respond within half a cycle.

Similarly, it seems to me that a memory interface on chip could handle niceties like prefetching and aligning (and sign extending), if necessary using RDY to hold up the core. It still has to handle reads of locations not yet written back, and handle uncacheable writes to peripherals, so it interacts with the memory decode, but maybe it's a distinct unit separate from the core.

edit: fixup sense of ABORTB and RDY


Post subject: 6516 (which never was)
Posted: Fri Aug 07, 2009 8:02 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
Relevant to the subject of incremental enhancements, a historical note:

The 6516 has been mentioned in a previous thread. It was never made, but was floated by Synertek as an enhanced 6502.

There's an article about it (from 1980) on page 36 in issue 23 of Micro
http://www.6502.org/documents/publications/micro/
and some reader comments on page 62 of issue 26.


Post subject: Implementation: FPGAs
Posted: Fri Aug 07, 2009 8:16 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
GARTHWILSON wrote:
I don't intend to wire-wrap this thing...

On the subject of implementations: I have a fondness for the unreasonable idea of putting something 6502-like on a CPLD, but if you take a practical view and choose an FPGA, you'll find capacity is not a problem at all. They are all surface mount though, and the higher capacities are not in socket-friendly packages.

But it turns out that there are suppliers of DIP-mounted FPGA boards, which promise to make breadboardable design possible.

For example, OHO have a Spartan-3 XC3S200 plus 512kx8 SRAM on a small module, 23.5 x 47 mm, with a 24-pin DIL plug. (EUR 70)

Also Enterpoint have a series of 100k-gate or 500k-gate FPGAs on DIL28 to DIL40 with 5V-tolerant I/Os (GBP 40 to GBP 60), and by way of illustration there's an 8088 replacement available on such a board.

So for pincount up to 40 there's a very usable way forward.

There's a whole set of interesting modules at Trenz

(All these happen to be UK or Germany, but I imagine similar things can be found elsewhere or these ones shipped.)

