6502.org Forum

All times are UTC




PostPosted: Mon May 02, 2011 7:39 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Some of you may recall a very long thread which threw up at least a couple of suggestions concrete enough to be named: 65Org16 and 65Org32. (Other ideas for enhanced 6502 are works in progress: 1, 2)

The idea with 65Org16 was that it was the least-effort method of getting a lot of memory easily available on a 6502-like CPU. It wasn't meant to be fast or to have a lot of extra capabilities or denser code.

Well, I finally did some hacking this afternoon, using Arlet's core as a basis, and it does seem to be relatively straightforward to cook up a version of 6502 with 16-bit registers and a 32-bit address space. The way to look at this is as a 6502 where a byte has 16 bits. There is no particular support for an 8-bit data type or peripheral - you roll your own.
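To illustrate what "roll your own" 8-bit data might look like when the smallest addressable unit is 16 bits, here is a minimal sketch that packs two 8-bit characters into one word and unpacks them again. The function names, and the choice of putting the first character in the low half of the word, are assumptions made for illustration; nothing in the 65Org16 fixes them.

```python
def pack_chars(first, second=0):
    """Pack two 8-bit values into one 16-bit word.

    Assumption: the first character goes in the low half of the word.
    """
    return ((second & 0xFF) << 8) | (first & 0xFF)


def unpack_chars(word):
    """Split a 16-bit word back into its two 8-bit halves (first, second)."""
    return word & 0xFF, (word >> 8) & 0xFF


def pack_string(text):
    """Pack an ASCII string two characters per word, zero-padding an odd tail."""
    data = text.encode("ascii")
    if len(data) % 2:
        data += b"\x00"
    return [pack_chars(data[i], data[i + 1]) for i in range(0, len(data), 2)]
```

The simpler alternative is to store one character per word and waste the upper 8 bits, which costs memory but needs no shifting or masking.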

This is of course very early - not even proof of concept.

(Edit: this project now on github as a friendly fork of Arlet's project. See also this summary of 65Org16 threads.)

Anyway, here's a trace: the cpu is running from a tiny ROM, hand-assembled into 16-bit opcodes. Mainly a question of zero-padding with a sprinkling of sign-extending. A usable assembler would be one of the things needed.
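The zero-padding and sign-extending can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names and the tiny example program are hypothetical (they are not the ROM from the trace), and only opcodes, immediates and branch offsets are handled, where each 8-bit value maps to exactly one 16-bit word. Absolute addresses would need separate treatment.

```python
def zero_pad(byte):
    """Widen an 8-bit value to 16 bits with the upper byte cleared."""
    return byte & 0xFF


def sign_extend(byte):
    """Widen a signed 8-bit value (such as a branch offset) to 16 bits."""
    byte &= 0xFF
    return byte | 0xFF00 if byte & 0x80 else byte


def widen(program):
    """Convert a list of (byte, kind) pairs into 16-bit words.

    kind is "zero" for opcodes and unsigned operands, "sign" for branch offsets.
    """
    return [sign_extend(b) if kind == "sign" else zero_pad(b) for b, kind in program]


# Hypothetical example: LDX #$00 / DEX / BNE back to the DEX.
# A2, CA and D0 are the standard NMOS 6502 opcodes; FD is an offset of -3.
ROM_8BIT = [
    (0xA2, "zero"), (0x00, "zero"),   # LDX #$00
    (0xCA, "zero"),                   # DEX
    (0xD0, "zero"), (0xFD, "sign"),   # BNE -3
]
ROM_16BIT = widen(ROM_8BIT)
```

Because every byte of this fragment becomes one word, the branch distance of -3 is the same whether it is counted in bytes or in words.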

Attachment: 65Org16.1.png
File comment: Traces from a simulation - originally http://img641.imageshack.us/img641/4797/65org161.png


Question is: who's genuinely interested in a 6502-like CPU, implemented on FPGA, which can address up to 4 gigawords of 16-bit wide memory?

(And who can't wait for André or Ruud to do the job properly... actually they could easily get there before I do as it's taken me a year to do an afternoon's work.)

(A bit of trickery with external latches and you could have 24 bits of address from a 40-pin package: 16 megawords of 16-bit wide memory.)
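The memory sizes quoted above follow from simple arithmetic, worked through here as a quick check. The function name is just for illustration.

```python
WORD_BITS = 16  # each addressable location holds 16 bits


def address_space(address_bits, word_bits=WORD_BITS):
    """Return (number of addressable words, total size in 8-bit bytes)."""
    words = 1 << address_bits
    size_bytes = words * word_bits // 8
    return words, size_bytes


# Full 32-bit address: 4 gigawords, i.e. 8 GiB counted in 8-bit bytes.
full_words, full_bytes = address_space(32)

# 24 bits of address via external latches: 16 megawords, i.e. 32 MiB.
latched_words, latched_bytes = address_space(24)
```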

Cheers
Ed

ps. Big thanks to Arlet for the basis of course. T65 would probably have served but I prefer to work with verilog.


Last edited by BigEd on Sat Oct 22, 2016 9:39 am, edited 3 times in total.

PostPosted: Mon May 02, 2011 7:53 pm

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 990
Location: near Heidelberg, Germany
BigEd wrote:
Question is: who's genuinely interested in a 6502-like CPU, implemented on FPGA, which can address up to 4 gigawords of 16-bit wide memory?

(And who can't wait for André or Ruud to do the job properly... actually they could easily get there before I do as it's taken me a year to do an afternoon's work.)


Great work!

Although I'm not sure if I could get there before you, my non-6502 schedule is also quite full... But I'll top your 8GByte with 2^64 byte ;-)

André


PostPosted: Mon May 02, 2011 8:01 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Thanks! I can in no way compete with a properly architected and specified CPU!

(It's been bugging me that 65Org16 ought to be really easy to do - of course things are never quite that easy. Most of these discussions go off into wish lists and pipe dreams. Yours is clearly an exception, and by taking control you avoid a committee effect. Mine is another approach: do as little as is possible.)

Cheers
Ed


PostPosted: Mon May 02, 2011 11:10 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
BigEd wrote:
.... A usable assembler would be one of the things needed....

A usable assembler/disassembler/monitor will be our biggest "stumbling block"...

BigEd wrote:
...Question is: who's genuinely interested in a 6502-like CPU, implemented on FPGA, which can address up to 4 gigawords of 16-bit wide memory?...


I for one am very interested.
16 bits is the next step for this 65X family. Should've happened a long time ago...
An easy way to test ISim in a real world project is like what I am doing in the 6502SoC. Have a video output device, read your character data from a memory, be it internal ROM, external Flash, external preloaded RAM from internal ROM (or vice versa), etc., and send it to the display. WYSIWYG (What You See Is What You've Got).

BigEd wrote:
...ps. Big thanks to Arlet for the basis of course. T65 would probably have served but I prefer to work with verilog.

I agree, and I believe Arlet's Core is THE fastest...
Maybe, in the next few days, I will put together a test scenario of a simple internal ROM that clears a 640x480 display, uses the 512 byte ZeroPage/Stack internal RAM, and test all of the 6502 Cores, mentioned in the 6502 Core comparison thread, using a XC6SLX9 144-pin QFP Spartan 6!

And yes I do still have all the original Cores from that thread safely burned to CD. These will undergo the test.
Some of the original HDL cores are no longer present on their sites...

It will be all ISim as I don't own any Spartan 6's yet.


PostPosted: Tue May 03, 2011 12:38 am

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Do you mean a 65016.org website will be starting up soon?

Also, [16:0] IR's?!

Xilinx parts are ready to do a 2 cycle 16x16 multiply and 32 bit result.


PostPosted: Tue May 03, 2011 6:13 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
If I can't get my 65Org32, I'm definitely interested in the 65Org16. For an assembler, the Cross-32 assembler from Universal Cross Assemblers (which may have a new name) allows making the tables for any new microprocessor you come up with, so you can get a nice full-featured macro assembler without having to write it from scratch.


PostPosted: Tue May 03, 2011 10:14 am

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Just a tip for simulation: if you define 'SIM' (or remove the `ifdef), you can add A, X, Y, and S registers separately. That makes it easier to see what's going on.

Code:
`ifdef SIM
wire [7:0]   A = AXYS[SEL_A];           // Accumulator
wire [7:0]   X = AXYS[SEL_X];           // X register
wire [7:0]   Y = AXYS[SEL_Y];           // Y register
wire [7:0]   S = AXYS[SEL_S];           // Stack pointer
`endif


Since you now have 16 bit instructions, it would be nice to add some more registers. Since the core is already using a register file, with enough resources to support 16 registers (only 4 are used right now), you can easily add 12 more without much impact on core size/speed.

The 16 bit ALU is certainly going to slow down max speed, due to the longer ripple carry path. If you care about speed, removing BCD support should help a bit there.


PostPosted: Tue May 03, 2011 7:57 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Let me take these points in order:
ElEctric_EyE wrote:
A usable assembler/disassembler/monitor will be our biggest "stumbling block"...

This should be next up, perhaps after I do a bit of a tidy up and check in some source code. I found a BSD-licensed Python-based table-driven assembler by Michael Martin which might be a good basis. Or, there's a small simple one by David Beazley in Python 3 (but no macros). Python is easy, powerful, productive and cross-platform, and I want to know it better, so it's a language of choice for me.

That is, it gives us an assembler. Possibly Mike's py65 is a good basis for an emulator which includes some assembler/disassembler/monitor capability, but not on the real machine.

As for a monitor, first step is a loader. I'll only take one step at a time, maximum!

(But, the point is that this is a minimal change from 6502. So porting an existing monitor shouldn't be too difficult.)

ElEctric_EyE wrote:
An easy way to test ISim in a real world project is like what I am doing in the 6502SoC. Have a video output device, read your character data from a memory, be it internal ROM, external Flash, external preloaded RAM from internal ROM (or vice versa), etc.

My next step on the hardware side, probably, is much much smaller: some serial I/O so I can talk to a host. I need a minimal ROM, and some minimal block RAM - which can be the same thing, because a block RAM can have initial contents. Nothing off-FPGA. No video. Something like the micro-uk101, but probably i2c rather than rs232, because I've got easier ways of dealing with that.

ElEctric_EyE wrote:
Do you mean a 65016.org website will be starting up soon?

Nope. But keep an eye on my fork of Arlet's core - that's where I'll be working. In usual git fashion, anyone else can make a fork and make changes, and we can adopt each other's changes as we see fit.

ElEctric_EyE wrote:
Also, [16:0] IR's?!
Almost - [15:0] - but I don't intend to do anything with the extra 8 bits until I've got something working usefully. Anyone else can, of course. One might draw on André's opcode map, or on other extensions, or make up one's own.

ElEctric_EyE wrote:
Xilinx parts are ready to do a 2 cycle 16x16 multiply and 32 bit result.
Indeed, and multiply is an interesting case, but it's not interesting until we have a CPU that works, in a system, with some tools and some software. That's a long way off.

ElEctric_EyE wrote:
I for one am very interested.


Great!

GARTHWILSON wrote:
If I can't get my 65Org32, I'm definitely interested in the 65Org16.


Great! The 65Org32 might come later - I can begin to see how it might follow on - but for me at least the 65Org16 is a first step.

GARTHWILSON wrote:
For an assembler, the Cross-32 assembler ...full-featured macro assembler without having to write it from scratch.
Thanks for the pointer - it's a good goal. But as noted above, I've found some other starting points. I think some software people would find an assembler not too hard, but I'm going to have to stand on someone's shoulders. Note that a table approach gets interesting when the opcode is 16 or even 32 bits. But as a starting point I only have the original NMOS opcodes - the interesting bit is handling 16-bit immediates and branch distances, and 32-bit addresses.
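A table-driven encoder for the three interesting cases (16-bit immediates, 16-bit branch distances, 32-bit addresses) could look roughly like the sketch below. It is a toy under stated assumptions, not code from any of the assemblers mentioned: the `OPCODES` table and `encode` function are hypothetical, the opcode values are the standard NMOS 6502 ones zero-padded to 16 bits, and the low-word-first order for 32-bit addresses is assumed by analogy with the 6502's low-byte-first convention.

```python
# (mnemonic, addressing mode) -> opcode; the upper 8 bits of each word are zero.
OPCODES = {
    ("LDA", "imm"): 0xA9,
    ("LDA", "abs"): 0xAD,
    ("STA", "abs"): 0x8D,
    ("JMP", "abs"): 0x4C,
    ("BNE", "rel"): 0xD0,
}


def encode(mnemonic, mode, operand, pc=0):
    """Return the list of 16-bit words for one instruction located at address pc."""
    opcode = OPCODES[(mnemonic, mode)]
    if mode == "imm":
        if not 0 <= operand <= 0xFFFF:
            raise ValueError("immediate does not fit in 16 bits")
        return [opcode, operand]
    if mode == "abs":
        if not 0 <= operand <= 0xFFFFFFFF:
            raise ValueError("address does not fit in 32 bits")
        # Assumption: low word first, then high word.
        return [opcode, operand & 0xFFFF, operand >> 16]
    if mode == "rel":
        # Offset is relative to the word after the two-word branch instruction.
        offset = operand - (pc + 2)
        if not -0x8000 <= offset <= 0x7FFF:
            raise ValueError("branch target out of range")
        return [opcode, offset & 0xFFFF]
    raise ValueError("unsupported addressing mode: " + mode)
```

For example, `encode("STA", "abs", 0x12345678)` yields three words, and `encode("BNE", "rel", 0x100, pc=0x101)` yields an offset word of `0xFFFD`.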


Arlet wrote:
... tip to show registers separately in simulation...
Thanks!

Arlet wrote:
Since you now have 16 bit instructions, it would be nice to add some more registers. Since the core is already using a register file, with enough resources to support 16 registers (only 4 are used right now), you can easily add 12 more without much impact on core size/speed.
Yes, good point. I do have ideas (don't we all) on how to extend the machine. But it all comes later! For me, probably most interesting is making it easy to target for a C compiler (Toshi's commonly voiced complaint about 65816.) But there's no point re-inventing ARM.

Arlet wrote:
The 16 bit ALU is certainly going to slow down max speed, due to the longer ripple carry path. If you care about speed, removing BCD support should help a bit there.
I'll do that - it's the one incompatibility I'm happy to commit on day one.

But ideally this core shouldn't be significantly slower than yours - double the width should only add a gate delay if we can get a fast adder implementation. (Seeing 43MHz vs 46MHz on a xc3s200-4, 368 slices vs 263.)


Last edited by BigEd on Mon May 16, 2011 6:50 pm, edited 1 time in total.

PostPosted: Tue May 03, 2011 8:08 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
The ripple carry path will add 16 extra gate delays (2 per bit), but these are dedicated gates, and they're quite fast compared to regular LUT delays. If the place & route puts all 16 bits next to each other, there won't be any extra routing delay either.
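As a software illustration of the ripple carry idea and the delay count above, the sketch below adds two numbers one bit at a time, passing the carry along the chain, and counts delays at the quoted 2 per bit. It is a behavioural model in Python with made-up function names, not a description of the FPGA's dedicated carry logic.

```python
def ripple_add(a, b, width, carry_in=0):
    """Add bit by bit, rippling the carry upward; returns (sum, carry_out)."""
    result = 0
    carry = carry_in
    for bit in range(width):
        x = (a >> bit) & 1
        y = (b >> bit) & 1
        result |= (x ^ y ^ carry) << bit          # sum bit of a full adder
        carry = (x & y) | (carry & (x ^ y))       # carry out of a full adder
    return result, carry


def carry_chain_delays(width, delays_per_bit=2):
    """Worst-case carry path length, using the 2-per-bit figure quoted above."""
    return width * delays_per_bit


# Going from 8 to 16 bits lengthens the chain by 8 bits, i.e. 16 gate delays.
EXTRA_DELAYS = carry_chain_delays(16) - carry_chain_delays(8)
```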

43 MHz vs 46 isn't all that bad.


PostPosted: Tue May 03, 2011 8:54 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
BigEd wrote:
Arlet wrote:
...The 16 bit ALU is certainly going to slow down max speed, due to the longer ripple carry path. If you care about speed, removing BCD support should help a bit there.
I'll do that - it's the one incompatibility I'm happy to commit on day one...


Just won't be able to display a decimal (0...255) equivalent to hex (0...FF)...

I see you guys are tackling this 16-bit beast head first! Awesome!!
When I get some time, hopefully soon, count on me to commit some resources to it as well!


PostPosted: Tue May 03, 2011 9:15 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
Quote:
Just won't be able to display a decimal (0...255) equivalent to hex (0...FF)...

I'm not sure what you mean here. Since I've been using Forth, I've been doing everything in hex internally and converting to and from other bases only when there's human I/O.

About the C32 assembler, I should add that it comes with the tables to do scads of different processors, and the info to write your own. As long as you don't go beyond 32 bits, I am not aware of any limitations.


PostPosted: Tue May 03, 2011 9:15 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
None of the modern CPUs have a BCD mode, and I don't think it's being missed all that much. You can always do BCD calculations in software, or do them in binary and convert the result.
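Both software routes can be sketched briefly: convert between binary and packed BCD, and do a BCD addition by working in binary and converting the result. The function names are illustrative, and a real 6502-family routine would use shifts and adds rather than Python's `divmod`.

```python
def to_bcd(value):
    """Convert a non-negative integer to packed BCD (one decimal digit per nibble)."""
    if value < 0:
        raise ValueError("negative values are not supported")
    bcd = 0
    shift = 0
    while True:
        value, digit = divmod(value, 10)
        bcd |= digit << shift
        shift += 4
        if value == 0:
            return bcd


def from_bcd(bcd):
    """Convert packed BCD back to a plain integer."""
    value = 0
    scale = 1
    while bcd:
        digit = bcd & 0xF
        if digit > 9:
            raise ValueError("invalid BCD digit")
        value += digit * scale
        scale *= 10
        bcd >>= 4
    return value


def bcd_add(a, b):
    """Add two packed-BCD numbers by computing in binary and converting back."""
    return to_bcd(from_bcd(a) + from_bcd(b))
```

So decimal 255 becomes `0x255` in packed BCD, and `bcd_add(0x19, 0x03)` gives `0x22`.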


PostPosted: Tue May 03, 2011 9:36 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Ok, forget my experience with BCD.
I 100% agree. Maybe it will take more cycles without BCD, but it is a specialized mode after all...

I didn't mean to derail the main topic, which is the cross assembler Garth has had his eye on for a while,
OR Arlet's point:
Arlet wrote:
The ripple carry path will add 16 extra gate delays (2 per bit), but these are dedicated gates, and they're quite fast compared to regular LUT delays. If the place & route puts all 16 bits next to each other, there won't be any extra routing delay either.

43 MHz vs 46 isn't all that bad.

8 to 16 bits with just a few MHz loss?!
NICE....


PostPosted: Wed May 04, 2011 5:44 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
GARTHWILSON wrote:
About the C32 assembler...
Please could you have a quick look at the docs for Michael Martin's Ophis assembler? It seems to be a fully featured macro assembler, and he ships a Windows installer for non-Python users. As it has a permissive license I'm optimistic that I can use it as a base for 65Org16. On the other hand, C32 seems to be single-platform and closed-source - even if it is configurable enough I don't think it's a direction I want to take. (Anyone else could feel free, of course.) I don't know what plans the other 6502-variant projects have for tools.

(It would be good to know up front if some particular feature or syntax quirk in the assembler was going to be a deal-breaker!)

Quote:
Code:
.macro print
        ldx #0
_loop:  lda _1, x
        beq _done
        jsr chrout
        inx
        bne _loop
_done:
.macend


Macros may be invoked in two ways: one that looks like a directive, and one that looks like an instruction.

The most common way to invoke a macro is to backquote the name of the macro. It is also possible to use the .invoke command. These commands look like this:
Code:
`print msg
.invoke print msg
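To make the argument substitution concrete, here is a toy expander in Python that replaces `_1`, `_2`, ... in a macro body with the invocation's arguments. It is a sketch of the idea only and makes no claim about how Ophis implements macros; in particular it leaves `_loop` and `_done` untouched, whereas a real assembler would also have to keep such local labels unique across invocations.

```python
import re

# Body of the "print" macro quoted above.
PRINT_MACRO = """\
        ldx #0
_loop:  lda _1, x
        beq _done
        jsr chrout
        inx
        bne _loop
_done:"""


def expand(body, args):
    """Replace positional placeholders (_1, _2, ...) with the given arguments."""
    def substitute(match):
        return args[int(match.group(1)) - 1]
    # Only underscore-plus-digits tokens match, so labels like _loop are kept.
    return re.sub(r"\b_(\d+)\b", substitute, body)


EXPANDED = expand(PRINT_MACRO, ["msg"])
```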



PostPosted: Wed May 04, 2011 6:17 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
With a quick look, the only minus I see is that it looks kind of slim on the operators. For example, there's no way to shift a parameter left or right so many bits. Edit: I don't see any nesting of macros allowed either.

I like to make my macro invocations look like instructions instead of directives. The way they are easily told apart from mnemonics is that the mnemonics are all three-character, and the macro names are more. Of course in the case of aliases, they can look like mnemonics, like using DEA in place of DEC A if the assembler doesn't have DEA.

There's a 3-page topic I started at viewtopic.php?t=1475 on desirable assembler features. Since then, I have jury-rigged my way around the lack of indexable assembler variable arrays for a stack in the assembler itself, and made and used the program-structure macros in my PIC16 code for our company, but I have not done it yet in 6502 code since I mostly use Forth on the 6502. [Edit: Done. See http://wilsonminesco.com/StructureMacros/ .]

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?

