Let me take these points in order:
ElEctric_EyE wrote:
A usable assembler/disassembler/monitor will be our biggest "stumbling block"...
This should be next up, perhaps after I do a bit of a tidy-up and check in some source code. I found a BSD-licensed Python-based table-driven assembler by Michael McMartin which might be a good basis. Or there's a small, simple one by David Beazley in Python 3 (but with no macros). Python is easy, powerful, productive and cross-platform, and I want to know it better, so it's my language of choice.
That would give us an assembler. Possibly Mike's py65 is a good basis for an emulator, which includes some assembler/disassembler/monitor capability, but not on the real machine.
As for a monitor, first step is a loader. I'll only take one step at a time, maximum!
(But the point is that this is a minimal change from the 6502, so porting an existing monitor shouldn't be too difficult.)
ElEctric_EyE wrote:
An easy way to test ISim in a real world project is like what I am doing in the 6502SoC. Have a video output device, read your character data from a memory, be it internal ROM, external Flash, external preloaded RAM from internal ROM (or vice versa), etc.
My next step on the hardware side, probably, is much, much smaller: some serial I/O so I can talk to a host. I need a minimal ROM and some minimal block RAM, which can be the same thing, because a block RAM can have initial contents. Nothing off-FPGA. No video. Something like the micro-uk101, but probably I2C rather than RS-232, because I've got easier ways of dealing with that.
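Since the ROM and RAM can be one block RAM with initial contents, the host side only needs to turn a program image into an init file. A minimal sketch, assuming a 16-bit-wide block RAM initialised from Verilog's `$readmemh` (one hex word per line) and that the synthesis tool honours that; the function name `words_to_readmemh` is hypothetical:

```python
def words_to_readmemh(words, depth, width_bits=16):
    """Render a program image as text for Verilog's $readmemh.

    Assumes a block RAM `depth` words deep and `width_bits` wide; unused
    locations are padded with zero so the whole RAM has defined contents.
    """
    if len(words) > depth:
        raise ValueError("image is larger than the block RAM")
    digits = (width_bits + 3) // 4  # hex digits per word
    mask = (1 << width_bits) - 1
    lines = []
    for w in list(words) + [0] * (depth - len(words)):
        if not 0 <= w <= mask:
            raise ValueError(f"word {w:#x} does not fit in {width_bits} bits")
        lines.append(f"{w:0{digits}x}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Two words of a (hypothetical) boot image in a 4-word RAM
    print(words_to_readmemh([0x00A9, 0x1234], 4), end="")
```

Writing the result to a file and pointing `$readmemh` at it keeps the whole system on-chip, which matches the "nothing off-FPGA" goal.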
ElEctric_EyE wrote:
Do you mean a 65016.org website will be starting up soon?
Nope. But keep an eye on my fork of Arlet's core; that's where I'll be working. In the usual git fashion, anyone else can make a fork and make changes, and we can adopt each other's changes as we see fit.
ElEctric_EyE wrote:
Also, [16:0] IR's?!
Almost: [15:0]. But I don't intend to do anything with the extra 8 bits until I've got something working usefully. Anyone else can, of course. One might draw on André's opcode map, or on other extensions, or make up one's own.
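To make "leave the extra 8 bits alone for now" concrete, here is a sketch of how a simulator or disassembler might split a 16-bit IR. It assumes the original NMOS opcode sits unchanged in IR[7:0] and that a nonzero IR[15:8] is simply treated as reserved; the table is a small hypothetical excerpt, not a full opcode map:

```python
# Hypothetical excerpt: original NMOS 6502 opcodes, assumed to sit in IR[7:0]
BASE_OPCODES = {
    0xA9: "LDA #imm",
    0xAD: "LDA abs",
    0x8D: "STA abs",
    0x4C: "JMP abs",
    0xD0: "BNE rel",
}


def decode(ir):
    """Decode a 16-bit instruction word; None means reserved or unknown."""
    ir &= 0xFFFF
    ext, base = ir >> 8, ir & 0xFF
    if ext != 0:
        return None  # IR[15:8] left free for future extensions
    return BASE_OPCODES.get(base)


if __name__ == "__main__":
    for word in (0x00A9, 0x01A9, 0x00EA):
        print(f"{word:04x}: {decode(word)}")
```

Anyone experimenting with extensions would then only need to give meaning to the `ext != 0` branch.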
ElEctric_EyE wrote:
Xilinx parts are ready to do a 2 cycle 16x16 multiply and 32 bit result.
Indeed, and multiply is an interesting case, but it's not interesting until we have a CPU that works, in a system, with some tools and some software. That's a long way off.
ElEctric_EyE wrote:
I for one am very interested.
Great!
GARTHWILSON wrote:
If I can't get my 65Org32, I'm definitely interested in the 65Org16.
Great! The 65Org32 might come later; I can begin to see how it might follow on. But for me, at least, the 65Org16 is a first step.
GARTHWILSON wrote:
For an assembler, the Cross-32 assembler ...full-featured macro assembler without having to write it from scratch.
Thanks for the pointer; it's a good goal. But, as noted above, I've found some other starting points. I think some software people would find writing an assembler not too hard, but I'm going to have to stand on someone's shoulders. Note that a table-driven approach gets interesting when the opcode is 16 or even 32 bits wide. As a starting point, though, I only have the original NMOS opcodes; the interesting part is handling 16-bit immediates and branch distances, and 32-bit addresses.
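A minimal sketch of that table-driven idea, with the 16-bit immediates, 16-bit branch offsets and 32-bit addresses as the only new wrinkles. Several details are assumptions, not settled design: memory is word-addressed, a 32-bit address is emitted low word first (by analogy with the 6502's little-endian order), and a branch offset is counted in words from the word after the branch. The table, `SIZES`, `assemble_one` and `assemble` are all hypothetical:

```python
# Hypothetical table: (mnemonic, addressing mode) -> original NMOS opcode
OPCODES = {
    ("LDA", "imm"): 0xA9,
    ("LDA", "abs"): 0xAD,
    ("STA", "abs"): 0x8D,
    ("JMP", "abs"): 0x4C,
    ("BNE", "rel"): 0xD0,
}
# Instruction length in 16-bit words: opcode plus operand words
SIZES = {"imm": 2, "abs": 3, "rel": 2}


def assemble_one(mnemonic, mode, operand, pc):
    """Return the 16-bit words for one instruction at word address pc."""
    words = [OPCODES[(mnemonic, mode)]]
    if mode == "imm":
        if not -0x8000 <= operand <= 0xFFFF:
            raise ValueError("immediate does not fit in 16 bits")
        words.append(operand & 0xFFFF)
    elif mode == "abs":
        if not 0 <= operand <= 0xFFFFFFFF:
            raise ValueError("address does not fit in 32 bits")
        words += [operand & 0xFFFF, operand >> 16]  # low word first (assumed)
    elif mode == "rel":
        offset = operand - (pc + SIZES["rel"])  # from the next instruction
        if not -0x8000 <= offset <= 0x7FFF:
            raise ValueError("branch out of range")
        words.append(offset & 0xFFFF)
    return words


def assemble(program, origin=0):
    """Two passes: collect label addresses, then emit words."""
    labels, pc = {}, origin
    for label, _mnemonic, mode, _operand in program:
        if label:
            labels[label] = pc
        pc += SIZES[mode]
    out, pc = [], origin
    for _label, mnemonic, mode, operand in program:
        value = labels[operand] if isinstance(operand, str) else operand
        out += assemble_one(mnemonic, mode, value, pc)
        pc += SIZES[mode]
    return out


if __name__ == "__main__":
    program = [
        ("loop", "LDA", "imm", 0x1234),
        (None, "STA", "abs", 0x00012345),
        (None, "BNE", "rel", "loop"),
    ]
    print([f"{w:04x}" for w in assemble(program)])
```

Because every operand is a whole number of words, the instruction sizes stay fixed per addressing mode, which keeps the two-pass label resolution as simple as on the 6502.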
Arlet wrote:
... tip to show registers separately in simulation...
Thanks!
Arlet wrote:
Since you now have 16 bit instructions, it would be nice to add some more registers. Since the core is already using a register file, with enough resources to support 16 registers (only 4 are used right now), you can easily add 12 more without much impact on core size/speed.
Yes, good point. I do have ideas (don't we all) on how to extend the machine. But it all comes later! For me, probably the most interesting thing is making it an easy target for a C compiler (Toshi's commonly voiced complaint about the 65816). But there's no point re-inventing ARM.
Arlet wrote:
The 16 bit ALU is certainly going to slow down max speed, due to the longer ripple carry path. If you care about speed, removing BCD support should help a bit there.
I'll do that; it's the one incompatibility I'm happy to commit to on day one.
But ideally this core shouldn't be significantly slower than yours: doubling the width should add only about one gate delay if we can get a fast adder implementation. (I'm seeing 43MHz vs 46MHz on an xc3s200-4, and 368 slices vs 263.)
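The "one more gate delay" argument rests on carry-path depth: a ripple-carry adder's carry passes through every bit, so its depth grows linearly with width, while a parallel-prefix adder needs about log2(width) combining stages, so going from 8 to 16 bits adds one stage. The sketch below models that with a Kogge-Stone prefix network (one choice among several prefix adders, picked here for illustration), and uses it for a binary-only 16-bit ADC with BCD removed. The C and V flag rules are assumed to carry over from the 6502's binary mode with bit 7 replaced by bit 15. On an FPGA the dedicated carry chain can make a plain ripple adder competitive, so this models logic depth, not synthesized timing:

```python
def prefix_add(a, b, carry_in=0, width=16):
    """Add two width-bit values using a Kogge-Stone parallel-prefix carry network.

    Returns (sum, carry_out, levels), where levels is the number of prefix
    stages on the carry path: ceil(log2(width)), versus width stages for a
    ripple-carry chain.
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # bit generates
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # bit propagates
    G, P = g[:], p[:]
    G[0] |= P[0] & carry_in  # fold the carry-in into bit 0's generate
    levels, dist = 0, 1
    while dist < width:
        new_g, new_p = G[:], P[:]
        for i in range(dist, width):
            new_g[i] = G[i] | (P[i] & G[i - dist])
            new_p[i] = P[i] & P[i - dist]
        G, P = new_g, new_p
        dist *= 2
        levels += 1
    # G[i] is now the carry out of bit i
    total = 0
    for i in range(width):
        c_i = G[i - 1] if i > 0 else carry_in
        total |= (p[i] ^ c_i) << i
    return total, G[width - 1], levels


def adc16(a, b, carry_in):
    """Binary-only 16-bit ADC (no decimal mode); returns (result, C, V)."""
    result, c, _ = prefix_add(a, b, carry_in, 16)
    # Overflow: operands share a sign but the result's sign (bit 15) differs
    v = int(bool(~(a ^ b) & (a ^ result) & 0x8000))
    return result, c, v


if __name__ == "__main__":
    for width in (8, 16, 32):
        print(width, "bits:", prefix_add(0, 0, 0, width)[2], "prefix levels")
    print(adc16(0x7FFF, 0x0001, 0))
```

With decimal mode gone, the ALU's add path is just this adder plus the flag logic, which is why dropping BCD helps the critical path.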