Thanks for the discussion and encouragement everyone!
BigEd wrote:
About the decode ROM organisation: I think different organisations are all valid, it might be better for understanding and debugging to do it one way rather than another. Or it might be better for maintainability, or for implementation. But for correctness I suspect it doesn't matter.
Yeah, in terms of the code that I've written, we assume there is one monolithic decode ROM (here is an example of an older copy of the generated decode ROM and DPC signal header). That is definitely better for simulation because there is one lookup per half-cycle to see the entire state of the DPC signal set.
I'm asking from a hardware perspective if there is any difference in terms of impact on the maximum/minimum clock speed. I always assumed the minimum clock speed of the 6502 was basically a limitation of how long we can wait during PHI1 before charge dissipates and data is lost. And I've assumed the maximum clock speed was due to the clock generation logic not tripping over itself basically. Maybe a higher-level form of what I'm asking is: "What determines the range for the clock speed?" I assumed it also had something to do with the size of the die, so perhaps a tighter-packed layout with several small sections of decode ROM instead of one massive one might be able to overcome some of those restrictions.
BigEd wrote:
About the carry bit: it's the same kind of thing as endianness, I think. A person learns how to do things in one way on one machine, and then when they come across a machine which does things differently, they find it odd, or confusing, or repulsive. But which way around it's done doesn't have any absolute sense of correctness. For carry, I'd do it the 6502 way, because that's the system I learnt first - and in a 6502-like machine, on a 6502 forum, it's likely to be most acceptable. (There's an effort underway elsewhere to port Acorn's BBC Basic from 6502 to 6809, and this carry thing has been a source of confusion and bugs and, perhaps, frustration.)
The way I'm approaching this project (more on that in a bit, because it pertains to what Dr Jefyll asked) is more like how I would design a chip with this technology. I admit it's a bit obtuse, but when I sit down and attempt to put my bias aside, I kind of think the Carry/Not-Borrow bit is a little confusing, and would almost prefer a Carry/Borrow bit, which would involve subtraction logic flipping the carry signal. I'm a little torn though, because one part of me likes the sort of unabstracted pure adder thing going on in the 6502 that doesn't hide what subtraction is really doing (and it parallels incredibly nicely and logically with adding as well). Another part of me thinks it's a little weird to have to SEC to do an SBC without a borrow, when I have to CLC to do an ADC without carry.
To be honest, I'm a little surprised how immediately everyone in the thread decided against this. Out of curiosity, if the ISA/CPU wasn't married to 6502, would you still prefer an ISA that had this particular setup about it? I was under the assumption that this was a relatively odd thing that sort-of famously the 65* family of microprocessors did "differently", and many other microprocessors (especially into modern day) flipped the carry in the manner I was describing.
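To make the two conventions concrete, here is a minimal C sketch of both. It only models binary (non-decimal) mode, and the names (`alu_result`, `adc`, `sbc_not_borrow`, `sbc_borrow`) are made up for illustration; they are not taken from the K65 sources.

```c
#include <assert.h>
#include <stdint.h>

/* Result of an 8-bit add/subtract: the value plus the outgoing carry flag. */
typedef struct {
    uint8_t value;
    uint8_t carry; /* 0 or 1 */
} alu_result;

/* ADC: A + M + C. The carry out is bit 8 of the 9-bit sum. */
static alu_result adc(uint8_t a, uint8_t m, uint8_t carry_in) {
    assert(carry_in <= 1);
    unsigned sum = (unsigned)a + m + carry_in;
    alu_result r = { (uint8_t)sum, (uint8_t)(sum >> 8) };
    return r;
}

/* 6502-style SBC: the same adder fed with the inverted operand,
 * so C = 1 means "no borrow" both going in and coming out. */
static alu_result sbc_not_borrow(uint8_t a, uint8_t m, uint8_t carry_in) {
    return adc(a, (uint8_t)~m, carry_in);
}

/* Hypothetical carry/borrow variant: C = 1 means "borrow".
 * The flag is flipped going into and coming out of the adder. */
static alu_result sbc_borrow(uint8_t a, uint8_t m, uint8_t borrow_in) {
    assert(borrow_in <= 1);
    alu_result r = adc(a, (uint8_t)~m, (uint8_t)(borrow_in ^ 1u));
    r.carry ^= 1u;
    return r;
}
```

With this model, 5 - 3 needs the flag set beforehand in the not-borrow version (the SEC case) and clear in the borrow version, and the two versions leave opposite flag values behind for the same subtraction.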
BigEd wrote:
On BIX... is that going to come out rather like CMP? The 6502's instruction set is nice and small, so adding instructions should be done with care, in my opinion. You might think an extra instruction will be convenient, but you have to document it and test it, and learn to use it. It's all extra work: the instruction has to earn its keep by being useful.
Actually, the entire ISA that I am proposing is already implemented in K65 DPC signal layouts (including BIX, even though it's experimental)! If you imagine the BIT instruction, but with a logical XOR in place of the logical AND, that's basically what BIX is.
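As a rough sketch of the BIT/BIX relationship, assuming BIX derives the Z flag from its result the same way BIT does: the function names below are hypothetical, and the N and V flags are left out because how BIX treats them isn't covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* BIT: Z is set when A AND M is zero (none of the tested bits is set in M).
 * The accumulator itself is not modified. */
static bool bit_zero_flag(uint8_t a, uint8_t m) {
    return (uint8_t)(a & m) == 0;
}

/* BIX (assumed): same idea with XOR instead of AND.
 * A XOR M is zero exactly when A equals M. */
static bool bix_zero_flag(uint8_t a, uint8_t m) {
    return (uint8_t)(a ^ m) == 0;
}

/* Small self-check of the equality property. */
static void bix_self_check(void) {
    assert(bix_zero_flag(0x42, 0x42));
    assert(!bix_zero_flag(0x42, 0x43));
}
```

Under that assumption, the Z result behaves like an equality test against A, which is presumably where the comparison to CMP comes from.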
Here is the state machine data for BIX (absolute, zero-page, immediate):
https://github.com/TReed0803/kasm/blob/master/kasm/k65/detail/ctl/bix.inl
Here are the unit tests for BIX:
https://github.com/TReed0803/kasm/blob/master/tests/k65/units/ctl/bix.c
When I started this project, I didn't really set out to begin with the 6502 instructions and then add onto it. More or less, I wanted to make my own ISA, and I attacked the problem by adding instructions that I felt were useful and fit the design specifications of my ISA encoding scheme. Probably unsurprisingly, the ISA I ended up with is very similar to the 6502 (really, it's almost a superset), because what they had come up with is already pretty darn solid for the hardware that they designed. Anyways, I know it provides functionally different logic, but I haven't made any products with this ISA yet, so I was hoping to workshop some use-cases or see if it stuck out to anyone here as particularly interesting or useful.
Perhaps I can make it a "proposed" instruction, release some applications with my ISA, and see if it comes in handy.
BigEd wrote:
Indirect JMP is pretty useful!
Ahh, bummer - it really doesn't fit in my ISA rules. I'd basically have to hack it in.
The way my ISA is configured, JMP has three addressing modes (absolute, zero-page, relative). Do you think it would really be missed and I should try to hack it in-place of relative? Or might it be an interesting limitation of a theoretical chip that someone could work around? I'm sure someone could work around it, but I'm worried it might be a little too much of an inconvenience not to have... Thoughts?
BigEd wrote:
If STZ is useful, it's because it take fewer bytes, fewer cycles, and fewer registers than doing it by hand. If it happens to take one extra cycle to execute, that's probably not a great penalty.
I think in this case, I will omit this instruction then. I don't think I can give it the addressing modes it deserves.
BigEd wrote:
Nibble swapping, or four-bit shifting, could save quite a few cycles or bytes. The thing to do is to code up some example routines, like multiplication, or whatever, and see how those routines change with the addition of specific instructions. (In a bigger machine, it would also be appropriate to see how a compiler can make use of the instructions.) Which is to say, don't put instructions in because they are ingenious - put them in because they make some specific kind of code more efficient. This is, of course, much more effort! You'll see nearby that one or two people like to add instructions to help Forth: they very much have in mind how a Forth is implemented and which operations take a lot of cycles or a lot of bytes. If your favourite application was robotics, or home automation, or a calculator, or a Basic home computer, or a games machine, you'd make appropriate enhancements and tradeoffs. Making a 'better' microprocessor isn't really a thing - it's better for some kinds of purposes.
I think I'll omit nibble swap then - it feels to me like I wouldn't get much of what I want for this.
BigEd wrote:
Or, of course, just enjoy yourself! The exercise of adding in TXY, and making it work, should be fun!
Yeah! TXY/TYX already are added!
I actually have all of the instructions and pins unit tested - there are a little over 1000 unit tests.
PLA:
https://github.com/TReed0803/kasm/blob/master/kasm/k65/detail/ctl/txy.inl
https://github.com/TReed0803/kasm/blob/master/kasm/k65/detail/ctl/tyx.inl
Unit Tests:
https://github.com/TReed0803/kasm/blob/master/tests/k65/units/ctl/txy.c
https://github.com/TReed0803/kasm/blob/master/tests/k65/units/ctl/tyx.c
Dr Jefyll wrote:
Nice to see some creativity at work here! I admit to being a little puzzled, though, as to whether you're designing a simulator or hardware (albeit perhaps imaginary, thought-experiment hardware). I admit I skimmed through the readme in rather a hurry.
Hmm, I suppose I should describe more of what I'm trying to do here.
So the end goal is to create a custom theoretical fantasy 8-bit CPU, pop that into an emulator for a custom, theoretical fantasy 8-bit console, and then use an assembler that I made to create games for it. I think it'll be a fun little project. What I got caught up on was designing the ISA - I realized "wait, I don't know the first thing about microprocessors; what cycle counts would make sense, what is a cycle, why are the cycle counts what they are? etc. etc." Anyways, after that I started investigating NES because it just seemed like there'd be a lot of documentation on that (and I grew up playing it of course), and that eventually brought me here!
So the key here isn't exactly to build something married to the 6502, more like start with the 6502 as a base of understanding of microprocessors of the era and make a custom CPU with a custom ISA to use for this made up game system of mine. However, the CPU doesn't exactly have to be made for the game console; it could just be a general purpose CPU design that I happen to use in my fantasy console. That's what I've ended up doing, basically!
Honestly, even if it's all moot, I learned a ton from this and got to do my first exercise in designing a custom ISA. What's super neat is that I'll be able to switch my theoretical CPU from instruction-based emulation to cycle-based emulation (if you look in k65.h, the key mechanism for interacting with the K65 is two functions: one low step and one high step). I wanted to make it pretty efficient to simulate in this manner, so I made some other changes as well - for example, the read or write happens on the clock edge from low->high. This means instead of doing two PIN checks on RW (one after low, another after high), I can do one:
Code:
// Loop until we poll any low interrupt vector address.
do {
    k65_steplo(cpu);
    if (cpu.PIN & K65_PIN_RW_BIT) {
        // Handle Read
    }
    else {
        // Handle Write
    }
    k65_stephi(cpu);
} while (!(cpu.PIN & K65_PIN_VPL_BIT));
Dr Jefyll wrote:
I'm obliged to differ with Ed about endian-ness. Little endian has a solid, practical advantage when a multi-byte addition is performed. For example the 65xx family exploits this during an address mode such as absolute,X. The low byte of the base address is fetched first, thus allowing the low-byte addition to commence on the immediately following cycle -- overlapping with the fetch of the high byte of the base address.
Yeah, I think it depends on the situation. For the design of how the 6502 works it's definitely best to be little-endian. I could envision a layout which could work with big-endianness; I would really have to think more on it, but for this kind of design and layout, certainly little-endian is the way to go.
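The overlap Dr Jefyll describes can be sketched as a simplified C model of the operand fetch for absolute,X. This is only an illustration: `ea_result` and `absolute_x` are hypothetical names, the opcode fetch and the final data access are not counted, and it is not how the K65 micro-ops are actually laid out.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t address; /* final effective address */
    int cycles;       /* cycles spent on operand fetch and address fix-up */
} ea_result;

/* mem points at the two operand bytes, low byte first (little-endian). */
static ea_result absolute_x(const uint8_t *mem, uint8_t x) {
    assert(mem != 0);
    ea_result r = { 0, 0 };

    /* Cycle 1: fetch the low byte of the base address. */
    uint8_t lo = mem[0];
    r.cycles++;

    /* Cycle 2: fetch the high byte while the ALU adds X to the low byte. */
    uint8_t hi = mem[1];
    unsigned sum = (unsigned)lo + x;
    r.cycles++;

    /* Extra cycle only when the low-byte add carried into the high byte. */
    if (sum > 0xFF) {
        hi++;
        r.cycles++;
    }

    r.address = (uint16_t)(((unsigned)hi << 8) | (sum & 0xFF));
    return r;
}
```

With a big-endian operand the high byte would arrive first, so the low-byte add could not start until the second fetch had finished.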
I definitely saw that in cases like absolute, X as you mention:
https://github.com/TReed0803/kasm/blob/master/kasm/k65/detail/uops.h#L178-L215
barrym95838 wrote:
Or maybe just eight bytes and 12 cycles of pure machine instructions
That's a good little code snippet, definitely gonna try to remember that one (or copy it into my notes). Thanks!