Posted: Thu Oct 18, 2018 5:29 am
Joined: Sun May 20, 2018 7:20 pm
Posts: 20
Hey Everyone!

So I've been working on a project to understand microprocessors a little better. It's been a great educational experience, and I have learned a lot, especially from here (and the visual6502 simulation, of course). I am basically designing a CPU based almost entirely on the hardware of the 6502, except with a custom ISA and some more liberties with what can be done. The idea, though, is for it to still make sense cycle-count-wise, and to be technically possible to lay out in hardware.

Some Interesting Documents
README: https://github.com/TReed0803/kasm/blob/master/kasm/k65/README.md
Public structures/functions: https://github.com/TReed0803/kasm/blob/master/kasm/k65/k65.h
Public ISA Header: https://github.com/TReed0803/kasm/blob/master/kasm/k65/isa.h
Example PLA Signals: https://github.com/TReed0803/kasm/blob/master/kasm/k65/detail/art/adc.inl

Honestly, I was going to wait longer to share this; there are still some issues, and things I haven't exactly decided on. But it's getting harder and harder to work without feedback, so I figured I would post some important links to the project and see if anyone is able to share some knowledge or can help in any way.

Another reason I was kind of unsure I wanted to share this project is that it has a core dependency on another library of mine which I know is Linux-only right now, so it's a difficult project to build (and I don't have build instructions in the README, booooo)... Building this isn't the most trivial thing, because it requires some file generation (as you can see in CMakeLists.txt). I plan on automating builds for my projects to make it easier to share binaries in the future, but like I said - it's getting harder and harder to work without feedback.

Right now, I don't have a functioning assembler; that is a work-in-progress as well. I noticed it was silly to work on the assembler while I still wasn't sure what the final ISA looked like, so I shelved that in an unfinished state for the time being. For now, I'm using the preprocessor to assemble things via the pasm.h header in my project. Here is an example of using it from unit tests: k65_art_adc_carry_set

Anyways, here are some of my questions:

1. About the Decode ROM

One of the things I thought would be interesting is having multiple decode ROMs and selecting which one we use via the first two bits of the instruction. This is basically how all the instructions are configured: there is an "Instruction Block" which says which decode ROM we use, and then the other 6 bits feed into the decode ROMs. I attempted to split it further, by allowing the decode ROM for the addressing mode to be different from the decode ROM for the logic that happens after memory is retrieved via the addressing modes, but it's all just theoretical. Without putting hardware down I have no idea if this is just a needless complication, or if there is any novelty to this configuration.

I kind of want feedback on that idea in general; is it misinformed? Is it useless? Could it accomplish anything?
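
To make the block-select idea concrete, here is a minimal C sketch of the opcode split. It assumes that "the first two bits" means the two most significant bits; the type and function names are hypothetical and not taken from k65.h.

```c
#include <stdint.h>

/* Hypothetical split of an 8-bit opcode: two bits pick one of four decode
   ROMs ("instruction blocks"), the other six bits index into that ROM.
   Using the TOP two bits as the selector is an assumption of this sketch. */
typedef struct {
    uint8_t block;  /* 0..3: which decode ROM is selected */
    uint8_t index;  /* 0..63: row within the selected ROM */
} decode_sel;

static decode_sel split_opcode(uint8_t opcode)
{
    decode_sel s;
    s.block = (uint8_t)(opcode >> 6);
    s.index = (uint8_t)(opcode & 0x3F);
    return s;
}

/* A monolithic ROM holds the same rows, addressed by all eight bits at once. */
static uint16_t monolithic_row(decode_sel s)
{
    return (uint16_t)((s.block << 6) | s.index);
}
```

Functionally the two organisations select the same row, which suggests the choice matters for layout and timing rather than for correctness.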

2. About Flipping Carry Bit for Subtraction

I'm considering flipping the carry bit for SBC instructions (the reason I name my SBC "SBB" is that I was planning to add a control signal to do this). I presume the reason they didn't do it on the 6502 is that it would cost more transistors. There is something kind of nice about the clean layout of not hiding what an adder does, but in my own ISA I have repeatedly confused myself by forgetting that I need to SEC before I SBC, so I figured it was frustrating enough to fix. Does anyone think that change would cause confusion, or does it sound like a solid change?
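
The two conventions can be compared in a small C sketch. The first function follows the 6502 rule (binary mode only) that SBC computes A + ~M + C; the second is a hypothetical SBB that inverts the flag on the way into and out of the same adder. The names are illustrative and decimal mode is ignored.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint8_t result;
    bool    carry;   /* meaning depends on the convention, see below */
} alu_out;

/* 6502 convention: SBC is A + ~M + C, so C = 1 means "no borrow". */
static alu_out sbc_6502(uint8_t a, uint8_t m, bool carry_in)
{
    unsigned sum = a + (uint8_t)~m + (carry_in ? 1u : 0u);
    alu_out o = { (uint8_t)sum, sum > 0xFF };
    return o;
}

/* Hypothetical borrow convention: flag = 1 means "borrow".
   Same adder, with the flag inverted going in and coming out. */
static alu_out sbb_borrow(uint8_t a, uint8_t m, bool borrow_in)
{
    alu_out o = sbc_6502(a, m, !borrow_in);
    o.carry = !o.carry;
    return o;
}
```

In this model the flip costs two inverters around an unchanged adder; whether that maps onto real hardware as cheaply is a separate question.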

3. About the ISA

I tried to make ISA changes that I felt made sense, and were logically sound with the ISA bitset that I have constructed. So adding in some instructions missing from the original 6502 (like TXY, TYX just made sense). But some things either had to be cut, or I wanted a parallel to another instruction and I'm unsure if it'd be useful. Here are a few examples:

+ BIT has a sibling mnemonic called BIX, which is basically an XOR with the accumulator that updates the zero flag (and sign/overflow, as BIT does). I feel iffy about this one, but figured I'd throw it out and see if anyone could think of any uses.
+ JMP doesn't have absolute-indirect; instead you get absolute, relative, and zero-page.
+ NOT/NEG were easy additions IMO, and they fit well in the RMW block of instructions.
+ ADX/ADY/SBX/SBY also felt like they made good sense to add based on the ISA layout.
+ Added CLX/CLY because it seemed to fit as an implied instruction in place of LDX/LDY. Thoughts?
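
As a rough illustration of the BIX idea from the list above, the following C sketch models only the flag results. It assumes BIX takes N and V from bits 7 and 6 of the memory operand exactly as the 6502's BIT does; that detail and all names here are assumptions, not taken from the K65 sources.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { bool z, n, v; } flags;

/* 6502 BIT: Z from (A AND M); N and V copied from bits 7 and 6 of M. */
static flags bit_flags(uint8_t a, uint8_t m)
{
    flags f = { (a & m) == 0, (m & 0x80) != 0, (m & 0x40) != 0 };
    return f;
}

/* Assumed BIX: identical, except Z comes from (A XOR M).
   Z is therefore set exactly when A equals M. */
static flags bix_flags(uint8_t a, uint8_t m)
{
    flags f = { (a ^ m) == 0, (m & 0x80) != 0, (m & 0x40) != 0 };
    return f;
}
```

Under these assumptions the Z result of BIX matches the Z result of CMP (equal or not equal), without CMP's effect on carry.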

I tried really hard to get STZ/STF, but for STZ I couldn't think of a way to make it work without wasting a cycle to get the 0 constant on the bus when I needed it. I could make STZ be absolute/immediate (useless)/zero-page based on my layout, fitting them in the CTL block. However, it didn't seem useful to have these instructions if those were the addressing modes. (Also, getting a zero on the bus for absolute proved to be difficult, since I had to push a cycle through the adder to construct zero because ADL/ADH were busy...) I can elaborate if needed; otherwise I'd have to add a drain to DB, and I don't know if I can justify the extra hardware for a single instruction and bad addressing modes... :(

I also thought really hard about SWP (swap nibble). But I basically concluded that I'd have to add a special input mode (from SB or DB) so that I could pass through the adder to properly update the flags. It feels like a lot of hardware complication for little gain, though it could technically be done. Thoughts on this?

I also worked really hard to see if I could add a Direct Addressing Mode, or maybe a better description of what I was doing is Base Page vs. Zero Page. Basically I thought it'd be incredibly useful if you could set a register to be used for ABH during zero-page instructions (making them effectively base-page, where base-page could be 0 or any other page). The issue was that, while I could fit an instruction in the CTL block for configuring the base page (LDB; absolute, immediate, zero-page), I had trouble thinking it'd be worth it because there was no easy way to transfer to/from, push/pull the value, or in general inspect what the set "base page" was (is there a better name for what I'm describing? I want to say direct addressing because of the 65C816 direct page register). Is this a good call, or could it still be useful even without these instructions? Or would you give up TXY/TYX for something like this? It felt like a lot of messing with the hardware for a feature that would be half-baked in the end. Looking for opinions though.
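
The address formation described here is small enough to sketch in C. This assumes a page-aligned, 8-bit base register that simply supplies the high address byte; the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical base-page addressing: the one-byte operand supplies the low
   address byte and a base register supplies the high byte (ABH).
   With base == 0 this is ordinary zero-page addressing. */
static uint16_t base_page_address(uint8_t base, uint8_t operand)
{
    return (uint16_t)(((uint16_t)base << 8) | operand);
}
```

For comparison, the 65C816's direct page register is 16 bits wide and is added to the operand, so it is not restricted to page boundaries.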

In general, any ISA feedback would be welcome!

That's All!

Again, I welcome any and all feedback. I hope it is appropriate to ask questions about this kind of project on here. And you are more than welcome to peruse the PLA for K65; however, be careful, as there are a lot of copy-pasted comments in there. I haven't done an editing pass on the PLA headers, so some of them are in good shape, and others are very obviously not.

Thanks! :D


Posted: Fri Oct 19, 2018 7:14 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10980
Location: England
TrentoReedo wrote:
Honestly, I was going to wait longer to share this; there are still some issues, and things I haven't exactly decided on. But it's getting harder and harder to work without feedback, so I figured I would post some important links to the project and see if anyone is able to share some knowledge or can help in any way.

Thanks for sharing your work-in-progress! Always good to see what's going on, and maybe someone can help with anything which is puzzling or blocking.

About the decode ROM organisation: I think different organisations are all valid, it might be better for understanding and debugging to do it one way rather than another. Or it might be better for maintainability, or for implementation. But for correctness I suspect it doesn't matter.

About the carry bit: it's the same kind of thing as endianness, I think. A person learns how to do things in one way on one machine, and then when they come across a machine which does things differently, they find it odd, or confusing, or repulsive. But which way around it's done doesn't have any absolute sense of correctness. For carry, I'd do it the 6502 way, because that's the system I learnt first - and in a 6502-like machine, on a 6502 forum, it's likely to be most acceptable. (There's an effort underway elsewhere to port Acorn's BBC Basic from 6502 to 6809, and this carry thing has been a source of confusion and bugs and, perhaps, frustration.)

If you don't get on well with the 6502's carry, and it's not because you learnt the other convention on some other micro, I'd say you should persevere and get used to it. It's like driving on one side of the road or the other - best to learn, solidly, the local convention.

On BIX... is that going to come out rather like CMP? The 6502's instruction set is nice and small, so adding instructions should be done with care, in my opinion. You might think an extra instruction will be convenient, but you have to document it and test it, and learn to use it. It's all extra work: the instruction has to earn its keep by being useful.

Indirect JMP is pretty useful!

Operations on X and Y might well be useful to add in.

If STZ is useful, it's because it takes fewer bytes, fewer cycles, and fewer registers than doing it by hand. If it happens to take one extra cycle to execute, that's probably not a great penalty.

Nibble swapping, or four-bit shifting, could save quite a few cycles or bytes. The thing to do is to code up some example routines, like multiplication, or whatever, and see how those routines change with the addition of specific instructions. (In a bigger machine, it would also be appropriate to see how a compiler can make use of the instructions.) Which is to say, don't put instructions in because they are ingenious - put them in because they make some specific kind of code more efficient. This is, of course, much more effort! You'll see nearby that one or two people like to add instructions to help Forth: they very much have in mind how a Forth is implemented and which operations take a lot of cycles or a lot of bytes. If your favourite application was robotics, or home automation, or a calculator, or a Basic home computer, or a games machine, you'd make appropriate enhancements and tradeoffs. Making a 'better' microprocessor isn't really a thing - it's better for some kinds of purposes.

Or, of course, just enjoy yourself! The exercise of adding in TXY, and making it work, should be fun!

Cheers
Ed


Posted: Fri Oct 19, 2018 6:21 pm
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Hello, TrentoReedo. Nice to see some creativity at work here! I admit to being a little puzzled, though, as to whether you're designing a simulator or hardware (albeit perhaps imaginary, thought-experiment hardware). I admit I skimmed through the readme in rather a hurry.

As for the ISA, I agree with most of Ed's remarks -- the carry business, for instance. And definitely Indirect JMP is pretty useful!

Nibble swapping is a "feature" I'd approach with caution. It's easy to lose sight of the fact that the existing 65xx ISA already has an answer for this problem, namely indexed addressing. (A lookup table will do a satisfactory nibble-swap in many situations.) The same critical appraisal should be applied to other prospective improvements. Does a solution already exist?
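
The lookup-table approach can be sketched in C as follows. On a 6502 the lookup itself would be an indexed load (for example TAX followed by LDA table,X), at the cost of a 256-byte table and an index register. The names below are illustrative.

```c
#include <stdint.h>

static uint8_t swap_table[256];

/* Fill the table once: entry i holds i with its two nibbles exchanged. */
static void build_swap_table(void)
{
    for (int i = 0; i < 256; i++)
        swap_table[i] = (uint8_t)((i << 4) | (i >> 4));
}

/* The swap itself is a single indexed read. */
static uint8_t swap_nibbles(uint8_t value)
{
    return swap_table[value];
}
```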

I'm obliged to differ with Ed about endian-ness. Little endian has a solid, practical advantage when a multi-byte addition is performed. For example the 65xx family exploits this during an address mode such as absolute,X. The low byte of the base address is fetched first, thus allowing the low-byte addition to commence on the immediately following cycle -- overlapping with the fetch of the high byte of the base address.
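
A byte-wise model of that address calculation, as a sketch in C. It only shows the order of the arithmetic (low byte first, then the high-byte fix-up); it is not a cycle-accurate model, and the function name is made up for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* operand[0] is the low byte and operand[1] the high byte, in the order
   they are fetched. The low-byte add can start before operand[1] arrives. */
static uint16_t abs_x_address(const uint8_t operand[2], uint8_t x,
                              bool *page_crossed)
{
    unsigned low = operand[0] + x;        /* done while the high byte is fetched */
    bool carry = low > 0xFF;
    uint8_t high = (uint8_t)(operand[1] + (carry ? 1 : 0)); /* later fix-up */
    *page_crossed = carry;
    return (uint16_t)((high << 8) | (low & 0xFF));
}
```

With a big-endian operand the high byte would arrive first, and nothing useful could be added until the second fetch completed.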

cheers,
Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Last edited by Dr Jefyll on Fri Oct 19, 2018 6:26 pm, edited 1 time in total.

Posted: Fri Oct 19, 2018 6:24 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10980
Location: England
(Fair point about endianness of the operand ordering Jeff! For more general data, note that counting down is often preferred to counting up, so a big-endian value can be traversed in little-endian order quite naturally. That said, because of my early inoculation with 6502, I'm solidly a little-endian at heart.)

(Edit to add: I mean, if say adding some 4 byte values, you might have a loop which is both counted and indexed by X, and if the values are in big-endian order in memory, you can then use DEX; BNE loop - if counting up, you need INX; CPX #4; BNE loop. So, big-endian layout, little-endian traversal, low loop overhead.)
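
The "big-endian layout, little-endian traversal" loop might look like this in C. It is a sketch of the idea only; the function name is hypothetical and the loop counter stands in for X.

```c
#include <stdint.h>

/* dst += src for two 4-byte values stored big-endian (most significant
   byte at index 0). The counter runs 4, 3, 2, 1 like a DEX / BNE loop, so
   the least significant byte is added first. Returns the final carry. */
static unsigned add32_big_endian(uint8_t dst[4], const uint8_t src[4])
{
    unsigned carry = 0;
    for (int x = 4; x != 0; x--) {
        unsigned sum = dst[x - 1] + src[x - 1] + carry;
        dst[x - 1] = (uint8_t)sum;
        carry = sum >> 8;
    }
    return carry;
}
```

The `x - 1` index corresponds to assembling the operand address as base-1, so that the loop can end when X reaches zero without a compare.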


Last edited by BigEd on Fri Oct 19, 2018 7:10 pm, edited 1 time in total.

Posted: Fri Oct 19, 2018 6:41 pm
Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
Dr Jefyll wrote:
... (A lookup table will do a satisfactory nibble-swap in many situations.) ...

Or maybe just eight bytes and 12 cycles of pure machine instructions:
Code:
      ASL
      ADC   #$80
      ROL
      ASL
      ADC   #$80
      ROL


Credit to David Galloway. I don't know if he invented the technique or just drew attention to it. I saw bogax mention it somewhere too.
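
The sequence can be checked exhaustively with a tiny C model of the three instructions involved. This sketch models only A and the carry flag in binary mode (decimal mode is assumed to be clear); the helper names are made up for illustration. Each ASL / ADC #$80 / ROL group rotates A left by two bits, so two groups swap the nibbles.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint8_t a; bool c; } regs;

static void asl(regs *r)
{
    r->c = (r->a & 0x80) != 0;
    r->a = (uint8_t)(r->a << 1);
}

static void rol(regs *r)
{
    bool out = (r->a & 0x80) != 0;
    r->a = (uint8_t)((r->a << 1) | (r->c ? 1 : 0));
    r->c = out;
}

static void adc_imm(regs *r, uint8_t m)   /* binary mode only */
{
    unsigned sum = r->a + m + (r->c ? 1u : 0u);
    r->a = (uint8_t)sum;
    r->c = sum > 0xFF;
}

/* Runs ASL / ADC #$80 / ROL twice, as in the listing above. */
static uint8_t swap_nibbles_sequence(uint8_t value)
{
    regs r = { value, false };
    for (int pass = 0; pass < 2; pass++) {
        asl(&r);
        adc_imm(&r, 0x80);
        rol(&r);
    }
    return r.a;
}

/* Counts inputs for which the sequence does not produce a nibble swap. */
static int count_mismatches(void)
{
    int bad = 0;
    for (int v = 0; v < 256; v++) {
        uint8_t expected = (uint8_t)((v << 4) | (v >> 4));
        if (swap_nibbles_sequence((uint8_t)v) != expected)
            bad++;
    }
    return bad;
}
```

The incoming carry does not matter, because the first ASL overwrites it.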

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Posted: Fri Oct 19, 2018 8:34 pm
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
BigEd wrote:
I mean, if say adding some 4 byte values, you might have a loop which is both counted and indexed by X [...]
Alright, thanks for that. Little-endian means X would be counting up, and you'd need INX / CPX #4 / BNE at the bottom of the loop. Big-endian means X would be counting down, and you'd only need DEX / BPL at the bottom of the loop.

In a PM to me, Ed made another good point: "It's interesting, I think, that CMP can benefit from big-endian traversal, whereas ADD and SUB need little-endian traversal."
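
One reading of that remark is that a multi-byte comparison can start at the most significant byte and stop at the first difference, whereas addition has to start at the least significant byte so the carry can propagate. A C sketch under that interpretation (the function name is hypothetical):

```c
#include <stdint.h>

/* Compares two 4-byte big-endian values.
   Returns -1 if a < b, 0 if equal, 1 if a > b.
   Walks from the most significant byte and exits at the first mismatch. */
static int compare32_big_endian(const uint8_t a[4], const uint8_t b[4])
{
    for (int i = 0; i < 4; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}
```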

Posted: Fri Oct 19, 2018 8:39 pm
Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
What's with all the secrecy, Ed?

Anyway, the way around that CPX # is to load it with #-3 and count up to zero, right?

Mike B.


Posted: Fri Oct 19, 2018 8:42 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10980
Location: England
Good point about counting up to zero!

(No secrecy as such, just a thought I had which happened to fall into a PM compose box - Jeff had prompted me privately to add the extra bit to my earlier post.)


Posted: Fri Oct 19, 2018 9:00 pm
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Yes, I'm feeling a little thick today, and privately mentioned to Ed I was foggy about what he'd posted. He replied with an additional remark which I then shared.

[ Edit: the "inapplicable" remark (following paragraph) is incorrect.]
As for LDX #-3 then counting up to zero, I like the idea but in some circumstances it's inapplicable: for example if you're using absolute,X mode on a 6502 or 'C02. But zero-page,X mode is alright 'cause it'll wrap around. (On '816, different wrap-around rules sometimes apply, so you need to be on your toes.)


-- Jeff


Last edited by Dr Jefyll on Sat Oct 20, 2018 3:59 pm, edited 1 time in total.

Posted: Sat Oct 20, 2018 1:27 am
Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
Dr Jefyll wrote:
... I like the idea but in some circumstances it's inapplicable: for example if you're using absolute,X mode on a 6502 or 'C02 ...

So I should expect this attempt to fail?
Code:
     .org   $f00
offset = 256+mess-mend
      ldx   #offset     ; IOW, -28
loop:
      lda   mess-offset,x
      jsr   cout
      inx
      bne   loop
      brk
mess:
     .dcb   "Twenty-eight letter message."
mend:
     .end

(I know, it looks a bit obtuse, but the majority of the hard work is done at assembly time ...)
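
Why this works with absolute,X can be seen in a small C model of the loop. X is added as an unsigned 8-bit value to a 16-bit base that has been moved back by 256, so no wrap-around is needed as long as the end of the data is at address 256 or above. The function and parameter names are hypothetical, and `len` must be between 1 and 255.

```c
#include <stdint.h>
#include <stddef.h>

/* Models:  ldx #(256 - len)
     loop:  lda (mend - 256),x  ; absolute,X with a 16-bit base
            inx
            bne loop
   Copies the bytes read into out[] and returns how many were read. */
static size_t copy_with_negative_index(const uint8_t *memory, uint16_t mess,
                                       uint8_t len, uint8_t *out)
{
    uint16_t base = (uint16_t)(mess + len - 256);  /* mend - 256 */
    uint8_t x = (uint8_t)(256 - len);              /* "negative" start index */
    size_t n = 0;
    do {
        out[n++] = memory[(uint16_t)(base + x)];   /* 16-bit base + 8-bit X */
        x++;                                       /* INX */
    } while (x != 0);                              /* BNE loop */
    return n;
}
```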

Mike B.


Posted: Sat Oct 20, 2018 1:54 am
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Alright, nice -- it can be done. And only slightly obtusely. (But I already admitted I'm feeling a little thick today! :roll: ) Thanks for posting. And apologies to TrentoReedo for going off into the weeds with this.

ps- can't resist: here's the same thing, but more readable, IMO; YMMV. And, as you say, the work is done at assembly time.
Code:
     .org   $f00
      ldx   #mess-(mend-256)   ; 256 minus the length, IOW -28
loop:
      lda   mend-256,x         ; no parentheses, so this is plain absolute,X
      jsr   cout
      inx
      bne   loop
      brk
mess:
     .dcb   "Twenty-eight letter message."
mend:
     .end

Posted: Sat Oct 20, 2018 3:06 am
Joined: Sun May 20, 2018 7:20 pm
Posts: 20
Thanks for the discussion and encouragement everyone! :)

BigEd wrote:
About the decode ROM organisation: I think different organisations are all valid, it might be better for understanding and debugging to do it one way rather than another. Or it might be better for maintainability, or for implementation. But for correctness I suspect it doesn't matter.

Yeah, in terms of the code that I've written, we assume there is one monolithic decode ROM (here is an example of an older copy of the generated decode ROM and DPC signal header). That is definitely better for simulation because there is one lookup per half-cycle to see the entire state of the DPC signal set.

I'm asking from a hardware perspective if there is any difference in terms of impact on the maximum/minimum clock speed. I always assumed the minimum clock speed of the 6502 was basically a limitation of how long we can wait during PHI1 before charge dissipates and data is lost. And I've assumed the maximum clock speed was due to the clock generation logic not tripping over itself basically. Maybe a higher-level form of what I'm asking is: "What determines the range for the clock speed?" I assumed it also had something to do with the size of the die, so perhaps a tighter-packed layout with several small sections of decode ROM instead of one massive one might be able to overcome some of those restrictions.

BigEd wrote:
About the carry bit: it's the same kind of thing as endianness, I think. A person learns how to do things in one way on one machine, and then when they come across a machine which does things differently, they find it odd, or confusing, or repulsive. But which way around it's done doesn't have any absolute sense of correctness. For carry, I'd do it the 6502 way, because that's the system I learnt first - and in a 6502-like machine, on a 6502 forum, it's likely to be most acceptable. (There's an effort underway elsewhere to port Acorn's BBC Basic from 6502 to 6809, and this carry thing has been a source of confusion and bugs and, perhaps, frustration.)

The way I'm approaching this project (more on that in a bit, because it pertains to what Dr Jefyll asked) is more like how I would design a chip with this technology. I admit it's a bit obtuse, but when I sit down and attempt to put my bias aside, I kind of think the Carry/Not-Borrow bit is a little confusing, and would almost prefer a Carry/Borrow bit, which would involve subtraction logic flipping the carry signal. I'm a little torn though, because one part of me likes the sort of unabstracted pure adder thing going on in the 6502 that doesn't hide what subtraction is really doing (and it parallels incredibly nicely and logically with adding as well). Another part of me thinks it's a little weird to have to SEC to do an SBC without a borrow, when I have to CLC to do an ADC without carry.

To be honest, I'm a little surprised how immediately everyone in the thread decided against this. Out of curiosity, if the ISA/CPU wasn't married to 6502, would you still prefer an ISA that had this particular setup about it? I was under the assumption that this was a relatively odd thing that sort-of famously the 65* family of microprocessors did "differently", and many other microprocessors (especially into modern day) flipped the carry in the manner I was describing.

BigEd wrote:
On BIX... is that going to come out rather like CMP? The 6502's instruction set is nice and small, so adding instructions should be done with care, in my opinion. You might think an extra instruction will be convenient, but you have to document it and test it, and learn to use it. It's all extra work: the instruction has to earn its keep by being useful.

Actually, the entire ISA that I am proposing is already implemented in K65 DPC signal layouts (including BIX even though it's experimental)! Basically, if you imagine the BIT instruction, but instead of doing a logical AND it did a logical XOR, that's basically what BIX is.

Here is the state machine data for BIX (absolute, zero-page, immediate):
https://github.com/TReed0803/kasm/blob/master/kasm/k65/detail/ctl/bix.inl
Here are the unit tests for BIX:
https://github.com/TReed0803/kasm/blob/master/tests/k65/units/ctl/bix.c

When I started this project, I didn't really set out to begin with the 6502 instructions and then add onto them. More or less, I wanted to make my own ISA, and I attacked the problem by adding instructions that I felt were useful and fit the design specifications of my ISA encoding scheme. Probably unsurprisingly, the ISA I ended up with is very similar to the 6502 (really, it's almost a superset), because what they had come up with is already pretty darn solid for the hardware that they designed. Anyways, I know it provides functionally different logic, but I haven't made any products with this ISA yet, so I was hoping to workshop some use-cases or see if it stuck out to anyone here as particularly interesting or useful.

Perhaps I can make it a "proposed" instruction, release some applications with my ISA, and see if it comes in handy.

BigEd wrote:
Indirect JMP is pretty useful!

Ahh, bummer - it really doesn't fit in my ISA rules. I'd basically have to hack it in. :(

The way my ISA is configured, JMP has three addressing modes (absolute, zero-page, relative). Do you think it would really be missed and I should try to hack it in-place of relative? Or might it be an interesting limitation of a theoretical chip that someone could work around? I'm sure someone could work around it, but I'm worried it might be a little too much of an inconvenience not to have... Thoughts?

BigEd wrote:
If STZ is useful, it's because it takes fewer bytes, fewer cycles, and fewer registers than doing it by hand. If it happens to take one extra cycle to execute, that's probably not a great penalty.

I think in this case, I will omit this instruction then. I don't think I can get it the addressing modes it deserves.

BigEd wrote:
Nibble swapping, or four-bit shifting, could save quite a few cycles or bytes. The thing to do is to code up some example routines, like multiplication, or whatever, and see how those routines change with the addition of specific instructions. (In a bigger machine, it would also be appropriate to see how a compiler can make use of the instructions.) Which is to say, don't put instructions in because they are ingenious - put them in because they make some specific kind of code more efficient. This is, of course, much more effort! You'll see nearby that one or two people like to add instructions to help Forth: they very much have in mind how a Forth is implemented and which operations take a lot of cycles or a lot of bytes. If your favourite application was robotics, or home automation, or a calculator, or a Basic home computer, or a games machine, you'd make appropriate enhancements and tradeoffs. Making a 'better' microprocessor isn't really a thing - it's better for some kinds of purposes.

I think I'll omit nibble swap then - it feels to me like I wouldn't get much of what I want for this.

BigEd wrote:
Or, of course, just enjoy yourself! The exercise of adding in TXY, and making it work, should be fun!

Yeah! TXY/TYX already are added! :)
I actually have all of the instructions and pins unit tested - there are a little over 1000 unit tests.

PLA:
https://github.com/TReed0803/kasm/blob/master/kasm/k65/detail/ctl/txy.inl
https://github.com/TReed0803/kasm/blob/master/kasm/k65/detail/ctl/tyx.inl

Unit Tests:
https://github.com/TReed0803/kasm/blob/master/tests/k65/units/ctl/txy.c
https://github.com/TReed0803/kasm/blob/master/tests/k65/units/ctl/tyx.c


Dr Jefyll wrote:
Nice to see some creativity at work here! I admit to being a little puzzled, though, as to whether you're designing a simulator or hardware (albeit perhaps imaginary, thought-experiment hardware). I admit I skimmed through the readme in rather a hurry.

Hmm, I suppose I should describe more of what I'm trying to do here.

So the end goal is to create a custom theoretical fantasy 8-bit CPU, pop that into an emulator for a custom, theoretical fantasy 8-bit console, and then use an assembler that I made to create games for it. I think it'll be a fun little project. What I got caught up on was designing the ISA - I realized "wait, I don't know the first thing about microprocessors; what cycle counts would make sense, what is a cycle, why are the cycle counts what they are? etc. etc." Anyways, after that I started investigating the NES because it just seemed like there'd be a lot of documentation on that (and I grew up playing it of course), and that eventually brought me here!

So the key here isn't exactly to build something married to the 6502, more like start with the 6502 as a base of understanding of microprocessors of the era and make a custom CPU with a custom ISA to use for this made up game system of mine. However, the CPU doesn't exactly have to be made for the game console; it could just be a general purpose CPU design that I happen to use in my fantasy console. That's what I've ended up doing, basically! :)

Honestly, even if it's all moot, I learned a ton from this and got to do my first exercise in designing a custom ISA. What's super neat is that I'll be able to switch my theoretical CPU from instruction-based emulation to cycle-based emulation (if you look in k65.h, the key mechanism of interacting with the K65 is through two functions, one low step, and one high step). I wanted to make it pretty efficient to simulate in this manner, so I made some other changes as well - for example, the read or write happens on the clock edge from low->high. This means instead of doing two PIN checks on RW (one after low, another after high), I can do one:
Code:
// Loop until we poll any low interrupt vector address.
do {
  k65_steplo(cpu);
  if (cpu.PIN & K65_PIN_RW_BIT) {
    // Handle Read
  }
  else {
    // Handle Write
  }
  k65_stephi(cpu);
} while (!(cpu.PIN & K65_PIN_VPL_BIT));
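
For readers without the K65 sources at hand, the same host-loop shape can be shown with a self-contained toy. Everything below is a hypothetical stand-in and not the K65 API: the toy CPU alternates between reading address $0010 and writing that value plus one to $0011, and the host services the bus once per cycle, between the low and high half-steps.

```c
#include <stdint.h>

enum { PIN_RW = 1 };          /* hypothetical pin bit: 1 = read, 0 = write */

typedef struct {
    uint16_t address;         /* address bus, driven in the low phase */
    uint8_t  data;            /* data bus */
    uint8_t  pins;            /* control pins */
    uint8_t  step;            /* toy sequencer state: 0 = read, 1 = write */
    uint8_t  latch;           /* last value captured from the bus */
} toy_cpu;

/* Low phase: decide the next bus access and drive address, data and RW. */
static void toy_step_lo(toy_cpu *cpu)
{
    if (cpu->step == 0) {
        cpu->address = 0x0010;
        cpu->pins |= PIN_RW;
    } else {
        cpu->address = 0x0011;
        cpu->data = (uint8_t)(cpu->latch + 1);
        cpu->pins &= (uint8_t)~PIN_RW;
    }
}

/* High phase: capture read data and advance the sequencer. */
static void toy_step_hi(toy_cpu *cpu)
{
    if (cpu->pins & PIN_RW)
        cpu->latch = cpu->data;
    cpu->step ^= 1;
}

/* One full cycle with a single RW check between the two half-steps. */
static void toy_cycle(toy_cpu *cpu, uint8_t *memory)
{
    toy_step_lo(cpu);
    if (cpu->pins & PIN_RW)
        cpu->data = memory[cpu->address];   /* handle read */
    else
        memory[cpu->address] = cpu->data;   /* handle write */
    toy_step_hi(cpu);
}
```

Because the bus is only sampled at the low-to-high edge, the host needs one memory access per cycle rather than a check after each half-step.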


Dr Jefyll wrote:
I'm obliged to differ with Ed about endian-ness. Little endian has a solid, practical advantage when a multi-byte addition is performed. For example the 65xx family exploits this during an address mode such as absolute,X. The low byte of the base address is fetched first, thus allowing the low-byte addition to commence on the immediately following cycle -- overlapping with the fetch of the high byte of the base address.

Yeah, I think it depends on the situation. For the design of how the 6502 works it's definitely best to be little-endian. I could envision a layout which could work with big-endianness; I would really have to think more on it, but for this kind of design and layout, certainly little-endian is the way to go.

I definitely saw that in cases like absolute, X as you mention:
https://github.com/TReed0803/kasm/blob/master/kasm/k65/detail/uops.h#L178-L215

barrym95838 wrote:
Or maybe just eight bytes and 12 cycles of pure machine instructions

That's a good little code snippet, definitely gonna try to remember that one (or copy it into my notes). Thanks!


Posted: Sat Oct 20, 2018 5:24 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10980
Location: England
TrentoReedo wrote:
Thanks for the discussion and encouragement everyone! :)

Thanks for the good topic and the peek inside your design process!

Quote:
BigEd wrote:
About the carry bit...

... Another part of me thinks it's a little weird to have to SEC to do an SBC without a borrow, when I have to CLC to do an ADC without carry.

To be honest, I'm a little surprised how immediately everyone in the thread decided against this. Out of curiosity, if the ISA/CPU wasn't married to 6502, would you still prefer an ISA that had this particular setup about it? I was under the assumption that this was a relatively odd thing that sort-of famously the 65* family of microprocessors did "differently", and many other microprocessors (especially into modern day) flipped the carry in the manner I was describing.


That's a fair question; maybe it's Stockholm syndrome... According to Wikipedia, both conventions can be found in several places: 6502 with System/360, MSP430, ARM and PowerPC, and then the other camp has 8080, Z80, x86 and 68k. (As IBM did it, that's rather interesting. Otherwise, ARM copied from 6502 and the later machines might have copied from ARM.) It might be a linguistic curiosity that 'borrow' seems like it should be natural to represent with a 1 rather than a zero. More interesting, perhaps, would be if there are some bit-twiddling tricks which use both ADC and SBC and which work better on a 6502 type system than on a 6800 type system - or vice versa.

Quote:
BigEd wrote:
About the decode ROM organisation: I think different organisations are all valid, it might be better for understanding and debugging to do it one way rather than another. Or it might be better for maintainability, or for implementation. But for correctness I suspect it doesn't matter.

Yeah, in terms of the code that I've written, we assume there is one monolithic decode ROM (here is an example of an older copy of the generated decode ROM and DPC signal header). That is definitely better for simulation because there is one lookup per half-cycle to see the entire state of the DPC signal set.

I'm asking from a hardware perspective if there is any difference in terms of impact on the maximum/minimum clock speed. I always assumed the minimum clock speed of the 6502 was basically a limitation of how long we can wait during PHI1 before charge dissipates and data is lost. And I've assumed the maximum clock speed was due to the clock generation logic not tripping over itself basically. Maybe a higher-level form of what I'm asking is: "What determines the range for the clock speed?" I assumed it also had something to do with the size of the die, so perhaps a tighter-packed layout with several small sections of decode ROM instead of one massive one might be able to overcome some of those restrictions.


That's an interesting set of questions... you may know that the 68k has both microcode and nanocode, which offers a sort of code compression but must in some sense cost time - either cycles or nanoseconds, depending on where the clock boundaries are. In that case, the two kinds of ROMs are cascaded.

The maximum clock speed in something like a 6502 is probably mostly influenced by logic depth - through how many gates must a signal cascade in one cycle. The rollover of PC from 7FFF to 8000, or less realistically from FFFF to 0000, may be the critical path as measured in nanoseconds, and might well also be the greatest logic depth. As you may know, there's a little carry lookahead in the PC circuit which means it doesn't take 15 gate delays to do the 16 bit increment. I'm not certain where the clock boundaries are, but I think it's also true that the machine needs to decide whether or not to increment PC as well as doing the increment, within a cycle. One of the strengths of the 6502 design at low level is the use of transparent latches, which allows some borrowing of time from adjacent cycles. This makes timing analysis rather difficult.
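To make the logic depth point concrete, here is a rough behavioural sketch in Python. It is not the actual 6502 PC circuit: the split into two 8-bit groups and the function names are assumptions chosen for illustration. The ripple version makes the carry into bit 15 wait on every lower bit in turn, while the lookahead version lets the high byte take its carry from a single "low byte is all ones" test.

```python
def ripple_increment(value, width, carry_in):
    """Bit-serial increment: each bit's carry depends on the previous bit's carry."""
    result = 0
    carry = carry_in
    for bit in range(width):
        b = (value >> bit) & 1
        result |= (b ^ carry) << bit
        carry &= b  # carry only propagates through bits that are 1
    return result, carry


def increment_ripple16(pc):
    """16-bit increment with one long carry chain through all 16 bits."""
    result, _ = ripple_increment(pc & 0xFFFF, 16, 1)
    return result


def increment_lookahead16(pc):
    """Same result, but the high byte does not wait for the low byte's ripple.

    Assumption for illustration: one wide 'all ones' detect on the low byte
    supplies the carry into the high byte.
    """
    low_in = pc & 0xFF
    high_in = (pc >> 8) & 0xFF
    low, _ = ripple_increment(low_in, 8, 1)
    high_carry = int(low_in == 0xFF)  # a single wide AND in hardware
    high, _ = ripple_increment(high_in, 8, high_carry)
    return (high << 8) | low
```

Both functions give the same answer for every PC value; the difference is only in how long the serial chain is, which is what shows up as nanoseconds on silicon.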

As an alternate design style, the Z80 is clocked at 3 or 4 clocks per memory cycle, and has only a 4 bit ALU. There are operations where that ALU has to work on 4 nibbles in turn to produce a result. (It also has 16 bit incrementer and decrementer, for PC and maybe for SP. And for BC, in loop counting, apparently.)
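As a sketch of what "working on nibbles in turn" means, the following Python models a 16-bit add as four passes through a 4-bit adder, with the carry handed from one pass to the next. This only illustrates the idea; it does not model the Z80's real sequencing, and the function names are made up for the example.

```python
def alu4_add(a, b, carry_in):
    """A 4-bit adder: returns (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def add16_via_nibbles(x, y):
    """16-bit add done as four passes through the 4-bit adder, low nibble first."""
    result = 0
    carry = 0
    for n in range(4):
        shift = 4 * n
        nibble_sum, carry = alu4_add(x >> shift, y >> shift, carry)
        result |= nibble_sum << shift
    return result, carry
```

The trade is a narrower (and shorter) carry chain per pass in exchange for more passes per operation.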

Beyond logic depth, there are other influences on chip speed: the logic gates themselves, in that NAND is slower than NOR; the fanout of a signal in terms of how much capacitance a gate has to charge or discharge, also a function of wiring length and therefore of distance; whether a gate has a relatively high or relatively low logic threshold, which is to some extent tunable for each gate; how many pass transistors a signal goes through, as each is a lumpy RC component; whether a signal is routed in metal or poly or diffusion. And then, possibly, clock skew is an effect, if the clock arrives at different latches on chip at different times, and possibly - certainly in modern chips - effects of power supply variation across the chip, and effects of temperature variation across the chip. And, in some cases, effects of power supply variation between cycles and during cycles, as various big drivers switch on and off.

So, yes, it might be that four ROMs has different speed from one ROM, but it might need a detailed diagram to figure out what the first-order and second-order effects are of two different organisations. Or, it might be blindingly obvious, but not to me!


PostPosted: Sat Oct 20, 2018 5:28 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8541
Location: Southern California
Carry bit:  The only other microprocessor I have a lot of experience with is the PIC16, which does the carry the same way as the '02, in that when you subtract a larger number from a smaller one, the C bit gets cleared.  The PIC does a lot of things backwards (and I've put much of that in macros to hide the confusing details so I make fewer coding mistakes), but this is not one of those backwards ones.  Just think of it as the reverse of the addition, where adding puts the carry there and subtracting takes it back, undoing the addition.  Taking single hex digits as an example: starting with C clear, 9+$A gives you 3 with C set.  Now with C set, do 3-$A, and you get back the 9, with C clear, just like you started with.
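The round trip can be checked with a small Python model of the 6502-style convention, where C=1 going into a subtraction means "no borrow" and C=0 coming out means a borrow happened. The `width` parameter is an addition for this sketch so the same functions cover both the single-hex-digit case and a full byte.

```python
def adc(a, operand, carry, width=8):
    """Add with carry; returns (result, carry_out)."""
    mask = (1 << width) - 1
    total = (a & mask) + (operand & mask) + carry
    return total & mask, int(total > mask)


def sbc(a, operand, carry, width=8):
    """Subtract with carry, 6502 style: C=1 means 'no borrow' in and out."""
    mask = (1 << width) - 1
    total = (a & mask) - (operand & mask) - (1 - carry)
    return total & mask, int(total >= 0)


# Single hex digit: start with C clear, add, then subtract using the C that came out.
digit, c = adc(0x9, 0xA, 0, width=4)
restored, c_after = sbc(digit, 0xA, c, width=4)

# The same round trip on a full byte.
byte, c8 = adc(0xF9, 0x0A, 0)
restored8, c8_after = sbc(byte, 0x0A, c8)
```

In both cases the subtraction hands back the original value and leaves C clear, the state it started in.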

It's not clear how familiar you are with the 65816.  Some of your wish list is already there, like TXY, TYX, and BRL (Branch Relative Long, with a 16-bit offset).  BSR (Branch to SubRoutine) can be synthesized with PER, BRL.  Its direct page does not have to start on a page boundary.  Do check it out.  I do use STZ a lot.  CLX, CLY, and CLA (clear X, Y, or A), if they have the 2-cycle minimum, will be no faster than LDX #0, LDY #0, or LDA #0, and they're not used often enough to justify taking the space in the op-code table.  Besides the STF (STore $FF), I've thought many times that it would be nice to have an F flag to indicate when the result in the affected register was $FF (or $FFFF in the case of an '816 register in 16-bit mode).

We all tend to get kind of tunnel-visioned in our programming type, and we tend to think that one addressing mode or another, or one instruction or another, is pretty useless.  One that has been brought up this way on the forums is the (ZP,X) mode, and then someone else says, "I use it all the time!"

A related topic that comes to mind that would be worth reviewing is: Instructions that I missed

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Sat Oct 20, 2018 3:23 pm 

Joined: Sun May 20, 2018 7:20 pm
Posts: 20
About Decode ROM:
Thanks BigEd! That is very interesting, provides a lot of context I was missing in understanding the clock speed range of the 6502. I think in the case of my fantasy CPU, I'll play clock speed by ear and allow this to be one of the places I take more liberties with the design. It sounds like I'd have to literally lay out the hardware to actually get a measure for this. Out of curiosity, do chip designers attempt to calculate the max logic depth and max delay to a stable signal running through N logic gates when they come up with their initial range of supported speeds? Or is it more experimental; we try to make educated decisions around these variables (logic depth, gates, passes, etc) and just push a real clock signal into it and see when things start deteriorating?

About Carry Bit:
Yeah, I was surprised when implementing subtraction - I heard from a colleague that they believe needing SEC to subtract without borrow is confusing and backwards. Initially I was willing to agree, but as I implemented the instructions I saw why it was that way. Regardless of my understanding of why it is the way it is, I still ended up tripping over it while writing code; specifically, I missed it in all my unit tests for subtraction at first. Fixing unit tests was easy, but I wondered how often people trip over this in practice. It sounds like "not often enough to warrant changing it" or even "not really at all once you get used to it". I'll leave it as is then, and rename SBB (subtract with borrow) to SBC (subtract with carry) to better reflect this. It is nice because in the end there is less we need to do in the ALU to handle subtraction, since there is no flipping of the carry bit. I'm convinced.
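Here is a minimal Python sketch of why the carry-style flag is cheaper on the ALU side. It assumes a plain 8-bit adder shared by both conventions; the function names are hypothetical and this is not taken from the K65 sources. With C meaning "no borrow", subtraction is just the adder fed the inverted operand. With a borrow-style flag, the same adder needs the flag inverted on the way in and again on the way out.

```python
MASK = 0xFF


def adder(a, b, carry_in):
    """The shared 8-bit adder: returns (sum, carry_out)."""
    total = a + b + carry_in
    return total & MASK, total >> 8


def sbc_carry_style(a, operand, carry):
    """6502 style: C=1 means no borrow. Invert the operand, reuse the adder."""
    return adder(a & MASK, ~operand & MASK, carry)


def sbb_borrow_style(a, operand, borrow):
    """Borrow style: flag=1 means borrow. Same adder, plus two extra flag inversions."""
    result, carry_out = adder(a & MASK, ~operand & MASK, borrow ^ 1)
    return result, carry_out ^ 1


def reference_subtract(a, operand, borrow_in):
    """Plain arithmetic reference: returns (result, borrow_out)."""
    total = a - operand - borrow_in
    return total & MASK, int(total < 0)
```

Both wrappers compute the same differences; the only thing that changes is where the flag gets flipped, which is the extra work the carry-style convention avoids.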

About the 65816:
Yes! I did a lot of research on this project. Especially I looked at the Z80 and 65816 for comparative purposes. I think I may have limited myself in trying to lay out the ISA encoding in too strict of a manner. I was not attempting to add direct addressing specifically, however I was playing with the idea of giving ADH a register (BP, base page; we'll use my nomenclature for clarity since what I was aiming for is a little different from direct addressing). The struggle I had with this, as mentioned above, is in the strictness of my ISA layout - I could add LDB/STB (LoaD Base page, STore Base page) with a few important addressing modes (absolute, immediate, zero-page). But I couldn't find the right places logically to add other seemingly important operations like TBA/TAB/PHB/PLB.

The reason I added CLX/CLY is simply because it made sense in the ISA encoding scheme, nothing more. (based on the encoding scheme, LDX/LDY have an implied addressing mode; since this is useless, we pass 0x00 through the adder and end up clearing X/Y).

The F flag is an interesting proposition; I would be able to add the F flag for sure, but again I am unaware if my ISA encoding scheme will allow me to fit any useful instructions pertaining to this flag (the set of branching operations is already packed full).

About Other Threads:
Ah, yeah - I suppose I should say that I have dug through many threads about this on the forum. There is a mega-post that collects them all together, as it's an often-discussed topic. http://forum.6502.org/viewtopic.php?f=1&t=4216

Some Extra Thoughts:
I think probably my biggest mistake with this initial design was being too strict with the encoding scheme for the ISA. In the end, it's just a big table which maps to some logic happening on the chip. My thought process was that if I made the layout very clear and strict, it'd be easier to grok from a simulation/assembler/addressing-mode-availability level. Basically, if you know the addressing modes for the instruction block your instruction falls in, you know you have all of those addressing modes available to you.

This is where many of the instructions get added, specifically for index registers (IDX instruction block).
We had 3 bits of possible operations, and since the 3 bits of addressing modes don't change for that instruction block, we end up being able to shove some instructions in places which would normally be useless (for example, ADX is the same operation as INX, just that there is an implied addressing mode which assumes the ADX operation should have a constant 1 on the bus). I admit, it's a little hacky, and there are places (especially in the control block) where I feel like I sort of break my own rules. But it felt cleaner than just a big table of instructions at the time.
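For readers following along, a strict block/operation/mode scheme like this can be sketched in Python as below. The bit positions, the mode numbers, and the helper names are all assumptions made up for the example; the real layout is whatever isa.h defines. The point is only that the opcode splits into fixed fields, and that an "implied" mode can be read as "put a constant 1 on the bus", which is how ADX ends up behaving like INX.

```python
# Hypothetical field layout: [7:6] instruction block, [5:3] operation, [2:0] addressing mode.
MODE_IMPLIED = 0    # hypothetical mode number
MODE_IMMEDIATE = 1  # hypothetical mode number


def encode(block, operation, mode):
    """Pack the three fields into one opcode byte."""
    return ((block & 0x3) << 6) | ((operation & 0x7) << 3) | (mode & 0x7)


def decode(opcode):
    """Split an opcode byte back into (block, operation, mode)."""
    return (opcode >> 6) & 0x3, (opcode >> 3) & 0x7, opcode & 0x7


def adx(x, mode, immediate=0):
    """Add to X. In the implied mode the operand is a constant 1, i.e. an INX."""
    operand = 1 if mode == MODE_IMPLIED else immediate
    return (x + operand) & 0xFF
```

With fixed fields like these, knowing the block tells you which modes exist for every operation in it, at the cost of some slots that only make sense with a reinterpretation like the one above.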

I'll take another look at my ISA though and see what I can do. I'm assuming I'll get something usable but with many possibly unneeded limitations. Probably I won't massively redesign my ISA for the K65, and just try to refine what I have - maybe saving other changes for a different chip altogether (if I ever make another one - this took like 6 months of continuous effort... Presumably, a lot of the initial learning was behind me, but I am certain there is just way more left to learn regardless).

