6502.org • View topic - Extended 65CE02 Core


All times are UTC

Extended 65CE02 Core

Proxy

Post subject: Re: Extended 65CE02 Core

Posted: Sat Nov 13, 2021 11:26 pm

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany

Sorry i haven't posted in a while on this thread.
I've been working on other projects, like trying to get CC65 to behave or trying to get my ATF1508 based VGA Card to work.
so far both projects have been failures; if i'm stuck on either for a while longer i might just open a thread about them.

even though i haven't really been working on this core (which i now renamed to "65CE816" or just "E816" for short) i have been making some plans on where i want to take this thing.

to start, the X, Y, and Z Index Registers will be extended to 16 bits to make working with data structures larger than 256 bytes less painful.
though to retain compatibility the high byte of each Index Register will only be affected by special instructions. so for example if X contains 0x00FF and the CPU executes the regular "INX" Instruction, the result will be 0x0000 (i.e. the low byte doesn't carry into the high byte).
there will be a total of 6 extra instructions for each index register: 16-bit Incrementing/Decrementing, 16-bit Compares (Immediate, Basepage, and Absolute), and swapping the High and Low Byte.
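
The planned split between the regular 8-bit increment and the special 16-bit one can be sketched in a few lines of Python. This is only an illustration of the behaviour described above; the function names `inx8` and `inx16` are made up for this sketch and are not the core's actual mnemonics.

```python
def inx8(x):
    """Regular INX: only the low byte changes, no carry into the high byte."""
    return (x & 0xFF00) | ((x + 1) & 0x00FF)


def inx16(x):
    """Hypothetical 16-bit increment: the carry propagates into the high byte."""
    return (x + 1) & 0xFFFF


print(hex(inx8(0x00FF)))   # low byte wraps, high byte untouched
print(hex(inx16(0x00FF)))  # carry reaches the high byte
```

With X = 0x00FF the first call gives 0x0000 as in the example above, while the 16-bit variant gives 0x0100.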

another benefit of having more Internal Registers is that i can redo the MOV Instruction to require fewer operands.
for example the X Register could contain the lower 16 bits of the Source Address, while the Y Register has the lower 16 bits of the Destination Address. the Z Register could hold the number of bytes left to be MOVed, and the upper 8 bits of the Source/Destination Addresses can be stored on the Stack.

next up, i think i'll be removing the MUL/DIV State Machines.
specifically i've been thinking about Dr Jefyll's advice of implementing these multi-cycle ALU Instructions as "partial" Instructions.
so instead of a MUL/DIV Instruction doing all 8 steps of the algorithm internally, it would only do a single step in the algorithm per instruction... so you have to execute the same instruction 8 times in a row to get your final result.
The reason why i have been thinking about doing it like that is because now i have 3 extra 8-bit Registers to play with (the upper 8 bits of X, Y, and Z), allowing me to store the intermediate results of each algorithm step in user accessible registers, which in turn allows Interrupts to occur between MUL/DIV Instructions without breaking anything.
Also this would remove all Addressing modes, since i want the algorithms to work entirely on internal registers to avoid having to constantly access Memory and waste cycles re-reading the same data 8 times in a row.
ultimately this would leave me with 2 Opcodes from the original 24: MUL and DIV. both of them would output the full 16 bit result instead of having separate instructions for either half like before (MUL/MLH, and MOD/DIV).
maybe i'll also add signed versions of MUL and DIV in the future if the need for them is large enough.

overall this should reduce the complexity of the ALU while also decreasing Interrupt Latency, at the expense of having the user manually set up all the Registers before doing any Multiplication/Division.
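
To make the "partial instruction" idea concrete, here is a rough Python sketch of an 8x8 multiply done as eight identical steps. It assumes the classic shift-and-add scheme with a 16-bit working value (high byte = running sum, low byte = remaining multiplier bits); the post does not say which registers would hold what, so `mul_step` and its arguments are purely illustrative.

```python
def mul_step(acc, multiplicand):
    """One shift-and-add step on a 16-bit working value."""
    high = acc >> 8
    if acc & 1:
        high += multiplicand  # may temporarily need 9 bits
    # shift the combined (high:low) value right by one bit
    return ((high << 8) | (acc & 0xFF)) >> 1


def mul8(a, b):
    """Run the single step 8 times, like executing the instruction 8 times."""
    acc = a & 0xFF
    for _ in range(8):
        acc = mul_step(acc, b & 0xFF)
    return acc  # full 16-bit product


print(mul8(3, 5))
```

Because the whole state lives in `acc` between steps, an interrupt could in principle be taken between any two steps, which is the point of the scheme.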

I'm also planning on redoing the Interrupt Logic (again), cleaning it up a bit. having ABORT as it works right now (ie actually cutting off the current Instruction) causes a lot of edge cases and i'm very tempted to just go the 65816 route and have it wait for the current instruction to finish, as that makes the whole logic around it just way simpler, at the expense of having ABORT be as slow as a normal Interrupt, like on the 65816.
which shouldn't be much of an issue as this Processor is intended to run at ~25MHz.

Sheep64 wrote:

I knew you were working on something impressive but I didn't expect even a preliminary version before Dec 2021. Indeed, I expected progress from randyhyde in Aug 2021 and long before you. Your work is simultaneously modest (one prefix) and bold (multiply, divide and modulo).

Thank you, i specifically made this thread after i had a working prototype to avoid having plans upon plans without anything solid ever coming out.

Sheep64 wrote:

With FPGA, you have the luxury of inventing your own signals. Separate read and write signals would be very welcome. This would skip the typical three NAND gate qualified write logic of many 6502/65816 boards. It would also aid RC2014 or RC6502 compatibility.

hmm, i guess i could add an inverted version of the RW output, that way you could connect one of them to the WE pin on your Memory, and the other to the OE Pin. and it would still be compatible with a regular 65Cxx System.

Sheep64 wrote:

Indeed, if you have opcodes spare, it may be worthwhile to consider one page I/O instructions. These would work like zp,X but for an I/O segment. With or without a Z80 style /IOREQ, you may be able to simplify your signals.

I got some mixed feelings about IO Instructions. On one hand they are basically just more limited Memory Instructions with a pin on the CPU telling the system that it's IO, but on the other hand it makes it less likely for programs to accidentally write to IO when they bug out/crash.
hmmm... something to think about i guess.

Sheep64 wrote:

Primarily, this would be "Program Segment", "Data Segment" and "Other Segment" where A8 and A15 may be used to determine Direct Page, Stack, I/O Segment or Vectors. In the trivial case, these would be the first two or last two pages. At the very least, you'll make drogon happy given that the Ruby anniversary computer also uses page $FE for I/O.

i'm not really sure what you mean by this.
while you can relocate the Base (Direct) page and the Stack there is no reason to limit them to some specific page in Bank 0 as (unlike in the 65816) you can place them anywhere within the 24-bit Memory Range.
Honestly i recommend allocating 1 Bank per segment, so that's 64k for the Base Page, 64k for the Stack (with the SP in 16-bit mode), and the Vector table is immovable so that just stays in Bank 0.
plus this is exactly why i added the extra V*A pins, so you can see if the CPU is accessing the Base Page, Stack, Vector Table, etc. without having to keep track of where they are located in Memory.

Sheep64 wrote:

aleferri solved atomic operations on 6502 in May 2021. The solution is quite simple. One or more of the read-modify-write instructions should conditionally write. With your prefix instruction, you could prefix all read-modify-write instructions and make the write phase conditional for all of them. This simplistic arrangement has the benefit that it operates in a fixed number of clock cycles. Therefore, don't try to optimize the last bus cycle of atomic operations.

I'm confused... I haven't noticed anything wrong with RMW Instructions, so what exactly got "solved" and why does the solution involve optionally throwing away the result of the Instruction?
also a lot of the Prefixed Opcodes that correspond with the existing RMW Instructions are already used, for example the Prefixed versions of "Increment" and "Decrement" are: "Increment with Carry" and "Decrement with Carry".

Sheep64 wrote:

S+X stack indexing requires addition in some form. Personally, I'd maintain S+X as a hidden register. If you want to implement this more frugally, don't be concerned about an additional bus cycle for ALU operation because the ALU cycles may overlap with opcode fetches. When prefix is fetched, transfer S to a hidden register. When an instruction is fetched, add or transfer X to a hidden register. In either case, X or S+X is ready for use. Unfortunately, this speculative execution might interact badly with interrupts. FPGA ALU operation may also consume more energy than a mux for S+X.

S+X has been changed to S+Y since CC65 likes to use A and X for 16-bit values, and the Y Register doesn't get much love.
also my Core has a separate ALU for Address Calculations, the Control unit just tells the Address ALU what Addressing Mode it wants and ta-da it's on the output, no extra cycles or registers required.

Sheep64 wrote:

I see that Dr Jefyll has led you into the quagmire of block copy. You are clever to initiate transfer from stack references but this arrangement has nowhere to dump state when transfer is stopped prematurely. Given the vaguries of memory - especially beyond 64KB - a blitter peripheral may be preferable. Agumander's GameTank blitter copies two dimensional areas and Radical Brad's blitter copies two dimensional areas with alpha channel and run length encoding. I recommend a one element FIFO and interrupt arrangement. This allows solid blitting with no loss of cycles. Power of two sprite scaling would be highly appreciated because, quite honestly, I'm tired of 6502 games below the standard of Out Run. Sprite scaling is too much to ask in discrete implementation but if you want block copy which matches or exceeds 65816 MVP/MVN, two dimensional copy covers more cases.

eh, i just wanted to have any kind of MOV Instruction so it feels closer to the 65816. and while having something Blitter-ish would be sweet for bitmap graphics, it also sounds like it would be a pain to implement and add a LOT of extra logic.
Plus I never intended my MOV to be better than MVP/MVN, I just wanted it to be somewhat equivalent.
and with the current way i want to implement it there should be no Interrupt problems at all as the state of the interrupted MOV is stored in user accessible registers and the Stack.

Sheep64 wrote:

May I be the first to suggest the additional quagmire of floating point? I believe that a very minimal FPU can be implemented with eight instructions or less. All instructions would be abs,X only and the implementation would not conform to IEEE754 or provide conversion functions. Instructions would load, store, add, subtract, compare and multiply. This would apply to a separate accumulator and common flag register. If you choose a suitable representation then DEC abs,X provides floating point right shift and atomic DEC abs,X provides right shift without underflow. Aha! Overall, this arrangement is more suited than x87 to the Raphson approximation of inverse square root.

If you work on an FPU, you should definitely read the FPU chapter of the John Wiley & Sons book Advanced FPGA Design by Steve Kilts. You should also read the 1987 internal INMOS documents regarding the T800 Transputer FPU. The INMOS documents have been published by David Mays. The T800 FPU document explains how to verify an FPU and was written a mere eight years before Intel Pentium Errata #2: FDIV Bug.

Floating Point is a rabbit hole that i try to avoid because of how easily you can get sucked into it.
Doing Fixed Point math in Software should be fast enough for now (especially with the hardware MUL/DIV Instructions), but if i ever get into Floating Point i'll probably do it as a separate Memory Mapped Peripheral instead of directly building it into the CPU. (that would also mean a regular 65C02/816 can make use of the FPU as it's just a Peripheral in an FPGA/CPLD that connects to the 65xx Bus)

Sheep64 wrote:

I asked a sheep in the particle physics field if CERN used hyperbolic tangent in relativistic calculations. The answer was "Not knowingly." Therefore, you can dump all of the stupid grandstanding floating point functions. Instead, do sine and cosine concurrently with successive approximations of rotation and implement the remainder with Maclaurin, Taylor and Raphson approximations.

I'm sorry but I only understood like 3 words of that.

Sheep64 wrote:

If you are experimenting with stack introspection, you may want to add a constant to RegS or implement PHS/PLS. Given the association of RegS and RegX, PHS/PLS is most logically implemented by escaping PHX/PLX. Unfortunately, pushing the current stack pointer onto the stack is useless. However, a computed value aids functions with a variable number of parameters.

so basically an Instruction that Pushes the current SP + some Value to the Stack?
that sounds similar to the LINK Instruction on the 68k.
though i don't see the point in having that since if you want to access a variable amount of data or pointers on the stack you can just use either the "Stack Y-Indexed" (SP,Y) or "Stack Indirect Y-Indexed" ((d.SP,Y)) Addressing Mode.
With Y being a 16-bit Register now you can easily access the whole 64k of Stack (if the SP is in 16-bit mode)

Sheep64 wrote:

You may also want to experiment with separate program and data stacks. This may aid implementation of Forth. It is also of general interest. This could be activated with an escaped flag; possibly operating on an escaped RegQ flag register. For downward compatibility, it could instead be implemented as an opt-in, per use prefix. Unfortunately, an alt stack may be incompatible with A8/A15 segment signals. Separately, you may want to investigate a privilege system where escaped CLV drops privileges and has no counterpart to increase privileges. Likewise, PHQ/PLQ would be privileged.

I have thought about a System/User Stack and Supervisor/User Mode like on the 68k. but ultimately decided against it. though i made it much easier to implement such features with external glue logic by having more V*A Pins than the 65816 has.
using RW (Read/Write), VVA (Valid Vector Address), and VSA (Valid Stack Address) the system can easily detect if the CPU is accessing the Stack because of an Interrupt.

Code:

RW  VVA VSA
 X   0   X   -> Non Interrupt Related Access
 1   1   0   -> CPU Reading Interrupt Vector
 0   1   1   -> CPU Writing Interrupt Return Address to Stack
 1   1   1   -> CPU Reading Interrupt Return Address from Stack

using these signals you can easily build a simple memory mapper that replaces the upper 8 bits of the Address bus (A16-A23) with some hardwired value, which basically means all Interrupt Return Addresses will be written to and read from their own 64k Bank instead of where the actual Stack currently is.
the same logic can be used to Enter/Exit some kind of Supervisor mode, for example when an Interrupt Return Address gets written Memory Protection could be disabled, and when the Address gets read Memory Protection will be restored. (you can also use a counter if you expect nested Interrupts)
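
As a rough illustration of that glue logic, the table above and the bank replacement can be modelled in Python like this. It is only a sketch: `IRQ_BANK` is a made-up hardwired value, and the one RW/VVA/VSA combination the table does not list is simply reported as "unlisted".

```python
IRQ_BANK = 0x7F  # hypothetical hardwired bank for interrupt return addresses


def classify_access(rw, vva, vsa):
    """Decode RW/VVA/VSA according to the table above."""
    if not vva:
        return "non-interrupt access"
    if rw and not vsa:
        return "vector read"
    if not rw and vsa:
        return "return address push"
    if rw and vsa:
        return "return address pull"
    return "unlisted"  # RW=0, VVA=1, VSA=0 is not in the table


def remap_bank(addr, rw, vva, vsa):
    """Replace A16-A23 while an interrupt return address is pushed or pulled."""
    if vva and vsa:
        return (IRQ_BANK << 16) | (addr & 0xFFFF)
    return addr & 0xFFFFFF


print(classify_access(0, 1, 1), hex(remap_bank(0x123456, 0, 1, 1)))
```

A supervisor-mode latch could be driven from the same decode: set it on "return address push" and clear it on "return address pull", or count up and down if nested interrupts are expected.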

Sheep64 wrote:

Finally, I'm a keen proponent of abs,Z. You may wish to substitute all abs addressing with abs,Z. If RegZ is reset and never modified then STZ and abs,Z behave as expected. However, applications which populate RegZ have abs,X, abs,Y and abs,Z addressing modes and must zero one register to obtain abs. This arrangement is never worse than 65C02 and it is a significant boon as address-space and register width increases. Specifically, the semantic switches from abs+R to R+offset. Most immediately, for your extension, it allows (abs,X),Y or zp,X and sp,Y to be mixed with abs,Z.

hmm, that idea is pretty interesting... but it would technically break compatibility with the 65CE02. plus what would it even affect? for example you mentioned SP+Y even though it has no Absolute part to it.
If you really mean it's everything "Absolute" related then it would also affect "Absolute Indirect" like in JMP (abs) and JMP (abs,X), and "Long Absolute" and "Long Indirect" Addressing Modes.
I'll think about it, but i'd probably keep ABS pure to make it easier for people to get into the CPU as not having an un-Indexed Absolute Addressing mode would likely weird out some people.
I kinda wish i had a bit to spare in my Secondary Flags Register so i could make a toggle between "abs" and "abs,Z" just to see if i could get used to it.

I should really continue this project sometime in the future but CC65 got me in its claws for right now.
obviously i'll post whenever i got something done on this, until then have a good day/week/month/year!


Proxy

Post subject: Re: Extended 65CE02 Core

Posted: Sun May 01, 2022 10:57 am

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany

I have risen from the dead to bring you an update!

Well not really dead since i've still been active on the forum. but it has been a very long time since i last worked on this.
After i ran out of things to procrastinate with i pretty much had no choice but to pick up this Project again (not that that is a bad thing).

but hey, while i did procrastinate this i have:

Figured out how both CC65 and WDC's C Compiler work
Written my own Fixed Point library for CC65 and tested it via Mandelbrot sets and a simple 3D Demo (it runs at 1FPS!)
Finally fixed my ATF1508 VGA Controller (which was used for the 3D Demo)
and I now got my Drivers Licence

Getting WDC's 816 Compiler to work was very useful for getting to know the 816 better.
i looked at how WDC implements certain things like passing parameters to functions, or handling the width flags.

And while i still don't have an 816 SBC (PCBs are already ordered) i still sat down and ported my custom Serial Library for CC65 (100% Assembly) over to the 816 Compiler.
and damn, i thought it was gonna be hard to work with the width flags (m/x) but after i added the Macros to easily change them (ACCU_8, INDEX_16, etc) it was actually pretty pleasant to program.
there was still a bit of confusion, like learning the 65C02 for the first time i had to look up a list of instructions or addressing modes every now and then.
but overall it was a lot easier than i expected, and i learned a lot about the 816.

which is very useful for this project and inspired a lot of changes i did.
speaking of which, let's talk about the changes! because of how many there were i pretty much redid the whole thing from scratch.

Mode Flags and Registers

in the previous post i talked about how i didn't want Mode bits and just wanted to use Swap Instructions to allow users to access the high byte of each Index Register...
but after getting to know the 65816 a bit more and coming to enjoy the width flags it has i finally decided to just do the same with my CPU.
but unlike the 65816 there are actually 3 Flags:

A - Accumulator Flag, when set the Accumulator is in 16-bit Mode and all related instructions will also change accordingly (LDA, STA, BIT, TSB, TRB, ADC, SBC, etc)
X - Index Flag, when set the X/Y/Z Index Registers are in 16-bit Mode and all related instructions will also change accordingly (LD*, ST*, CP*, IN*, etc)
M - Memory Flag, when set all Read Modify Write (RMW) Instructions will work on 16-bit words instead of 8-bit ones. (this excludes the existing 16-bit versions, INW, DEW, AWL, AWR, LWR, RWL, RWR)

so, it's exactly as if you took the "m" flag from the 65816 and split it into 2 separate flags.

another thing i learned from the 816 is that it ignores Page and Bank boundaries for direct page and absolute addressing modes. (unless they are at specific locations and in emulation mode)
i also needed those to wrap when you just use the regular 65C02 and 65CE02 features, but i didn't want to lock anything to bank 0, so instead i just added an extra flag that enables or disables page/bank wrapping for those 2 addressing mode groups.

DW - Data Wrap, when set disables Bank wrapping for absolute addressing, and Page wrapping for base page addressing.
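
For absolute indexed addressing, the effect of DW can be sketched roughly as follows. This Python snippet is only a model of the description above: the `bank` argument stands for whatever supplies the upper 8 address bits and is named here for illustration, and the base page case is left out.

```python
def absolute_indexed(bank, addr, index, dw):
    """Effective 24-bit address for abs,X style addressing."""
    if dw:
        # DW set: the index can carry into the bank byte
        return ((bank << 16) + addr + index) & 0xFFFFFF
    # DW clear: the 16-bit offset wraps inside the current bank
    return (bank << 16) | ((addr + index) & 0xFFFF)


print(hex(absolute_indexed(0x01, 0xFFF0, 0x20, dw=False)))
print(hex(absolute_indexed(0x01, 0xFFF0, 0x20, dw=True)))
```

With DW clear the access stays in bank $01, with DW set it spills over into bank $02.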

the original "W" Flag now called "PW" also still exists and functions the same as before:

PW - Program Wrap, when set any PC relative addressing mode can cross Bank boundaries which includes regular Program execution

the Interrupt Mode flag also still exists, but because i needed the space in the FL1 register i removed the built-in priority Interrupt system, so you again only have 1 IRQ like on any existing 65xx CPU.

IM - Interrupt Mode, when set the BRK and IRQ Vectors are separated. BRK remains at $00FFFE and IRQ gets relocated to $00FFF6

maybe i'll do something else with this in the future but likely not.

Now, a bit about Registers.
Most stayed the same, but there are some differences.
as expected A, X, Y, and Z are now 16-bits wide, with the High Byte of each of them being constantly cleared when in 8-bit Mode (including the Accumulator)
but the B (Base Page) Register is now also 16-bits wide, with the original B Register being the High Byte of the new one. This means the B Register works exactly like the DP Register in the 816, allowing you to place the Base Page at any address instead of being forced to be page aligned, but it still works exactly like the B Register from the 65CE02 as TAB leaves the low byte unaffected.
the BBR or "Base Page Bank" Register still exists so the Base Page can be placed at any specific 24-bit Address in Memory similar to the Stack, though unlike the Stack the Base Page can cross Page and Bank boundaries when DW is set to 1.

this likely raises questions as to how Transfer Instructions now work with the wider Accu/Index Registers, so i hope this visual will be a help:

Code:

TXS/TSX/TYS/TSY; TXB/TBX/TAB/TBA:

8-bit Mode:
   High  : Middle :   Low
:   SBR  :   SPH  :   SPL  :
:        :  Y(Lo) :        : TYS/TSY only affect the Low Byte of Y, and the High Byte of SP
:        :        :  X(Lo) : TXS/TSX only affect the Low Byte of X, and the Low Byte of SP

   High  : Middle :   Low
:   BBR  :  B(Hi) :  B(Lo) :
:        :  A(Lo) :        : TAB/TBA only affect the Low Byte of A, and the High Byte of B
:        :        :  X(Lo) : TXB/TBX only affect the Low Byte of X, and the Low Byte of B (New Instructions)

16-bit Mode:
   High  : Middle :   Low
:   SBR  :   SPH  :   SPL  :
:  Y(Hi) :  Y(Lo) :        : TYS/TSY affect Low and High Byte of Y, and the High Byte of SP and SBR
:        :  X(Hi) :  X(Lo) : TXS/TSX affect Low and High Byte of X, and the Low and High Byte of SP

   High  : Middle :   Low
:   BBR  :  B(Hi) :  B(Lo) :
:  A(Hi) :  A(Lo) :        : TAB/TBA affect Low and High Byte of A, and the High Byte of B and BBR
:        :  X(Hi) :  X(Lo) : TXB/TBX affect Low and High Byte of X, and the Low and High Byte of B
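
The stack half of the diagram above can also be written out as a small Python model. It treats SBR:SPH:SPL as one 24-bit value and assumes the width is selected by a single boolean (`index16`); both function names and that parameter are just for this sketch. TXB/TAB would follow the same pattern with BBR:B.

```python
def txs(sp, x, index16):
    """TXS: X goes to the low byte (8-bit) or low and middle byte (16-bit) of SP."""
    if index16:
        return (sp & 0xFF0000) | (x & 0xFFFF)
    return (sp & 0xFFFF00) | (x & 0x00FF)


def tys(sp, y, index16):
    """TYS: Y goes to the middle byte (8-bit) or middle and high byte (16-bit) of SP."""
    if index16:
        return ((y & 0xFFFF) << 8) | (sp & 0x0000FF)
    return (sp & 0xFF00FF) | ((y & 0x00FF) << 8)


sp = 0x01FFFD
print(hex(txs(sp, 0x1234, False)), hex(txs(sp, 0x1234, True)))
print(hex(tys(sp, 0x1234, False)), hex(tys(sp, 0x1234, True)))
```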

New/Changed Instructions

Now let's talk about the actual Instructions and what changed about that.
most of the Instruction set remained the same so it's really only smaller additions and some extra addressing modes for existing instructions.

Let's start with the FL1 Register, originally you were only able to modify it via the TAF and TFA instructions (since the idea was that it would be rarely changed), but since it now contains a lot of flags that change very frequently i added a few instructions to make working with it easier.

PHF and PLF, to Push and Pull the FL1 Register to/from the Stack (Interrupts don't push it automatically so you need to do that yourself)
and SEF and REF, which work identically to the SEP/REP Instructions from the 816, except in this case they work on FL1 instead of the P Register.

next up, the "PER" Instruction has been renamed to "PRW" (Push Relative Word) and also got itself a brother, "PRL" (Push Relative Long) which is functionally identical except it pushes a 24-bit Word onto the Stack instead of a 16-bit one. both still use a 16-bit Immediate value. note that these 2 are affected by the PW Flag because they are PC Relative Addressing Modes.

"RTN#" also got itself a brother, "RTL#". as a recap, RTN# functions like an RTS that first moves the Stack by whatever the unsigned 8-bit Immediate value is and then pulls the Return Address.
RTL# is a bit different: it first pulls the Return Address and then moves the Stack.

i did this because i think it is a lot more useful for HLLs than RTN#. specifically i looked at how WDC's C Compiler deals with Parameters on the Hardware stack. It pushes all parameters and then calls the function, so the Return Address is at the top of the Stack. since the callee is responsible for cleaning up the stack, it copies the Return Address deeper into the Stack, adjusts the SP so that the Return Address is back at the top, and then it can finally Return.
so RTL# just puts all of that work into a single Instruction.
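
The difference in ordering between the two can be shown with a toy stack model in Python. This is a sketch under several assumptions: the stack is a list of bytes with the top at the end, the Return Address is pushed high byte first, its size is a parameter (`ret_size`, defaulting to 2 bytes), and the usual +1 adjustment of RTS is ignored.

```python
def pull_return(stack, ret_size=2):
    """Pull a little-endian return address (low byte is on top)."""
    addr = 0
    for i in range(ret_size):
        addr |= stack.pop() << (8 * i)
    return addr


def rtn(stack, n, ret_size=2):
    """RTN #n: drop n bytes (e.g. locals) first, then pull the return address."""
    del stack[len(stack) - n:]
    return pull_return(stack, ret_size)


def rtl(stack, n, ret_size=2):
    """RTL #n: pull the return address first, then drop n bytes (e.g. parameters)."""
    addr = pull_return(stack, ret_size)
    del stack[len(stack) - n:]
    return addr


# caller pushed 2 parameter bytes, then the call pushed $1234
stack = [0xAA, 0x11, 0x22, 0x12, 0x34]
print(hex(rtl(stack, 2)), stack)
```

After the RTL# the two parameter bytes are gone and only the caller's own data is left on the stack, which is the cleanup the compiler otherwise has to do by hand.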

next, "MOV", "MUL", and "DIV"!

i have gone back and forth with these 3 Instructions, so many different ideas and ways to implement them.
in the last reply i said how i wanted MUL and DIV to be slice instructions that only do a single step of the algorithm so that Interrupt latency can be faster.

Now i have gone full circle and they are back to being State Machines that temporarily pause the CPU. I know it will be a pretty bad hit on Interrupt latency, but honestly any other solution would just make the whole operation take longer or add way too much circuit complexity.
so the current timings are: 11 Cycles in 8-bit Mode, and 19 Cycles in 16-bit Mode.
and yes they do in fact work in 16-bit Mode, which should be a huge speed boost compared to software routines.
Important note: even though the results are being stored into A and X, the Instructions only check if A is in 16-bit Mode to do the 16-bit operation. so if A is in 16-bit Mode, and X is in 8-bit Mode you still have to wait the full 19 Cycles but the highest byte of the result will be discarded (effectively only doing 16x8 Multiplication and Division)

now to MOV, because of the extra Register width i was able to get rid of all operands, so it works exclusively with the Internal Registers, making the whole thing much faster.
MOV now only takes 4 cycles per Byte (2 cycles to fetch the Prefix and Opcode, 1 cycle to Read a Byte, and 1 cycle to Write a Byte)
the Addresses are formed as follows:

the Low Byte of A is the High Byte of the Source Address, the High Byte of A is the High Byte of the Destination Address.
the whole 16-bit X Register is used for the Middle and Low Byte of the Source Address, and Y for the Destination Address.
the number of Bytes to move, minus 1, is stored in Z.
example:
A = $FF00
X = $8000
Y = $1000
Z = $01FF
This will move a total of 512 Bytes ($01FF + 1) from $008000 to $FF1000
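
The register layout for MOV can be double-checked with a few lines of Python. This is just a sketch of the address formation described above; `mov_plan` is a made-up helper, and the cycle figure is simply 4 cycles per byte with no other overhead assumed.

```python
def mov_plan(a, x, y, z):
    """Return (source, destination, byte count, estimated cycles) for MOV."""
    src = ((a & 0x00FF) << 16) | (x & 0xFFFF)         # low byte of A = source bank
    dst = (((a >> 8) & 0x00FF) << 16) | (y & 0xFFFF)  # high byte of A = destination bank
    count = (z & 0xFFFF) + 1                          # Z holds the count minus 1
    return src, dst, count, 4 * count


src, dst, count, cycles = mov_plan(a=0xFF00, x=0x8000, y=0x1000, z=0x01FF)
print(f"${src:06X} -> ${dst:06X}, {count} bytes, about {cycles} cycles")
```

For the example registers this reproduces the 512 bytes from $008000 to $FF1000.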

Ok, let's look at Existing Instructions with new Addressing Modes:

I already had Absolute Long versions of "ROL" and "ROR", but for completeness i also added "ASL" and "LSR".
"LDZ" is for some reason missing Base Page Addressing modes (bp and bp,x), they probably just ran out of Opcode space, so i added those myself.
"TSB", "TRB", and "BIT" also got Absolute Long versions (i wish the 65816 had those)
and even "LDX", "LDY", "LDZ", "STX", "STY", "STZ", "CPX", "CPY", and "CPZ" now got Absolute Long Addressing Modes

but there have also been some removed Instructions:

"CBH" and "CHB", which allowed you to convert between BCD and Binary. i just didn't see the point in wasting Space for this, a simple software routine is fast enough.
"PHR" and "PLR", which pushed/pulled A, B, X, Y, and Z onto/from the stack. with the 16-bit Registers my Control Unit literally cannot move that much data in a single instruction and it's not worth it either.
"PHS" and "PLS", these allowed you to push/pull the Stack bank Register to/from the... Stack... which makes little sense to me so i removed them.
"STP", my CPU has no power down mode so this is the same as a branch that points to itself.


That should be everything so far!
I'm still in the process of testing all Instructions in Logisim, once that is done i can upload the Logisim file and the new documentation to Github and then start working on the Digital version so i can then throw it on an FPGA.

Here is the current list of all Instructions and Addressing Modes, i really hope there won't be any more massive changes so i can finally call this project "finished enough"

Attachment:

soffice.bin_3MQrG7bFsW.png [ 1.01 MiB | Viewed 2898 times ]


fachat

Post subject: Re: Extended 65CE02 Core

Posted: Sun May 01, 2022 7:08 pm

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1003
Location: near Heidelberg, Germany

Proxy wrote:

I have risen from the dead to bring you an update!

Good to hear ;-)

Quote:

And while i still don't have an 816 SBC (PCBs are already ordered) i still sat down and ported my custom Serial Library for CC65 (100% Assembly) over to the 816 Compiler.
and damn, i thought it was gonna be hard to work with the width flags (m/x) but after i added the Macro's to easily change them (ACCU_8, INDEX_16, etc) it was actually pretty pleasant to program.

One thing I learned was to add some marker like _xs, or _xl, or _al etc to my labels. So I would easily see if I am accidentally calling code for a different mode.

Quote:

Mode Flags and Registers

this likely raises questsions as to how Transfer Instructions now work with the wider Accu/Index Registers, so i hope this visual will be a help:

Code:

TXS/TSX/TYS/TSY; TXB/TBX/TAB/TBA:

8-bit Mode:
   High  : Middle :   Low
:   SBR  :   SPH  :   SPL  :
:        :  Y(Lo) :        : TYS/TSY only effects the Low Byte of Y, and the High Byte of SP
:        :        :  X(Lo) : TXS/TSX only effects the Low Byte of X, and the Low Byte of SP

   High  : Middle :   Low
:   BBR  :  B(Hi) :  B(Lo) :
:        :  A(Lo) :        : TAB/TBA only effects the Low Byte of A, and the High Byte of B
:        :        :  X(Lo) : TXB/TBX only effects the Low Byte of X, and the Low Byte of B (New Instructions)

16-bit Mode:
   High  : Middle :   Low
:   SBR  :   SPH  :   SPL  :
:  Y(Hi) :  Y(Lo) :        : TYS/TSY effects Low and High Byte of Y, and the High Byte of SP and SBR
:        :  X(Hi) :  X(Lo) : TXS/TSX effects Low and High Byte of X, and the Low and High Byte of SP

   High  : Middle :   Low
:   BBR  :  B(Hi) :  B(Lo) :
:  A(Hi) :  A(Lo) :        : TAB/TBA effects Low and High Byte of A, and the High Byte of B and BBR
:        :  X(Hi) :  X(Lo) : TXB/TBX effects Low and High Byte of X, and the Low and High Byte of B

What I don't like on the 816 transfer instructions - if I understand them right - is that they leave the upper byte as it is, if in 8bit mode. There could be some residual data there that may result in problems.
(I'm pretty sure about that for AC, not so sure right now on X/Y....)

Also, which bits set the CPU flags? If the transfer is 8 bit, how is Z calculated? Make sure it's only the lower 8 bits. Same for the N flag and bit 7 resp. bit 15.

Stuff I was pondering on during my 65k project.

Quote:

Here the current list of all Instructions and Addressing Modes, i really hope there won't be any more massive changes so i can finally call this projected "finished enough"

Attachment:

soffice.bin_3MQrG7bFsW.png

Will have to read through this when I have more time....

keep up the good work!

_________________
Author of the GeckOS multitasking operating system, the usb65 stack, designer of the Micro-PET and many more 6502 content: http://6502.org/users/andre/


Proxy

Post subject: Re: Extended 65CE02 Core

Posted: Sun May 01, 2022 7:28 pm

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany

i should've mentioned that the Transfer Instructions between the Primary Registers (A, X, Y, and Z) are always 16-bit operations, no matter what mode the registers are in.
but they do still check for the Mode of the Destination Register (TAX checks for X flag, TXA checks for A, etc) in order to update the flags correctly.

example, X is in 16-bit Mode, A is in 8-bit Mode (ie High Byte is forced to 0):

A = $00F9
X = $FEED

doing TAX will result in:

X = $00F9
with the Z Flag cleared since it's not 0, and the N flag cleared since bit 15 of the operation is not set.

another example, still X in 16-bit Mode, A in 8-bit Mode:

A = $0089
X = $FF00

doing TXA will result in:

A = $0000

with Z set since the lower 8 bits are 0 (upper 8 bits are ignored), and N cleared since bit 7 of the operation is not set.

and the same concept goes for a lot of ALU operations. for example ADC, SBC, INC, DEC, are always 16-bit operations, the only difference between 8 and 16 bit mode is how the flags are determined and if the whole thing is saved or only the Low Byte.
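
The two transfer examples above can be reproduced with a short Python model. It is only a sketch of the rule as described: the transfer itself is 16 bits wide, and the width of the destination decides what is stored and which bits drive Z and N. The name `transfer` and its arguments are made up for illustration.

```python
def transfer(value, dest16):
    """Return (result, Z, N) for a register-to-register transfer."""
    value &= 0xFFFF
    if dest16:
        result = value
        negative = bool(result & 0x8000)  # N comes from bit 15
    else:
        result = value & 0x00FF           # high byte is forced to 0 in 8-bit mode
        negative = bool(result & 0x0080)  # N comes from bit 7
    return result, result == 0, negative


print(transfer(0x00F9, dest16=True))   # TAX with X in 16-bit mode
print(transfer(0xFF00, dest16=False))  # TXA with A in 8-bit mode
```

The first call leaves Z and N clear, the second one sets Z because only the low byte counts.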
