6502.org Forum  Projects  Code  Documents  Tools  Forum
Posted: Tue May 02, 2017 12:43 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8521
Location: Southern California
litwr wrote:
IMHO the hypothetical zp-registers would be the functional equivalent of the Z80's B, C, D, E, H, and L registers, which may be combined into the pairs BC, DE, and HL. I think PUSH and POP instructions for zp-registers would be very useful; the Z80 has an advantage over the 6502 in having them.
Prefixes are a good supplement to the mode model. Intel x86(-64) is unthinkable without both. However, I am not sure that they are good for the 6502+ ISA, because it has a fast mode-change instruction that may be used as a prefix.
BTW we need only 2 bits per field to describe the mode: for data, 8/16/32/64 bits, and for addresses, 16/24/32/64 bits. So 6 bits in the mode register will be enough to describe all modes. However, I prefer to have 2 more bits describing the X and Y index registers separately.
And indeed I want to use (zp,X),Y addressing with a later 6502+. :)

What you're saying about pushing ZP registers directly is accomplished by the 65816's two-byte PEI instruction. It takes the 16-bit data at the direct-page address pointed to by the operand and pushes it onto the stack, without affecting the processor registers. (Direct page on the '816 is like ZP on the 6502, except this 256-byte segment can be anywhere in the first 64K of memory space. It is not locked to page 0, nor does it have to start on a page boundary.) The instruction is written like PEI(ZP_addr); but it does not read and push the contents of the address pointed to by ZP_addr; instead, it reads and pushes the contents of ZP_addr itself. The Fischer book, on pp. 216-218, says it's indirect, and the L&E manual also writes it PEI(DP); but the L&E text as well as my own experiments say that it's not really indirect.

The 65816 changes modes in two clocks, and you can set it and leave it for a long time, meaning that you don't consume additional time or memory on every instruction like you would with prefix bytes. In my '816 Forth, I put it in native mode and never touch that again, and I leave the accumulator in 16-bit mode and the index registers in 8-bit mode most of the time, and seldom change them either.

The '816 also has an op code reserved for future expansion of the instruction set, which never happened but leaves the door open to another 256 op codes.

The 816's PER (push effective relative address) has several applications; but a nifty thing sometimes done with it is to push the relative address of a table and then use stack-relative indirect indexed addressing. You can, for example, have the address of a table on the stack at some arbitrary depth, and index into that table to access its contents. An example 65816 instruction might be LDA(3,S),Y, which gets the table address from the 3rd and 4th bytes on the stack, adds Y to the address, and uses the result to know where to load the accumulator from. This is a two-byte, seven-clock (eight if you have the accumulator set to 16-bit) instruction on the '816. Further improving the possibilities is that Y can also be 16-bit. Note that stack-relative indirect indexed addressing has double indexing and double indirection! (And in this example, is following putting the relative address of the table on the stack, meaning the program and its data are relocatable even after they are loaded and running.)
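For anyone who wants to see the address arithmetic of LDA(3,S),Y spelled out, here is a minimal Python sketch. It is only a model under stated assumptions: a single flat 64K array stands in for memory (bank handling is ignored), and the table location, stack pointer value, and function names are all made up for illustration.

```python
# Sketch of how the effective address for LDA (3,S),Y is formed.
# Assumption: everything lives in one 64K array; banks are ignored.

def read16(mem, addr):
    """Read a little-endian 16-bit word, wrapping within the 64K array."""
    return mem[addr & 0xFFFF] | (mem[(addr + 1) & 0xFFFF] << 8)

def lda_sr_ind_y(mem, s, offset, y, acc16=True):
    """Model LDA (offset,S),Y: index by S, fetch a pointer, then index by Y."""
    pointer_location = (s + offset) & 0xFFFF    # first indexing: S + offset
    table_base = read16(mem, pointer_location)  # indirection: fetch table address
    effective = (table_base + y) & 0xFFFF       # second indexing: add Y
    return read16(mem, effective) if acc16 else mem[effective]

mem = bytearray(0x10000)
# Hypothetical layout: a table at $4000 whose address sits in the
# 3rd and 4th bytes on the stack (S points just below the top byte).
mem[0x4000:0x4008] = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])
s = 0x01F0
mem[s + 3] = 0x00   # low byte of the table address
mem[s + 4] = 0x40   # high byte of the table address

value = lda_sr_ind_y(mem, s, 3, y=4)   # 16-bit load from $4004
```

With the accumulator modeled as 16-bit, the load picks up two consecutive table bytes; passing acc16=False models the 8-bit case.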

Jack Crenshaw, an embedded-systems engineer who wrote regularly in Embedded Systems Programming magazine, said in the 9/98 issue that he still couldn't figure out why, benchmark after benchmark, the 6502 could outperform the Z80, which had more and bigger registers, a seemingly more powerful instruction set, and ran at higher clock rates. I suspect it was because of the 6502's efficiency advantage in indexed and indirect operations. The 65816 adds a lot more advantages than I mentioned above. I highly recommend looking into it. It is being produced today, with no end in sight.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Tue May 02, 2017 3:14 am
Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
I have not actively joined in the conversation on this topic, but I have been following the discussion.

I have implemented an enhanced version of the 6502/65C02 which I think is in keeping with the general philosophy of the 6502. However, I have taken liberties to add certain features that I think are important to the support of many High Level Languages (HLLs) such as C and Pascal. In addition, following some friendly prodding by Dr. Jefyll, I've included built-in support for a FORTH VM.

During my time actively working on the core, I've added dedicated instructions for stack-relative addressing, base-pointer addressing, and FORTH IP relative addressing with auto increment. I also included support for kernel and user mode stack pointers, an auxiliary (third) stack using X, and various prefix instructions for increasing the size of ALU operands and operations, adding indirection, and a combination of both indirection and size. In addition, I added prefix codes to override the accumulator with either X or Y.

In the final implementation of the M65C02A, I've eliminated specific base-relative instructions, and decided to use a 1-offset base relative addressing mode using the built-in pre-indexed (X) instructions. In other words, I gave up the need to use a 0-offset base relative addressing mode using dedicated opcodes; I came to appreciate the need to preserve compatibility with the modern WDC W65C02S instruction set.

In the process of porting the Mak Pascal compiler, I realized that I could implement a stack-relative addressing mode using the prefix instruction I had implemented to override the default stack pointer. When applied to the pre-indexed (X) addressing mode instructions, there is a full complement of instructions using both 8-bit and 16-bit offsets, greater than the number I was able to support with the free opcodes of the W65C02S microprocessor. The only real limitation for me was that, using S as the index/base register, an offset of 1 is needed instead of 0 to access the top-of-stack element. This is really a personal aesthetic preference; the code generator of a compiler really couldn't care less whether the offset for the top of stack or the stack frame is 0 or 1.

I have finished the development of the M65C02A core, although I've not fully tested all of the changes that I made. I frequently review and update the documentation that I've placed in the README file in the project's GitHub directory. I also update the core's user manual when I have some free time. I've not updated the Pascal compiler, also found on GitHub, in a few months because I'm working on completing another one of my processor projects. In particular, I've not updated the compiler to support the 1-offset format for base-relative and stack-relative addressing modes, or the need to add the default override prefix instruction to the pre-indexed addressing mode instruction to derive the stack-relative addressing mode.

In the development of the M65C02A I had as an objective backward compatibility with the 6502/65C02. Other than my desire to remove all dead cycles from the instruction set (see the 65CE02), the basic implementation is fairly true to the 6502/65C02. (Some minor behavioral differences in the BRK, RTS, and RTI instructions should have no consequences on new developments, but may affect existing code. I had to make some minor changes to Daryll's monitor program.) I found that moving back and forth between extended/enhanced and normal operation was best handled by prefix instructions rather than a mode register. Unlike the 65816, which was not a consideration in the M65C02A project, the prefix bytes allow easy enhancement to 16-bit operation and easy return to 8-bit operation.

In the case of the Pascal compiler, if the enhanced 16-bit operations were to be the norm, the default size could easily be set to be 16 bits, and the size prefix could be used to change to 8 bits. This would improve the performance of the core for those instances where the Pascal compiler was used more often than not. Another alternative, which I think I've included in the released core, is to tie this capability to the mapping logic provided by an MMU. For certain address ranges the default size is 16 bits, and in other address ranges the default size is 8 bits.

I certainly enjoy the free exchange of ideas for improving the 6502 that characterizes many of the threads on enhancing the 6502 architecture. I've implemented some of the ideas offered up by many of the longtime members of the forum. As you can glean from my discussion above, I too have had a number of false starts. The trade-offs between the various enhancements and the basic architecture/flavor of the 6502 have been instructive personally, and have given me a greater appreciation for the 6502, 6800/6801/68HC11, and 8080/8085/Z80 microprocessors.

For me the 6502 has offered the best path toward an enhanced 16-bit option. The others have essentially filled opcode spaces, which makes the development of enhancements more difficult. The wide range of addressing modes supported by the 6502 also made the inclusion of base-relative and stack-relative fairly easy and useful as I found out when mapping the M65C02A onto the 8086 register set to port the Mak Pascal compiler.

My experience in developing the M65C02A core has been enhanced by trying out the ideas. I wanted the core to support stack frames and easily support a HLL like Pascal or C. I first started out by simply adding instructions that I thought might be useful for a compiler. But when I actually ported the Pascal compiler and mapped the instructions used, I found out for myself that what many sources claim regarding the utility of instructions with complex addressing modes or functions in compiled code is true.

I ran a histogram on the number of instructions that the compiler made use of for several different applications, and many of the instructions that I painstakingly added to the M65C02A were unused. I did find that base-relative and stack-relative addressing were extensively used. I also found that I could only effectively use just a fraction of the basic instructions and addressing modes of the 6502/65C02; the operations supported by the compiler and the virtual machine targeted by the compiler just do not make use of zero page or the many addressing modes into zero page. Thus, I reversed many of the "enhanced" instructions I had added and decided to simply use the default stack pointer prefix instruction to enable stack-relative addressing using the existing pre-indexed addressing mode.
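For readers who want to try the same kind of measurement, a mnemonic histogram takes only a few lines of Python. This is a rough sketch, not the tool I used: the listing format (one instruction per line, ";" comments, labels ending in ":") and the sample code are assumptions invented for illustration.

```python
from collections import Counter

def mnemonic_histogram(listing):
    """Count mnemonics in an assembly listing, skipping labels and comments."""
    counts = Counter()
    for line in listing.splitlines():
        code = line.split(";", 1)[0].strip()   # drop trailing comments
        if not code or code.endswith(":"):     # skip blank lines and bare labels
            continue
        counts[code.split()[0].upper()] += 1   # first token is the mnemonic
    return counts

# Hypothetical compiler output, invented for illustration only.
sample = """
start:
    lda 1,x      ; base-relative load
    sta 3,x
    lda 5,x
    jsr helper
    rts
"""
hist = mnemonic_histogram(sample)
```

Run over the real output of a compiler, the instructions that never show up in the counts are the candidates for removal.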

The microprogram space freed up by this decision allowed me to make indirection work in a more natural way. Previous to this final change, the prefix instructions adding indirection applied indirection prior to indexing. This meant that adding indirection to any pre-indexed addressing mode converted the operation to a post-indexed single or double indirect addressing mode. With the microprogram space recovered by eliminating dedicated base-relative and stack-relative instructions, I was able to apply single or double indirection after indexing by X or before indexing by Y. I think that this is more in keeping with the base architecture. The additional microprogram space also allowed me to implement more support for the relative addressing mode: I increased the range for the branches from 8 bits to 16 bits, and I added a PC-relative subroutine instruction.
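The difference between the two orderings is easy to lose in prose, so here is a small Python sketch of it. It only illustrates the order of operations and is not the M65C02A microprogram; the memory contents and function names are assumptions chosen for the example.

```python
def read16(mem, addr):
    """Read a little-endian 16-bit word from a 64K memory image."""
    return mem[addr & 0xFFFF] | (mem[(addr + 1) & 0xFFFF] << 8)

def indirect_then_index(mem, base, index):
    """Post-indexed: fetch the pointer at base, then add the index."""
    return (read16(mem, base) + index) & 0xFFFF

def index_then_indirect(mem, base, index):
    """Pre-indexed: add the index to base, then fetch the pointer there."""
    return read16(mem, (base + index) & 0xFFFF)

mem = bytearray(0x10000)
mem[0x0020:0x0022] = (0x3000).to_bytes(2, "little")  # pointer stored at $0020
mem[0x0024:0x0026] = (0x5000).to_bytes(2, "little")  # pointer stored at $0024

post = indirect_then_index(mem, 0x0020, 4)  # pointer at $0020, plus 4
pre = index_then_indirect(mem, 0x0020, 4)   # pointer found at $0024
```

Same base and same index, but the two orderings land on entirely different addresses, which is why the ordering matters for keeping the X forms pre-indexed and the Y forms post-indexed.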

_________________
Michael A.


Posted: Tue May 02, 2017 5:02 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
GaBuZoMeu wrote:
The advantage of having lots of "pointers" placed in zeropage is only that you can/should avoid saving and restoring them

That usually works great for single programs. But if you're running a multitasking OS, with several user applications, you run into problems with sharing the zeropage.

Quote:
OK, then LDA ($80),Y will fetch two bytes (opcode and zpAddr) but then use reg80,reg81 add Y and finally fetch the data. Ideally this would take only 3 cycles instead of 5.

With a wider bus it could be done in 2. And if you add separate Instruction/Data buses, it can be done in 1.


Posted: Tue May 02, 2017 5:26 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Quote:
I still find it a strange situation, where the very things that make these processors more compiler-friendly also make them so unfriendly to assembly language

You can make a processor more compiler friendly, while also making it easier to program in assembly. The old ARM32 was easy to program in assembly (easier than 6502), but was also very compiler friendly.

The modern ARM Cortex is harder to program in assembly, and probably also a bit harder to target with a compiler, but it is much more efficient, because most of the opcodes are only 16 bits instead of 32. And the extra work you have to do as a compiler writer only has to be done once.

If you look at the x86, it is also clear that the irregular features (different functions for each of the registers) make it hard to write code for (both in assembly and when writing a compiler), but make it more efficient in code size, because less opcode space is used for encoding the register number.


Posted: Tue May 02, 2017 6:23 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10949
Location: England
Very interesting to hear your story Michael: by implementing a HLL and reviewing how it makes use of your enhanced architecture, you've been able to close the loop. It's great to have a 6502 derivative which is fully worked out to implementation, which doesn't happen all that often, as it's a lot of work if you're making big changes.

Jeff, too: your Kim Klone and other experiments are marvels of practical engineering!


Posted: Tue May 02, 2017 6:35 am
Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
Arlet wrote:
GaBuZoMeu wrote:
The advantage of having lots of "pointers" placed in zeropage is only that you can/should avoid saving and restoring them

That usually works great for single programs. But if you're running a multitasking OS, with several user applications, you run into problems with sharing the zeropage.
As I see it, the 6502 was the truest 8-bit processor of all around 197x. It has no 16-bit operations (just an incrementer for PC), so they needed a solution for variable (runtime) addressing of more than 256 bytes. That's why they "invented" (zp,X) and (zp),Y. Neat and, as it turns out, pretty powerful, as you don't need to load and unload these pointers for most applications. The same applies to the limited stack; it was sufficient for hardware resources and return addresses.
Multitasking? No, or only in a limited context. Not that MT is impossible, but a general approach is simply painful (and memory swapping using DMA and perhaps hard disks was very, very expensive in those days).

Arlet wrote:
Quote:
OK, then LDA ($80),Y will fetch two bytes (opcode and zpAddr) but then use reg80,reg81 add Y and finally fetch the data. Ideally this would take only 3 cycles instead of 5.

With a wider bus it could be done in 2. And if you add separate Instruction/Data buses, it can be done in 1.
My cycle counting was done only with an 8-bit 6502+ in mind. The 16-bit step was yet to come ;)


Posted: Tue May 02, 2017 6:56 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Quote:
As I see it, the 6502 was the truest 8-bit processor of all around 197x. It has no 16-bit operations (just an incrementer for PC), so they needed a solution for variable (runtime) addressing of more than 256 bytes. That's why they "invented" (zp,X) and (zp),Y. Neat and, as it turns out, pretty powerful, as you don't need to load and unload these pointers for most applications.

I agree, the 6502 was good when it first came out. The problem is that some of the choices that were made to keep it cheap and simple didn't turn out to be very good for future requirements. The big role of zeropage is one of those problem spots. All the zeropage related instructions take up a fair bit of opcode space, while not being very useful for larger computer systems.

And while it's true that you don't need to load/unload the pointers in zeropage, it is quite painful to do pointer arithmetic, such as [B+N*X+offset] addressing, that's typically used in higher level languages. Anybody who disagrees is invited to write a 6502 version of my memory allocation challenge. :)
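To make the pain concrete, here is a Python sketch of what [B+N*X+offset] costs when every step has to go through an 8-bit adder with carry. It is a deliberately naive model: the multiply is done by repeated 16-bit addition, and the helper names and example numbers are assumptions for illustration, not code from the memory allocation challenge.

```python
def adc8(a, b, carry):
    """8-bit add with carry in and carry out, like ADC in binary mode."""
    total = a + b + carry
    return total & 0xFF, total >> 8

def add16(lo, hi, value):
    """Add a 16-bit value to a lo/hi byte pair using two 8-bit ADCs."""
    lo, carry = adc8(lo, value & 0xFF, 0)          # CLC, ADC low byte
    hi, _ = adc8(hi, (value >> 8) & 0xFF, carry)   # ADC high byte with carry
    return lo, hi

def effective_address(base, n, x, offset):
    """Compute base + n*x + offset piecewise, as an 8-bit CPU has to."""
    lo, hi = base & 0xFF, base >> 8
    for _ in range(x):               # n*x as repeated 16-bit addition of n
        lo, hi = add16(lo, hi, n)
    lo, hi = add16(lo, hi, offset)
    return (hi << 8) | lo

addr = effective_address(0x2000, 6, 50, 3)   # element 50 of 6-byte records, field 3
```

Each add16 stands for a load, CLC, two ADCs and two stores on a real 6502, and the result still has to be parked in a zeropage pointer before (zp),Y can use it.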


Posted: Tue May 02, 2017 8:33 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8521
Location: Southern California
Arlet wrote:
And while it's true that you don't need to load/unload the pointers in zeropage, it is quite painful to do pointer arithmetic, such as [B+N*X+offset] addressing, that's typically used in higher level languages.

Done with a ZP data stack, as Forth does it (though it can also be done in assembly without a real Forth kernel), it's very easy. It may be painful in terms of cycles taken (particularly for the '02, which can't handle 16 bits at a time), but it is nevertheless easy from a programming standpoint, regardless of how many gyrations one wants to go through in levels of indirection, indexing, and in-between steps. I discuss this in section 4 of the 6502 stacks treatise.

Quote:
GaBuZoMeu wrote:
The advantage of having lots of "pointers" placed in zeropage is only that you can/should avoid saving and restoring them

That usually works great for single programs. But if you're running a multitasking OS, with several user applications, you run into problems with sharing the zeropage.

Although I've done multitasking, it was never with a multitasking OS. For only a few tasks, I envision splitting up ZP to give each task its own section of it, reserving a little for system use. Three tasks should be easy, perhaps up to six with care; but obviously at some point you'll run out and have to start copying sections out to somewhere else to let other tasks use them. For some applications the overhead may be acceptable. I think this is what Jonathan Halliday is doing in his impressive 8-bit Atari GUI preemptive multitasking OS. (I've talked with him about it but I don't remember those details.)

Posted: Tue May 02, 2017 9:20 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
GARTHWILSON wrote:
Although I've done multitasking, it was never with a multitasking OS. For only a few tasks, I envision splitting up ZP to give each task its own section of it, reserving a little for system use. Three tasks should be easy, perhaps up to six with care; but obviously at some point you'll run out and have to start copying sections out to somewhere else to let other tasks use them.

I was thinking about the case where you load in a few arbitrary user programs from external storage, and run them at the same time. Since the programs are all independently written, you can't statically allocate zeropage storage between them. Copying is possible of course, but very slow. Compare that to using a [base+offset] addressing mode (or even [base+index+offset]) which gives you all the benefit of zeropage compactness, while not having to bother with static allocation. The '816 helps a little with the DPR, but there's only one, and it's lacking direct manipulation of the DPR itself.
Quote:
Done with a ZP data stack, as Forth does it (though it can also be done in assembly without a real Forth kernel), it's very easy

It's easier, yes, but it does waste an awful lot of cycles and it's not even very compact either.


Posted: Tue May 02, 2017 9:25 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10949
Location: England
Is copying zero page slow? A task switch for interactive users could take 10 milliseconds, feels like it's not too tight a constraint to me. You might well copy some stack too. And you don't need to copy all of ZP either - just the part which is agreed to be task-private. Could be half of it.
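A back-of-envelope check in Python, as a rough sketch only. It assumes a plain LDA zp,X / STA abs,X / DEX / BNE copy loop with the usual NMOS 6502 cycle counts and no page crossings, a 1 MHz clock, and 128 task-private bytes; all three are assumptions picked for the estimate, and the restore direction is taken to cost about the same as the save.

```python
# Cycle counts for one pass of a simple copy loop (no page crossings assumed).
CYCLES = {"LDA zp,X": 4, "STA abs,X": 5, "DEX": 2, "BNE taken": 3}

def copy_time_ms(num_bytes, clock_hz=1_000_000):
    """Estimate the time for a simple LDA/STA/DEX/BNE copy loop."""
    per_byte = sum(CYCLES.values())      # cycles per byte copied
    return num_bytes * per_byte / clock_hz * 1000.0

per_byte_cycles = sum(CYCLES.values())
save_and_restore = 2 * copy_time_ms(128)   # save one task, restore another
```

At 14 cycles per byte that is roughly 3.6 ms for saving and restoring half of zero page at 1 MHz, so it fits inside a 10 millisecond budget with room to copy some stack too.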


Posted: Tue May 02, 2017 9:39 am
Joined: Sat Jul 09, 2016 6:01 pm
Posts: 180
Arlet wrote:
True, but at least they had adda, suba, cmpa, lea, and movem. Also, the addressing modes allowed both index register and immediate offset, as well as auto inc/dec.
Zp-registers could also be used with CMP, CPX, ... The 680x0 address registers lacked ROR, ASL, DEC, ..., and operations on them do not set flags. So the odds are close to equal. A two-byte instruction similar to MOVEM or the Z80's LDIR might also have appeared in a 6502+, maybe by 1980.

Arlet wrote:
I think the x86 is much easier to keep track of. For 8 vs 16, it had the 'w' bit encoded in the instruction, making it easy to switch back and forth. And the '386 has a default mode (which you should set once), plus a prefix per instruction.

Have you tried working manually with O16/O32 and A16/A32 prefixes? It is almost impossible.

GaBuZoMeu wrote:
Nice, but this usage would cost +1 cycle :) - using abs+x (available) yields the same speed at the expense of one more program byte. Adding an offs,S address mode would be more versatile IMHO.

It would cost +1 cycle only if a page boundary was crossed, as with the branches. A later 6502+ with a 16/32/64-bit ALU would not have such a delay at all. So offs,S may be considered a waste of code space.

GaBuZoMeu wrote:
Hmm, this is something I don't understand. Perhaps you can explain this dramatic speedup. But I assume adding 9 accumulators instead of only one wouldn't yield +900% ??

Yes, +900% for some tasks. But it is impossible: it would require too many opcodes. I have run some tests with the 6809 (2 accumulators); they show that 2 accumulators may give even more than a 100% speed boost.

GaBuZoMeu wrote:
Hee hee - if register <> memory you could fill a program into $0080++ and run it, using regs$80++ having different contents. Nicely odd :D

Zp-registers and zp-memory are completely independent. I can't find any problem with it.

GaBuZoMeu wrote:
Well, IMHO that's beyond the scope of 65xx.

The main power of a man is his imagination. :)

GaBuZoMeu wrote:
The advantage of having lots of "pointers" placed in zeropage is only that you can/should avoid saving and restoring them - that is exactly what you need to do when dealing with a Z80. There you easily run out of registers, especially as they are not freely interchangeable.
6 mode bits! Meaning 2^6 = 64 possible situations regarding register lengths. Have you played through situations like context switches, or simply interrupts, in your mind? I am just playing with the '816 and realize that I have to save A and X and, a second time, P (flags), and perhaps DPR. Then I could switch the register size to what I need within the IRQ service. There I have to take care of DPR or use long addressing. Finally I have to undo everything => PLD, PLP, PLX, PLA, RTI :shock:

No DPR; IMHO it was a very bad idea. IRQ handlers may be adaptable to the current data-width mode, so we need different handlers only for different address-size modes. Interrupt handlers needn't use all zp-registers, so we have to save only several of them. If we had dedicated PUSH/POP (mentioned above), then it might occupy even less code than the 65816 code.

GARTHWILSON wrote:
I maintain however that part of the reason for this is that modern CPUs are so difficult to program in assembly language. I still find it a strange situation, where the very things that make these processors more compiler-friendly also make them so unfriendly to assembly language.

The point is different. They are not all unfriendly; for example, x86(-64) or ARM look quite friendly to me. The point is that it is almost impossible to make code in assembler as efficient as code produced by an optimizing compiler for C/C++/OCaml/... So assembler requires much more work and it gives slower code than a high-level programming language with a good optimizing compiler.
I ran some tests. Code in assembler is generally 2-3 times slower than the same code written in C for well-optimizing compilers like gcc for x86.
BTW I don't like the 65816 PER instruction. PEI and PEA are just extenders to PUSH.

Dr Jefyll wrote:
Prefix bytes are a potent solution because each consumes just a single place in the opcode table but it makes possible almost 256 new encodings. Unfortunately, the prefix occupies an extra byte of memory. Also, fetching that byte entails a certain delay.

There is another disadvantage of prefixes. The mentioned 6502+ architecture suggests 64 or even 256 modes; this would require 256 prefixes.
Thanks for the link to the interesting KK projects, but they use fragmented memory, while 6502+ allows a flat 16MB/4GB/2^64B model.
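To show where the 256 comes from, here is a small Python sketch of one possible 8-bit mode register. The field layout (data width in bits 0-1, address width in bits 2-3, X in bits 4-5, Y in bits 6-7) and the assumption that X and Y take the same widths as data are purely hypothetical; nothing about the encoding is fixed.

```python
DATA_WIDTHS = (8, 16, 32, 64)
ADDR_WIDTHS = (16, 24, 32, 64)

def encode_mode(data, addr, x, y):
    """Pack four 2-bit width fields into one hypothetical 8-bit mode register."""
    return (DATA_WIDTHS.index(data)
            | ADDR_WIDTHS.index(addr) << 2
            | DATA_WIDTHS.index(x) << 4
            | DATA_WIDTHS.index(y) << 6)

def decode_mode(mode):
    """Unpack the mode register back into (data, addr, x, y) widths in bits."""
    return (DATA_WIDTHS[mode & 3], ADDR_WIDTHS[(mode >> 2) & 3],
            DATA_WIDTHS[(mode >> 4) & 3], DATA_WIDTHS[(mode >> 6) & 3])

mode = encode_mode(16, 24, 8, 32)
all_modes = {encode_mode(d, a, x, y)
             for d in DATA_WIDTHS for a in ADDR_WIDTHS
             for x in DATA_WIDTHS for y in DATA_WIDTHS}
```

Four independent 2-bit fields give 4^4 = 256 distinct modes, so covering them with prefixes alone would indeed take one prefix per mode.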

_________________
my blog about processors


Last edited by litwr on Tue May 02, 2017 3:39 pm, edited 1 time in total.

Posted: Tue May 02, 2017 9:53 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8521
Location: Southern California
Arlet wrote:
GARTHWILSON wrote:
Although I've done multitasking, it was never with a multitasking OS. For only a few tasks, I envision splitting up ZP to give each task its own section of it, reserving a little for system use. Three tasks should be easy, perhaps up to six with care; but obviously at some point you'll run out and have to start copying sections out to somewhere else to let other tasks use them.

I was thinking about the case where you load in a few arbitrary user programs from external storage, and run them at the same time.

That's what I'm thinking of too.

Quote:
Since the programs are all independently written, you can't statically allocate zeropage storage between them.

You don't need to. You just change the stack pointer X when you switch tasks. The system stores the last value for the application, and restores it when it lets it run again. The applications don't have to care where their stack range is.
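Here is a minimal Python sketch of the idea, assuming a Forth-style data stack in zero page that grows downward with X as its pointer. The Task class, the starting X values, and the helper names are all made up for illustration; a real system would do this in a few instructions in the task switcher.

```python
zero_page = bytearray(256)

class Task:
    def __init__(self, name, stack_top):
        self.name = name
        self.saved_x = stack_top   # X value the system keeps between runs

def push16(x, value):
    """Push a 16-bit cell on the ZP data stack; the stack grows downward."""
    x = (x - 2) & 0xFF
    zero_page[x] = value & 0xFF
    zero_page[(x + 1) & 0xFF] = value >> 8
    return x

def pop16(x):
    """Pop a 16-bit cell and return it with the updated stack pointer."""
    value = zero_page[x] | (zero_page[(x + 1) & 0xFF] << 8)
    return value, (x + 2) & 0xFF

# Hypothetical starting points; the applications never see these numbers.
task_a = Task("A", stack_top=0x80)
task_b = Task("B", stack_top=0xC0)

x = task_a.saved_x
x = push16(x, 0x1234)
task_a.saved_x = x            # switch away from A: save its X

x = task_b.saved_x            # switch to B: restore its X
x = push16(x, 0xBEEF)
task_b.saved_x = x

x = task_a.saved_x            # back to A: its cell is untouched
value_a, x = pop16(x)
```

Since every access is relative to X, the same application code runs unchanged no matter which part of zero page the system handed it.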

Quote:
Quote:
Done with a ZP data stack, as Forth does it (though it can also be done in assembly without a real Forth kernel), it's very easy

It's easier, yes, but it does waste an awful lot of cycles and it's not even very compact either.

Each separate operation is one or more instructions. I'm sure any RISC would require quite a few instructions to do the example you gave too.

Posted: Tue May 02, 2017 9:55 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8521
Location: Southern California
BigEd wrote:
Is copying zero page slow? A task switch for interactive users could take 10 milliseconds, feels like it's not too tight a constraint to me.

That seems like an eternity for my embedded-control applications. For other applications that are mainly for human I/O, it may be no problem at all.

Posted: Tue May 02, 2017 10:01 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10949
Location: England
Yes, for realtime you might well need to be quicker. But then you can build your tasks to suit a more static arrangement. Or, set aside say 64 bytes for OS, 64 bytes for realtime and 128 bytes swappable for user tasks.


Posted: Tue May 02, 2017 10:07 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8521
Location: Southern California
Quote:
I have run some tests with the 6809 (2 accumulators); they show that 2 accumulators may give even more than a 100% speed boost.

Someone pointed out in another topic that the 6809 was not significantly faster than the '02 at a given clock rate. (Additionally, the 6809 was limited to very slow clock rates.)

Quote:
The main power of a man is his imagination. :)

I like that! I might add resourcefulness too.

Quote:
The point is that it is almost impossible to make code in assembler as efficient as code produced by an optimizing compiler for C/C++/OCaml/... So assembler [...] gives slower code than a high-level programming language with a good optimizing compiler.

I'd say that speaks very poorly of the programmer! Or actually, I think it comes back to how hard it is to write in assembly language for most modern processors.

Quote:
BTW I don't like the 65816 PER instruction. PEI and PEA are just extenders to PUSH.

PER has some nice uses. :wink:
