6502.org Forum

All times are UTC




Posted: Tue Aug 16, 2016 8:37 pm

Joined: Sat Jul 09, 2016 6:01 pm
Posts: 180
I have another small correction. The DEC PDP-11 architecture supported BCD, but in a very odd form: it needed a special, optional (and very costly) co-processor unit for this! These were the so-called commercial instructions. :roll: It was a strange world. IMHO, without the 6502 and ARM we would have been forced to live in it.

Tor wrote:
For realtime it would be. And there was one particular device that was very realtime back then, and that was the floppy drive. Realtime in this case means that it has to be fast enough, but if it is, then it doesn't matter if it is even faster. So, the question is really: How long does it take to process 144 cycles? Acorn dismissed both the National Semiconductor and the Motorola offerings for that reason. Steve Furber explicitly mentions the NS one in this interview: http://cacm.acm.org/magazines/2011/5/10 ... r/fulltext

:D Thanks for this nice anecdote.
Sorry, I was inaccurate: 144 cycles is the best case. DIVS (signed division) with an operand in memory may take up to 186 cycles. Maybe this is what kept the 68000-based Mac, Amiga and Atari from using high-density floppies.
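To put numbers on Tor's question, here is a rough Python sketch that converts cycle counts into microseconds and compares them with the time between bytes coming off a floppy. The 8 MHz clock and the MFM data rates (250 kbit/s for double density, 500 kbit/s for high density) are assumptions added here, not figures from the thread, and the comparison only matters if the CPU itself has to service every byte.

```python
def cycles_to_us(cycles, clock_mhz):
    """Microseconds taken by `cycles` clock cycles at `clock_mhz` MHz."""
    return cycles / clock_mhz


def byte_period_us(data_rate_kbit):
    """Microseconds between bytes for a raw data rate given in kbit/s."""
    return 8 * 1000 / data_rate_kbit


# Assumed figures: an 8 MHz 68000 and standard MFM floppy data rates.
CLOCK_MHZ = 8
RATES_KBIT = {"DD": 250, "HD": 500}

for label, cycles in [("DIVS best case", 144), ("DIVS worst case", 186)]:
    busy = cycles_to_us(cycles, CLOCK_MHZ)
    for density, rate in RATES_KBIT.items():
        gap = byte_period_us(rate)
        verdict = "fits within" if busy < gap else "exceeds"
        print(f"{label}: {busy:.2f} us {verdict} the {density} byte period of {gap:.0f} us")
```

Under these assumptions even the best case (18 µs) exceeds the 16 µs byte period at high density, while both cases fit within the 32 µs double-density period; a real interrupt handler adds its own cycles on top of the instruction it has to wait for.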

Tor wrote:
Sophie Wilson has elsewhere mentioned that the 68k had slower interrupt handling than the 6502. Back in the eighties I wrote software for a very realtime bird-monitoring system, lots of interrupts from many sources. The Apple II, or rather, the 6502, had no problems with that.

Agreed, the 6502 has fast interrupts. This makes it possible to create nice demos that use a different video mode on each raster line. Z80-based computers lacked this; only a Z80 running at 7-8 MHz became fast enough for these tricks.
A bird-monitoring system on a 6502 looks impossible to me... :shock: How did it run without a hard disk? Or did it have one or two?

BigDumbDinosaur wrote:
If the NMOS 6502 (and others, such as the 6510) encounters JMP ($xxFF), it will incorrectly load PC with the contents of $xxFF and $xx00, failing to increment the most significant byte (MSB) of the operand to get the MSB of the target address. This error was corrected in the 65C02 and 65C816.

IMHO it is not such a simple problem. The 6502 has a paged architecture. The logic above is too abstract, not practical. I gave a practical example which shows the NMOS 6502 working correctly and the CMOS 6502 working wrongly. Is there an example showing the reverse? In any case, I prefer the word "quirk" to "bug" or "error".
If we insist on theoretical logic, then we have to say that the (zp,X) and zp,X addressing modes are bugs too. A 6502 without these "bugs" would have had powerful stack-manipulation instructions, making the 65816 stack addressing modes almost redundant.
IMHO the "fix" of JMP in the 65C02 changes only one detail in a big picture full of similar details.

GARTHWILSON wrote:
in BASIC benchmark after benchmark, the 6502 could outperform the Z80 which had more and bigger registers, a seemingly a more powerful instruction set, and ran at higher clock rates.

I can try to explain this. Three years ago I was sure that the 6502 was 3 times faster than the Z80. Then I had the opportunity to meet an expert 8080/Z80 programmer, who showed me how to write fast Z80 code. It is much more difficult than on the 6502. So now I am sure that a good Z80 programmer who spends enough time can produce Z80 code that is only about 2.1-2.2 times slower than the equivalent 6502 code.

GARTHWILSON wrote:
First for 6502:
Code:
       LDA  (0,X)
       PHA
       INC  0,X
       BNE  fet1
       INC  1,X
fet1:  LDA  (0,X)
       JMP  PUT
; and elsewhere, PUT which is used in so many places is:
PUT:   STA  1,X
       PLA
       STA  0,X


I dare to suggest a better 6502 variant ;)
Code:
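       ldy #1
       stx m+1      ; self-modify: the ZP operand of the next LDA becomes X
     m lda (0),y    ; high byte of the cell the TOS pointer addresses (Y = 1)
       tay
       lda (0,x)    ; low byte of that cell
       sta 0,x      ; replace TOS with the fetched 16-bit value
       sty 1,x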

So the PUT code is only 4 bytes and can be inlined, dropping the JMP too. Of course, the 65816 is much better anyway: 12 clocks against 26, and 4 bytes against 13.
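For readers who don't follow the self-modifying trick, here is a minimal Python model of what the sequence above does. It assumes a Forth-style data stack in zero page indexed by X, with the top-of-stack cell at 0,X and 1,X; `fetch_word` and the addresses in the example are hypothetical.

```python
def fetch_word(mem, x):
    """Replace the 16-bit address on top of the zero-page stack (mem[x], mem[x+1])
    with the 16-bit little-endian value stored at that address."""
    # stx m+1 patches the operand of "lda (0),y", so it reads through the same
    # pointer as "lda (0,x)"; the pointer bytes themselves wrap within page zero.
    ptr = mem[x] | (mem[(x + 1) & 0xFF] << 8)
    hi = mem[(ptr + 1) & 0xFFFF]   # ldy #1 / m lda (x),y / tay
    lo = mem[ptr]                  # lda (0,x)
    mem[x] = lo                    # sta 0,x
    mem[(x + 1) & 0xFF] = hi       # sty 1,x


mem = bytearray(65536)
mem[0x40], mem[0x41] = 0x00, 0x20      # TOS holds the address $2000
mem[0x2000], mem[0x2001] = 0xCD, 0xAB  # the cell at $2000 contains $ABCD
fetch_word(mem, 0x40)
print(hex(mem[0x40] | (mem[0x41] << 8)))  # 0xabcd
```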

GARTHWILSON wrote:
Perhaps SWN was one of the things Chuck and Bill considered when designing the '02 and decided it wouldn't get used enough to be worth the silicon real estate. Although I have wished for it, or for an LSR4, it's clear that they did a good job in deciding what to put there, and Bill in the later 65c02 and '816.

They might have added it, and something else, if they had envisioned a CPU without BCD and the V flag. ;-)
The main question of my dreams concerns the legendary Chuck Peddle. What IF he had had the opportunity to continue his work on the 6502? Intel released the 8086 in 1978, four years after the 8080 appeared. Motorola spent five years making the 68000. Only the 80186 (which appeared in 1982) or the 68020 (1984) could compete with the speed of the 6502... IMHO, if MOS Technology had survived the "shark" attacks, we would have a different reality today. Without the BBC Micro, the Commodore 64, ... Maybe we would even see Terminators around. :D

BigEd wrote:
Bear in mind that different Basics can have quite different performance. Mallard Basic on the Z80 seems especially good.

I am surprised by the slowness of 6809 systems. It looks like the 6502 may outperform the 6809! :shock:

Rob Finch wrote:
I just thought I would mention that the Z80(8088) and 68000 aren't as slow as they might seem if one looks only at the clock frequency. Both of these processors use multiple clock cycles to interface to the bus. For instance a Z80 uses four clock cycles per machine cycle when reading instructions, so it's really only running at a 1MHz rate, the same as the 6502. The 68000 uses four clock cycles minimum per bus cycle (reading instructions) while running at 8MHz so it's really only running at a 2MHz instruction fetch rate. It's not really the instruction set that slows things down, but the implementation (bus interface). There are a couple of implementations of the 68000 done in an FPGA for instance that gain back performance by using a simpler single clock bus cycle. These cores get a 4x speedup effectively. Later versions of the 68x series shortened the bus interface to three cycles and gained 25% in performance.

I agree that the Z80 and 68000 are not that slow, but the clock frequency also applies to RAM and ROM. So a 4 MHz 6502 may use the same memory as a 4 MHz Z80 without wait states. The Commodore 128 could run at 4 MHz; that has been proven. BBC Micro second processors use even higher frequencies. Obviously a 6502 at 4 MHz is much faster than a Z80 at 4 MHz. However, the Z80 can use wait states to work with slower memory, and this allowed cheap systems like the Spectrum or Amstrad CPC/PCW, whose speed matches a 6502 at ≈1.5 MHz.
IMHO the Z80 also had hidden support from Intel, which didn't push its own 8085 very hard. Microsoft explicitly supported the Z80, and the Z80 had a common OS (CP/M). 6502 systems were isolated: no common OS, no common disc formats, etc. A common 6502 OS was possible too; it might have been GEOS or a special version of CP/M. That would have required support from the big market players. IMHO Motorola could have provided such support...

_________________
my blog about processors


Last edited by litwr on Sun Aug 21, 2016 7:49 am, edited 2 times in total.

Posted: Tue Aug 16, 2016 9:33 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8512
Location: Southern California
litwr wrote:
I have another small correction. DEC PDP-11 architecture supported BCD but in a very odd form. It provided a special co-processor unit for this! They were so called commercial instructions. :roll: It was the strange world. IMHO without 6502 and ARM we were forced to live in it.

I could almost go for that.  I have not used the 6502's decimal mode in decades.

Quote:
BigDumbDinosaur wrote:
If the NMOS 6502 (and others, such as the 6510) encounters JMP ($xxFF), it will incorrectly load PC with the contents of $xxFF and $xx00, failing to increment the most significant byte (MSB) of the operand to get the MSB of the target address. This error was corrected in the 65C02 and 65C816.

IMHO it is not so easy problem.  6502 has paged architecture.  The logic above is too abstract, not practical.  I gave the practical example which shows that NMOS 6502 works right and CMOS 6502 wrong.  Is there any reversed example?  In any way I prefer to use word "quirk" instead of "bug" or "error".

Suppose you have JMP ($14FF).  The problem with the NMOS is that it will not read $14FF and $1500 to get the 16-bit address to jump to.  Instead, it will get the low byte from $14FF and the high byte from $1400, which is 255 bytes below it and might contain almost anything.  It could even be the address of an I/O IC's status register, and reading it would clear a possible interrupt condition.  I cannot imagine any use at all for this.  I doubt that even Jeff could.  :lol:   Every time you define a non-ZP address variable, NMOS requires that you have the assembler check to make sure it's not at $14FF, and if it is, skip a byte so the first byte of the address variable goes down at $1500.  The CMOS version correctly reads the low byte from $14FF and the high byte from $1500.
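The difference can be modeled in a few lines. This is a minimal Python sketch, not an emulator: `jmp_indirect_target` is a hypothetical helper, memory is a flat 64 KB bytearray, and the byte values placed around $14FF are made up for the example.

```python
def jmp_indirect_target(mem, ptr, cmos):
    """Target address of JMP (ptr).

    NMOS 6502: only the low byte of the pointer is incremented, so for a
    pointer at $xxFF the high byte of the target comes from $xx00.
    65C02: the pointer is incremented as a full 16-bit value.
    """
    lo_addr = ptr
    if cmos:
        hi_addr = (ptr + 1) & 0xFFFF
    else:
        hi_addr = (ptr & 0xFF00) | ((ptr + 1) & 0x00FF)
    return mem[lo_addr] | (mem[hi_addr] << 8)


mem = bytearray(65536)
mem[0x14FF] = 0x34   # intended target $1234: low byte at $14FF...
mem[0x1500] = 0x12   # ...and high byte at $1500
mem[0x1400] = 0xC0   # unrelated byte that happens to sit at $1400

print(hex(jmp_indirect_target(mem, 0x14FF, cmos=False)))  # 0xc034 (NMOS)
print(hex(jmp_indirect_target(mem, 0x14FF, cmos=True)))   # 0x1234 (65C02)
```

For any pointer that does not end in $FF, both branches read the same two bytes, which is why the problem only shows up when a vector happens to land at $xxFF.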

Quote:
The main question of my dreams is around legendary Chuck Peddle.  What IF he had the opportunity to continue the work under 6502?  Intel made 8086 at 1978 after 4 years since 8080 was appeared.  Motorola were making 68000 during 5 years.  Only 80186 (appeared at 1982) or 68020 (1984) might compete the speed of 6502...  IMHO if MOSTEC might survive the "shark" attacks then we would have the different reality today.  Without BBC Micro, Commodore 64, ...  Maybe we could even see the Terminators around. :D

This topic itself is related to the what-ifs.  Bill Mensch estimates that if the 65c02 were made to the latest deep-submicron technologies available today (actually I think he said this in 2015), it would be capable of 10GHz.  So what if.  It won't be done unless someone decides the market looks good enough to get a good return on investment though, and that's subject to whims, perceptions, biases, etc.  Memory and I/O would have to be on the same chip, and it would still have certain limitations like the 16-bit address space and that the instruction set does not lend itself to things like relocatable code; but it's fun to imagine the possibilities anyway.

Quote:
BigEd wrote:
Bear in mind that different Basics can have quite different performance. Mallard Basic on the Z80 seems especially good.

I am surprised by the slowness of 6809 systems. It looks like that 6502 may outperforms 6809! :shock:

The 6809 never got past 2MHz AFAIK, so outperforming it with today's 65c02 should be easy, considering they're all guaranteed to be able to handle a clock speed at least seven times as high, and the highest run is over 100 times as high.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Wed Aug 17, 2016 9:06 am

Joined: Sun Apr 10, 2011 8:29 am
Posts: 597
Location: Norway/Japan
litwr wrote:
BigDumbDinosaur wrote:
If the NMOS 6502 (and others, such as the 6510) encounters JMP ($xxFF), it will incorrectly load PC with the contents of $xxFF and $xx00, failing to increment the most significant byte (MSB) of the operand to get the MSB of the target address. This error was corrected in the 65C02 and 65C816.

IMHO it is not so easy problem. 6502 has paged architecture. The logic above is too abstract, not practical. I gave the practical example which shows that NMOS 6502 works right and CMOS 6502 wrong. Is there any reversed example? In any way I prefer to use word "quirk" instead of "bug" or "error".

No, it is a genuine bug (I see Garth also provided details). If your assembly program just happened to be assembled so that the 16-bit address label you used ended up right there, your program would fail. Add one line of code anywhere, reassemble, and it works. The bug is that the 16-bit address is split so that its two halves are not contiguous. I don't see any relation to the other addressing modes you mentioned.

Quote:
IMHO if MOSTEC might survive the "shark" attacks then we would have the different reality today. Without BBC Micro, Commodore 64, ... Maybe we could even see the Terminators around. :D

MOS wasn't really a victim of anything back then, it was simply acquired by Commodore because Commodore was bitten by relying on TI for their chips, and TI then turned around and underbid them with their own TI products. "Never again", said Commodore, and bought MOS so that they could have an in-house chip provider.

The 6502 legacy lives on in WDC, with Bill Mensch from MOS. And therefore we have a 65C02 that is still in production, at higher speeds, unlike the otherwise very nice 6809 which is only old stock and runs at original slow speeds.


Posted: Wed Aug 17, 2016 12:04 pm

Joined: Sat Jul 28, 2012 11:41 am
Posts: 442
Location: Wiesbaden, Germany
Quote:
6502 has paged architecture.
This is only true for zeropage and implied stack addressing modes. Absolute addressing modes should generate a 16-bit address like
Code:
        ldx #$c0
        lda $10c0,x
would load from $1080 in a "paged architecture" but correctly loads from $1180 both on CMOS and NMOS.
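A small Python sketch of the two behaviors contrasted here. `paged_address` models a hypothetical CPU that adds the index to the low byte only, which no 6502 does for the absolute indexed modes; the helper names are illustrative.

```python
def abs_x_address(base, x):
    """LDA base,X on NMOS and CMOS 6502: X is added with carry into the high byte."""
    return (base + x) & 0xFFFF


def paged_address(base, x):
    """Hypothetical 'paged' CPU: X is added to the low byte only, the page never changes."""
    return (base & 0xFF00) | ((base + x) & 0x00FF)


def page_crossed(base, x):
    """True when the carry reaches the high byte. On the 6502 this costs one
    extra cycle for reads such as LDA abs,X; the address itself is still correct."""
    return (base & 0xFF00) != (abs_x_address(base, x) & 0xFF00)


print(hex(abs_x_address(0x10C0, 0xC0)))   # 0x1180
print(hex(paged_address(0x10C0, 0xC0)))   # 0x1080
print(page_crossed(0x10C0, 0xC0))         # True
```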

Code:
mjmp    jmp (divjmp)
;...
indjmp                  ;code should continue here
;...
        org *|$ff
divjmp  dw indjmp
The example would not be executed correctly on the NMOS CPU.

_________________
6502 sources on GitHub: https://github.com/Klaus2m5


Posted: Thu Aug 18, 2016 9:27 am

Joined: Sat Jul 09, 2016 6:01 pm
Posts: 180
Let's look at the following code:
Code:
   ldx #1
   lda $ff,x

Which location is loaded into the accumulator? Is it $100? No, it is $00, the same as with JMP ($FF). Is that a bug?
Now look at this code:
Code:
   ldx #0
   lda ($ff,x)

Where is the address of the byte to be loaded into the accumulator stored? Is it at $FF and $100? No, it is at $FF and $00, the same as with JMP ($FF). Is that a bug?
What is the definition of the word "error" ("bug")? An error means that the system's behavior does not correspond to its documentation. Is the JMP ($xxFF) behavior undocumented? No. So it is not a bug.
Let's look at the NMOS JMP (VALUE) more closely. VALUE is not a variable, it is a constant, so it is easy to put VALUE at a suitable address. In the worst case we have to spend one extra byte on word alignment for the whole program to ensure this. I agree that in this case the CMOS 6502 is better on a microscopic scale: it saves this one byte.
JMP (VALUE) is often useful for fast jump tables. My example above showed that the NMOS 6502 works better in this case; let me repeat it in more detail. Suppose we have two 256-byte jump tables, for odd and even arguments. Here is the code:
Code:
   lda INDEX
   and #1
   bne l_odd

   sta m_even+1
m_even
   jmp (even_table)
l_odd
   sta m_odd+1
m_odd
   jmp (odd_table)

align 256
odd_table .word ...    ;256 bytes
even_table .word ...   ;256 bytes

Both tables occupy exactly 512 contiguous bytes on the NMOS 6502, but they require an ugly 513 bytes on the CMOS 6502. :( This example is not just a theoretical artifact. I met this situation in my own code: http://forum.6502.org/viewtopic.php?f=2&t=4185. The ugliness of the 65816 code in this case even made me feel a bit ill. :x I have to say that the 65816 is very good, but it is not perfect. The CMOS 6502's additional features over the NMOS 6502 mean very little for programming. IMHO the JMP ($xxFF) "problem" only creates needless incompatibility. I also have to say that some of the NMOS 6502's undocumented instructions are useful...
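The 512-versus-513 claim can be checked by listing which bytes each of the 256 `JMP (table+INDEX)` vectors reads. This is a Python sketch under stated assumptions: `table_layout` is a hypothetical helper, the two page-aligned tables sit at $1000 and $1100, and the low byte of the operand is simply INDEX, with the parity of INDEX choosing the table, as the code above intends. It also shows that on the CMOS part the even table has to come first: with the odd table first, vector 255 would read the first byte of the even table.

```python
def jmp_indirect_addrs(ptr, cmos):
    """The two addresses JMP (ptr) reads: (low byte, high byte)."""
    if cmos:
        return ptr, (ptr + 1) & 0xFFFF
    return ptr, (ptr & 0xFF00) | ((ptr + 1) & 0x00FF)   # NMOS wraps within the page


def table_layout(cmos, even_base, odd_base):
    """Return (span_in_bytes, clash) for the two page-aligned tables."""
    owner = {}
    clash = False
    for index in range(256):
        base = even_base if index % 2 == 0 else odd_base
        for addr in jmp_indirect_addrs(base | index, cmos):
            if owner.setdefault(addr, index) != index:
                clash = True   # two different vectors would share a byte
    return max(owner) - min(owner) + 1, clash


for cmos in (False, True):
    for name, even_base, odd_base in [("odd first", 0x1100, 0x1000),
                                      ("even first", 0x1000, 0x1100)]:
        span, clash = table_layout(cmos, even_base, odd_base)
        cpu = "65C02" if cmos else "NMOS"
        print(f"{cpu}, {name}: span {span} bytes, clash={clash}")
```

In the working 65C02 layout, $1100 itself is never read: 512 bytes are used, but they are spread over 513.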

Tor wrote:
MOS wasn't really a victim of anything back then, it was simply acquired by Commodore because Commodore was bitten by relying on TI for their chips, and TI then turned around and underbid them with their own TI products. "Never again", said Commodore, and bought MOS so that they could have an in-house chip provider.

The 6502 legacy lives on in WDC, with Bill Mensch from MOS. And therefore we have a 65C02 that is still in production, at higher speeds, unlike the otherwise very nice 6809 which is only old stock and runs at original slow speeds.

That is Commodore's side of it. Did MOS Technology want to be sold?
About 20 years ago I read an article about MOS Technology. Sorry, I can't find it now. It showed that MOS's financial problems were artificially created: a kind of shark business. Commodore got credit money for this deal...
Anyway, MOS Technology was sold and development of the 6502 stopped. :( Intel and Motorola had made the next generation of their chips by the late 70s. IMHO MOS Technology could have done the same. The 65816 came a bit too late and can't match the 80186 or 68000.

GARTHWILSON wrote:
This topic itself is related to the what-ifs. Bill Mensch estimates that if the 65c02 were made to the latest deep-submicron technologies available today (actually I think he said this in 2015), it would be capable of 10GHz. So what if. It won't be done unless someone decides the market looks good enough to get a good return on investment though, and that's subject to whims, perceptions, biases, etc.. Memory and I/O would have to be on the same chip, and it would still have certain limitations like the 16-bit address space and that the instruction set does not lend itself to things like relocatable code; but it's fun to imagine the possibilities anyway.

IMHO x86-64 at 1 GHz can easily outperform a 6502 at 10 GHz. x86-64 can do a 64-bit multiply with a 128-bit result in one tick! A single x86-64 instruction may require dozens of 6502 instructions. The same is true for modern ARM too.
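To illustrate the scale of the gap, here is a Python sketch of a naive 64x64-bit to 128-bit shift-and-add multiply that uses only 8-bit adds and 1-bit shifts, the kind of primitive steps a 6502 routine is built from. `mul64_shift_add` and its step count are illustrative assumptions: a tuned 6502 routine would be organized differently, and each byte step here would still cost one to three 6502 instructions (for example LDA/ADC/STA for an add).

```python
def mul64_shift_add(a, b):
    """Multiply two unsigned 64-bit values with byte-wide adds and 1-bit shifts.

    Returns (product, byte_steps), where byte_steps counts the 8-bit add and
    shift steps performed. Naive sketch: the multiplicand is kept 16 bytes wide.
    """
    product = [0] * 16                          # 128-bit result, little-endian bytes
    mcand = list(a.to_bytes(16, "little"))      # multiplicand, shifted left each round
    steps = 0
    for bit in range(64):                       # one round per multiplier bit
        if (b >> bit) & 1:
            carry = 0
            for i in range(16):                 # multi-byte add (CLC, then ADC per byte)
                total = product[i] + mcand[i] + carry
                product[i] = total & 0xFF
                carry = total >> 8
                steps += 1
        carry = 0
        for i in range(16):                     # multi-byte shift (ASL, then ROL per byte)
            total = (mcand[i] << 1) | carry
            mcand[i] = total & 0xFF
            carry = total >> 8
            steps += 1
    return int.from_bytes(bytes(product), "little"), steps


product, steps = mul64_shift_add(0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFF)
print(hex(product), steps)
```

Even this single worst-case product takes over two thousand byte-level steps in a loop, against one multiply instruction on x86-64.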

GARTHWILSON wrote:
The 6809 never got past 2MHz AFAIK, so outperforming it with today's 65c02 should be easy, considering they're all guaranteed to be able to handle a clock speed at least seven times as high, and the highest run is over 100 times as high.

The table mentioned earlier shows that a 6809 at 1.79 MHz is 5 times slower than a 6502 at 2 MHz running BBC Basic 4. BASIC benchmarks are very inaccurate, but... So it is possible that the 6502 may outperform the 6809 at the same frequency. :shock:
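To compare the two at the same frequency, the run-time ratio has to be scaled by the clock ratio. A tiny Python sketch of that arithmetic, taking the quoted figures at face value; `cycles_ratio` is a hypothetical helper.

```python
def cycles_ratio(time_ratio, clock_a_mhz, clock_b_mhz):
    """If CPU A took `time_ratio` times as long as CPU B, return how many
    times as many clock cycles A spent on the same benchmark."""
    return time_ratio * clock_a_mhz / clock_b_mhz


# Figures from the post: 6809 at 1.79 MHz, 6502 at 2 MHz, 5x the run time.
print(f"{cycles_ratio(5, 1.79, 2.0):.1f}")
```

So the 6809 run used roughly 4.5 times as many clock cycles; a benchmark like this also measures the quality of each BASIC implementation, not just the CPU.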



Last edited by litwr on Thu Aug 18, 2016 11:11 am, edited 1 time in total.

Posted: Thu Aug 18, 2016 9:39 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
About the 6502 memory model: I'd say it's not a paged model. There are two special pages, and for people concerned with cycle counts there are page-crossing boundaries, but otherwise it's flat. And so the JMP bug is a bug.

For the case of the 256-way jump, it's interesting that your approach winds up using 513 bytes, and indeed that seems ugly. (But it might be the fastest way.) But you don't need that bottom bit of the index any more, I think...

Code:
LDA index
LSR A
BCC even
odd:
ASL A
TAX
JMP(odd base,X)
even:
ASL A
TAX
JMP(even base,X)

Or, no doubt, something a bit more slick! (Or even, something correct, because that's probably wrong)

On the topic of MOS, yes, they got burnt by the Motorola lawsuit, Jack T stitched them up in a deal to buy components, and they were out of cash so could be bought for very little. Victory to Jack and Commodore.
(Edit: see for example the oral history of Bill Mensch)

There's a big problem comparing CPUs, not only is it certainly going to depend on the choice of benchmark, but you need comparably top quality coding in all cases. In the case of the 6809 Basic benchmark, it's possible the 6809 Basic is not as excellently idiomatic as the 6502 one.


Last edited by BigEd on Mon Aug 22, 2016 7:51 am, edited 1 time in total.

Posted: Thu Aug 18, 2016 10:32 am

Joined: Sun Apr 10, 2011 8:29 am
Posts: 597
Location: Norway/Japan
@litwr: I must admit that I don't understand your example.
The thing with the JMP error is that it affects loading of a 16-bit address, not an 8-bit offset as with so many other instructions. And when you load a 16-bit address you're supposed to load two consecutive bytes; who has ever heard of a 16-bit address split over different memory areas?

When Commodore bought MOS, development didn't stop - quite the opposite, for two reasons:
1) Commodore went nearly insane with creating new variants of every chip. So much so that when some of their key engineers later started working at other places they had to adjust to not being able to simply order up a new variant of a CPU or other chip, as they had become accustomed to while at Commodore.
2) WDC. The next step in 6502 development was, of course, CMOS, and that's exactly what happened with Bill Mensch leaving MOS and starting WDC. Bill M. was essential in the development of the original 6502 (and he also originally came from Motorola, with Chuck Peddle). There was never a break in 6502 development. What's left out is moving to the kind of fab process that the large foundries can use, e.g. Intel and Samsung (14nm or 10nm or whatever). That's extremely costly. Other than that, there's not much more that can be done with a 65c02; if you change it, it's no longer a 65c02.


Posted: Thu Aug 18, 2016 11:03 am

Joined: Sat Jul 09, 2016 6:01 pm
Posts: 180
IMHO the CMOS 6502 and NMOS 6502 are a good illustration of "bargain one trouble for another". :D
The example for the CMOS 6502 is 3 (or 2) bytes shorter (good) but 1 tick slower (bad). And it also needs the 513-byte ugliness. ;)
Tor wrote:
@litwr: I must admit that I don't understand your example.
The thing with the jmp error is that it affects loading of a 16-bit address, not an 8-bit offset as with so many other instructions. And when you load a 16-bit address you're supposed to load two consecutive bytes, who've ever heard about a 16-bit address split over different memory areas?

What about exceptions to the rule? They may be even more important than the main rule. ;) I repeat: something is an error only if a system does not work according to its documentation. You may not like the NMOS JMP ($xxFF) feature, but that is only your THEORETICAL position. In practice it is as useful as the CMOS 6502 JMP ($xxFF) behavior. The example shows this.

Tor wrote:
When Commodore bought MOS, development didn't stop - quite the opposite, for two reasons:
1) Commodore went nearly insane with creating new variants of every chip. So much so that when some of their key engineers later started working at other places they had to adjust to not being able to simply order up a new variant of a CPU or other chip, as they had become accustomed to while at Commodore.
2) WDC. The next step in 6502 development was, of course, CMOS, and that's exactly what happened with Bill Mensch leaving MOS and starting WDC. Bill M. was essential in the development of the original 6502 (and he also originally came from Motorola, with Chuck Peddle). There was never a break in 6502 development. What's left out is moving to the kind of fab process that the large foundaries can use, e.g. Intel and Samsung (14nm or 10nm or whatever). That's extremely costly. Other than that, there's not much more that can be done with a 65c02.. if you change it it's no longer a 65c02.

The MOS Technology 6502 team was wrecked by Commodore's business dealings. That is a fact. Chuck Peddle couldn't continue working on the 6502. That is a fact. So these and other facts show that development almost stopped. Can you compare the path from the 6502 to the 65C02 with the path from the 8080 to the 8086? The 65C02 is almost the same as the 6502: slightly better and slightly worse. The path from the 6502 to the 65816 took 8 years... The path to the 4510 took 14 years... :(



Posted: Thu Aug 18, 2016 11:25 am

Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland
Quote:
JMP (VALUE) is often useful to make fast jump tables.

Jump tables are shorter using rts instead of the jmp indirect instruction (I assume the addresses in the jump table are stored with 1 pre-subtracted, to compensate for the PC increment the 6502 does after popping the address in the rts instruction). This makes the jmp () instruction almost useless, and the jmp () wrap-around bug (or feature, if you want to call it that) irrelevant.

Example: jump table implemented with the jmp () instruction:

Code:
   lda JumpTableL,X     ; SIZE=3 LEN=4 (assuming no page cross)
   sta ZPLocation         ; SIZE=5 LEN=7
   lda JumpTableH,X    ; SIZE=8 LEN=11 (assuming no page cross)
   sta ZPLocation+1     ; SIZE=10 LEN=14
   jmp (ZPLocation)      ; SIZE=13 LEN=19 (LEN=20 on the 65C02)

The pointer doesn't have to be in zero page, but I assumed it is, for efficiency. jmp () is the only instruction that uses pointers outside zero page, but this feature is almost useless.

Using the rts instruction instead (the high byte must be pushed first, because rts pulls the low byte first):
Code:
   lda JumpTableH,X    ; SIZE=3 LEN=4 (assuming no page cross)
   pha     ; SIZE=4 LEN=7
   lda JumpTableL,X     ; SIZE=7 LEN=11 (assuming no page cross)
   pha        ; SIZE=8 LEN=14
   rts      ; SIZE=9 LEN=20

The code takes about the same time to execute (one cycle more than the jmp () version on NMOS, the same on the 65C02), but is 4 bytes shorter. It also has the huge advantage of not needing any dedicated locations.
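A Python model of the RTS dispatch, assuming a hypothetical `rts_dispatch` helper and a table that already holds each target minus 1 (the split low/high tables are modeled here as one list of 16-bit entries). Because RTS pulls the low byte first, the high byte has to be pushed first.

```python
def rts_dispatch(table, index):
    """Where the PHA/PHA/RTS sequence ends up for table[index].

    table[index] holds (target - 1). The high byte is pushed first and the
    low byte last, because RTS pulls the low byte first, then the high byte,
    and finally adds 1 to the resulting address.
    """
    stack = []
    entry = table[index]
    stack.append(entry >> 8)       # lda JumpTableH,X / pha
    stack.append(entry & 0xFF)     # lda JumpTableL,X / pha
    lo = stack.pop()               # rts: pull PCL...
    hi = stack.pop()               # ...then PCH...
    return (((hi << 8) | lo) + 1) & 0xFFFF   # ...then increment


targets = [0x8000, 0x9123, 0xC0DE]          # hypothetical routine addresses
table = [t - 1 for t in targets]            # stored pre-decremented
print([hex(rts_dispatch(table, i)) for i in range(len(targets))])
```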

Quote:
IMHO JMP (xxFF) "problem" only creates excessive incompatibility.

I agree, this is a fairly minor "bug" and fixing it wasn't really necessary.

Now, as to state my personal opinion on the matter...
Quote:
Is JMP (xxFF) not documented? No. So it is not bug.

I'd say we have to look at the original NMOS-era documentation from MOS. If the behavior is not documented there, or is documented as a bug, then it is a bug. If it is documented as a feature, then it is a feature.


Posted: Thu Aug 18, 2016 3:26 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8390
Location: Midwestern USA
Bregalad wrote:
Using rts instruction instead:
Code:
         lda JumpTableH,X      ; SIZE=3 LEN=4 (assuming no page cross)
         pha                   ; SIZE=4 LEN=7
         lda JumpTableL,X      ; SIZE=7 LEN=11 (assuming no page cross)
         pha                   ; SIZE=8 LEN=14
         rts                   ; SIZE=9 LEN=20

Code takes the same time to execute, but is 4 bytes shorter. It also has the huge advantage of not needing any dedicated locations.

Not to start an argument, but both of you gentlemen are suffering from a bit of code myopia. :lol:

If using the 65C02 or 65C816, there is the JMP (<addr>,X) instruction, which requires no preparation of any jump vectors. You use it with a table containing 16 bit addresses in little endian order. The following code does the work:

Code:
         lda #index            ;zero-based routine index
         asl a                 ;double it
         tax                   ;now absolute index
         jmp (table,x)         ;goto routine

   ...

table    .word routine1,routine2,routine3 ... etc.

The above is the assembly language equivalent of ON...GOTO in BASIC. The 65C816 also has JSR (<addr>,X), which would be like ON...GOSUB in BASIC.
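A Python sketch of what that four-instruction sequence computes, modeled on the 65C02; `on_goto`, the table address and the routine addresses are hypothetical. The table is deliberately placed across a page boundary to show that the 65C02 forms the pointer as a full 16-bit sum.

```python
def on_goto(mem, table, index):
    """65C02 dispatch: LDA #index / ASL A / TAX / JMP (table,X).

    ASL A doubles the zero-based index within 8 bits, so index must be below
    128. The 65C02 forms table + X as a full 16-bit sum and reads the target
    little-endian from two consecutive bytes, even across a page boundary.
    """
    x = (index << 1) & 0xFF
    ptr = (table + x) & 0xFFFF
    return mem[ptr] | (mem[(ptr + 1) & 0xFFFF] << 8)


mem = bytearray(65536)
TABLE = 0x20FE                                # hypothetical; straddles a page boundary
routines = [0x8000, 0x8100, 0x8200]           # hypothetical routine1..routine3
for i, addr in enumerate(routines):           # .word routine1,routine2,routine3
    mem[TABLE + 2 * i] = addr & 0xFF
    mem[TABLE + 2 * i + 1] = addr >> 8
print([hex(on_goto(mem, TABLE, i)) for i in range(3)])
```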

Unless working with old hardware, there is absolutely no good reason to constrain oneself with NMOS code limitations.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Thu Aug 18, 2016 4:18 pm

Joined: Sat Jul 28, 2012 11:41 am
Posts: 442
Location: Wiesbaden, Germany
Tor wrote:
@litwr: I must admit that I don't understand your example.

That's simply because it doesn't work. Another "lda INDEX" is required, otherwise you only ever use the first location of each table because of the "and #1".
Code:
   lda INDEX
   and #1
   bne l_odd

   lda INDEX   
   sta m_even+1
m_even
   jmp (even_table)
l_odd
   lda INDEX
   sta m_odd+1
m_odd
   jmp (odd_table)

align 256
odd_table .word ...    ;256 bytes
even_table .word ...   ;256 bytes
On NMOS I favour the RTS method shown by Bregalad; on CMOS there is JMP (abs,x), as BigEd mentioned. And it even works without self-modifying code.

However, for alterable ROM vectors in RAM, JMP (abs) has its value, whether or not you have to align the vector to an even boundary.

If you really want to do it your way and have it working in both worlds:
Code:
   lda INDEX
   asl a
   bcs l_high

   sta m_low+1
m_low
   jmp (low_table)
l_high
   sta m_high+1
m_high
   jmp (high_table)

align 256
low_table .word ...    ;128 address words
high_table .word ...   ;128 address words
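Why this version behaves the same on both CPUs can be checked exhaustively. A Python sketch, with `split_index` as a hypothetical helper modeling `lda INDEX / asl a / bcs l_high`: the byte stored into the JMP operand is always even, so no vector ever starts at $xxFF.

```python
def split_index(index):
    """Model of lda INDEX / asl a / bcs l_high: returns (table, low_byte)."""
    shifted = index << 1
    table = "high_table" if shifted > 0xFF else "low_table"   # ASL moves bit 7 into carry
    low_byte = shifted & 0xFF                                   # value stored into the JMP operand
    return table, low_byte


# The stored low byte is always even, so no vector ever starts at $xxFF
# and the NMOS and CMOS parts read exactly the same two bytes.
print(all(split_index(i)[1] % 2 == 0 for i in range(256)))
```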



Posted: Thu Aug 18, 2016 8:43 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8512
Location: Southern California
litwr wrote:
Let's see at the next code
Code:
   ldx #1
   lda $ff,x

What is the location of a byte to load into AC register? Is it $100? No, It is 0. The same as JMP ($FF). Is it bug?
Let's see the other code
Code:
   ldx #0
   lda ($ff,x)

What is the address of a location of a byte to load to AC?  Is it at $ff and $100?  No, it is at $ff and 0.  The same as JMP ($FF).  Is it bug?

LDA $FF,X and LDA ($FF,X) are ZP instructions, meaning that the high byte of the operand is forced to be 0.  JMP (<addr>) is not a ZP instruction.  If you want LDA $FF,X to take the high byte from $100, you have to force the assembler to use an absolute instruction.  How to do that will vary between assemblers.
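A Python sketch of that distinction; the helper names are hypothetical, and how you make the assembler emit the absolute form of `LDA $00FF,X` depends on the assembler, as noted above.

```python
def zp_x_address(zp, x):
    """LDA zp,X: the sum wraps within page zero; the high byte is forced to 0."""
    return (zp + x) & 0xFF


def zp_x_indirect_pointer(zp, x):
    """LDA (zp,X): the two zero-page addresses the pointer is read from."""
    base = (zp + x) & 0xFF
    return base, (base + 1) & 0xFF


def abs_x_address(addr, x):
    """LDA addr,X with a two-byte operand (e.g. $00FF forced absolute):
    the carry does reach the high byte."""
    return (addr + x) & 0xFFFF


print(hex(zp_x_address(0xFF, 1)))         # 0x0
print(zp_x_indirect_pointer(0xFF, 0))     # (255, 0)
print(hex(abs_x_address(0x00FF, 1)))      # 0x100
```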

Thanks BDD for bringing up the c02's JMP (<addr>,X) addressing mode.


Quote:
65816 was a bit too late and can't match 80186 or 68000.

These may have been more practical than the '816 for some things, but the '816 did outperform them, at least in some benchmarks.

Quote:
IMHO x86-64 at 1 Ghz can easily outperforms 6502 at 10 Ghz.  X86_64 can make 64-bit multiply with 128-bit result in 1 tick!  An x86_64 instruction may require dozens of 6502 instructions.  The same is true for modern ARM too.

True.  The application needs to be taken into account too though.  I frequently see one manufacturer or another tout their inexpensive microcontroller's MULtiply function; yet in all the products I've brought to market with embedded microcontrollers (or in the early 90's, a 65c02), except for the gizmo with a 65c02 that I wrote the floating-point routines for in about 1987, none had any use for a multiply instruction.  They also dealt with only 8-bit quantities most of the time, once in a while a 16-bit, and even more rarely a 24-bit.  I have never had a need for a 64x64-bit multiply, even on the workbench for in-house testing, although there have been a few times when 16x16 (with 32-bit result) was barely adequate.  If I could have the 10GHz with 65 simplicity, that'd be super cool.

Quote:
The mentioned table shows that 6809 at 1.79 Mhz is 5 times slower than 6502 at 2 MHz with BBC Basic 4.  The Basic benchmarks are very inaccurate but...  So It is possible that 6502 may outperform 6809 with the same frequency.  :shock:

What table?  I missed it.  I have a hard time believing that the 6809 would ever be slower than the 6502 at a comparable clock rate.  The 6809 was the step up from the 6800 though, so to be fair, the '816 is the step up from the '02, and I think the '816 is a better step up, partly just because of the 16MB address space.

In a way, I wonder why we bother comparing performance on these tiny-league processors when there are, as you say, modern processors that can do a billion 32-bit instructions per second or more; but the answer is that our interest is in this processor whose computing power is quite adequate for a range of applications even with a simple bus structure and instruction set that don't require a computer engineer to understand and design with.  I also believe that even for someone who does work with the major-league computers, it can be a good thing to frequently come back to the little guys to practice squeezing out as much performance as possible while working with their limitations.  That's one reason I'm intrigued by Jonathan Halliday's preemptive multitasking GUI OS for Atari 6502 computers.  Although I have not had any exposure to 6502 Atari computers, I am fascinated by the discipline and care and inventive techniques used to pull this off in a system with such limited memory and execution speed.  It's definitely a good practice, and forces one to be more efficient with the resources.  It's a valuable skill to transfer to any system, including modern ones where bloatware and programming sloppiness are unfortunately the norm.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Thu Aug 18, 2016 9:39 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
GARTHWILSON wrote:
I have a hard time believing that the 6809 would ever be slower than the 6502 at a comparable clock rate.
6809 is hobbled by a wearisome prevalence of dead cycles. Even a simple operation such as an 8-bit load using Absolute address mode takes 5 cycles on 6809, as compared to 4 cycles on 6502. Using Direct-Page/Zero-Page mode the numbers are 4 cycles as compared to 3 cycles.
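To put numbers on the cycle counts above, here is a small Python sketch that converts them into time per load at a given clock. Motorola calls the 6809's "absolute" mode *extended* addressing. The only inputs are the four cycle counts quoted above and the clock rates mentioned in this thread. The sketch covers a single 8-bit load and nothing else, so it says nothing about how much more the 6809's 16-bit operations and richer address modes get done per instruction.

```python
# Cycle counts for an 8-bit load, as quoted in the post above.
CYCLES = {
    ("6502", "absolute"): 4,
    ("6502", "zero page"): 3,
    ("6809", "absolute"): 5,      # Motorola's name: extended addressing
    ("6809", "direct page"): 4,
}


def load_time_ns(cpu: str, mode: str, clock_mhz: float) -> float:
    """Time for one 8-bit load in nanoseconds at the given clock (MHz)."""
    return CYCLES[(cpu, mode)] * 1000.0 / clock_mhz


if __name__ == "__main__":
    # Equal clocks: the 6809 spends one extra cycle per load in both modes.
    for m02, m09 in (("absolute", "absolute"), ("zero page", "direct page")):
        t02 = load_time_ns("6502", m02, 1.0)
        t09 = load_time_ns("6809", m09, 1.0)
        print(f"{m02:>9} @ 1 MHz: 6502 {t02:.0f} ns, 6809 {t09:.0f} ns, "
              f"ratio {t09 / t02:.2f}")
    # Clock rates from the benchmark table mentioned earlier in the thread.
    print(f"6502 @ 2 MHz: {load_time_ns('6502', 'absolute', 2.0):.0f} ns, "
          f"6809 @ 1.79 MHz: {load_time_ns('6809', 'absolute', 1.79):.0f} ns")
```

At equal clocks the extra cycle works out to 25% per absolute load and about 33% per zero-page/direct-page load. That is well short of a factor of 5, so most of the gap in that Basic benchmark must come from somewhere other than raw load timing.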

Hitachi's 6309 eliminated many or all of the dead cycles. And of course the 6809/6309's sophisticated address modes and large complement of 16-bit operations are unequaled by 6502.

Back in the day a pal o' mine remarked to me about the dismal performance of a 6809 machine he had encountered (probably the Tandy Color Computer, although I don't recall for sure). Just as a casual test he had typed an empty DO ... LOOP in Basic then hit RUN. "I couldn't believe how long it took for the loop to complete. I was beginning to think the machine had crashed!" We ended up speculating about the fact that 6809 is source-code compatible with 6800. My hunch was that the machine was running 6800 Basic, unmodified except for having been reassembled for 6809! :cry:

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Fri Aug 19, 2016 12:32 am 

Joined: Tue Nov 10, 2015 5:46 am
Posts: 228
Location: Kent, UK
Dr Jefyll wrote:
GARTHWILSON wrote:
I have a hard time believing that the 6809 would ever be slower than the 6502 at a comparable clock rate.
6809 is hobbled by a wearisome prevalence of dead cycles. Even a simple operation such as an 8-bit load using Absolute address mode takes 5 cycles on 6809, as compared to 4 cycles on 6502. Using Direct-Page/Zero-Page mode the numbers are 4 cycles as compared to 3 cycles.
Does anyone remember the Williams Defender arcade game from 1980? It was a fast side-scrolling shoot 'em up, with a mad number of on-screen 16-color sprites. If you're not familiar with it, then try YouTube for any one of countless videos of it in action.

I bring it up because what you see on the screen... Every object... Every animated particle... Every part of the scrolling image is generated and updated by a lowly 6809.

There's no hardware scroll register to shift the display. There are no hardware sprite registers.

The whole display is around 30-something K of 4-bit-per-pixel bitmapped graphics.

Every byte of screen memory holds two pixels, and adjacent bytes stride the y axis. That is, in (x,y) notation, address 0000h holds the pixels for screen coordinates (0,0) and (1,0), and address 0001h holds the pixels for screen coordinates (0,1) and (1,1). Thus a sequence of writes to consecutive addresses will result in a 2-pixel-wide vertical stripe on the screen. The Defender code used multi-register PUSH instructions to rapidly write sprite data.
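The layout described above can be sketched as an address function. Two details are **assumptions**, not from the post: the column height (256 bytes per 2-pixel-wide column, the `ROWS` constant) and which nibble holds the left-hand pixel (taken here to be the high nibble for even x). The names `pixel_address` and `set_pixel` are hypothetical. What the post does establish is that the address advances by 1 per row, so consecutive addresses run down the screen.

```python
from __future__ import annotations

ROWS = 256  # ASSUMPTION: bytes per 2-pixel-wide column; the post doesn't say


def pixel_address(x: int, y: int, rows: int = ROWS) -> tuple[int, bool]:
    """Return (byte address, is_high_nibble) for screen pixel (x, y).

    Each byte holds two horizontally adjacent pixels, and consecutive
    addresses step down one row, so the address is column-pair * rows + y.
    """
    address = (x // 2) * rows + y
    high_nibble = (x % 2 == 0)  # ASSUMPTION: even (left) pixel in high nibble
    return address, high_nibble


def set_pixel(vram: bytearray, x: int, y: int, color: int,
              rows: int = ROWS) -> None:
    """Write a 4-bit color into one pixel, leaving its neighbor intact."""
    addr, high = pixel_address(x, y, rows)
    color &= 0x0F  # 4 bits per pixel
    if high:
        vram[addr] = (vram[addr] & 0x0F) | (color << 4)
    else:
        vram[addr] = (vram[addr] & 0xF0) | color


if __name__ == "__main__":
    vram = bytearray(ROWS * 4)  # a few columns is enough for the demo
    vram[0:8] = bytes([0xFF] * 8)  # eight consecutive bytes...
    lit = [(x, y) for y in range(ROWS) for x in range(8)
           if vram[pixel_address(x, y)[0]] != 0]
    print(lit)  # ...light pixels x = 0..1, y = 0..7: a vertical stripe
```

This is why a burst of stack pushes works as a blitter here. Each push stores several bytes at adjacent (on the 6809, descending) addresses, which lands them in one 2-pixel-wide column of the sprite. The post doesn't describe how Defender actually arranged its pushes.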

For the longest time I had always assumed that Williams had tons of fancy hardware in their Defender machines... Because all that happening on the screen couldn't possibly be done by a simple 8-bit CPU.

But I was wrong. It's all software. Smart hardware choices (re: the video memory layout) along with expert programming by Eugene Jarvis made the seemingly impossible possible.

It's quite telling that even the Atari ST with its 8MHz 68000 never had a Defender implementation that matched the speed of the 6809 original (and the ST also had a 32K screen).

Point being? The 6809 was a powerful little chip... As I think Williams Defender very well proves.


PostPosted: Fri Aug 19, 2016 2:35 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
sark02 wrote:
The 6809 was a powerful little chip
Oh yes -- absolutely! My post was not terribly well written, as it drifts in several different directions. Let me be clear I'm a big 6809 fan, and my observation about sophisticated address modes and 16-bit operations carries more weight than the point about all the dead bus cycles (which was in response to Garth's remark). Also the 6809 can hardly be blamed for suboptimal results when running 6800 code.

As for JMP (abs), I agree it's a bug but I almost wish they hadn't fixed it. The fixed version is one cycle slower -- and JMP (abs) is part of the performance-critical NEXT routine used by Forth.
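For readers who haven't run into it: the bug is the well-known NMOS 6502 page-wrap in JMP (abs). When the pointer's low byte sits at $xxFF, the CPU fetches the target's high byte from $xx00 of the same page instead of from the next page. The 65C02 propagates the carry and spends the extra cycle mentioned above. Below is a minimal Python model of both behaviours. The function name `jmp_indirect_target` is hypothetical, and the 64K memory is just a byte array.

```python
def jmp_indirect_target(mem: bytes, pointer: int, nmos: bool = True) -> int:
    """Return the address JMP (pointer) jumps to.

    NMOS 6502: the high byte of the target is read from the pointer + 1
    with NO carry into the pointer's high byte, so $12FF wraps to $1200.
    65C02 (nmos=False): the carry is propagated, so $12FF -> $1300.
    """
    lo = mem[pointer]
    if nmos:
        hi_addr = (pointer & 0xFF00) | ((pointer + 1) & 0x00FF)  # page wrap
    else:
        hi_addr = (pointer + 1) & 0xFFFF                         # fixed
    hi = mem[hi_addr]
    return (hi << 8) | lo


if __name__ == "__main__":
    mem = bytearray(0x10000)
    mem[0x12FF] = 0x34   # low byte of the intended vector
    mem[0x1300] = 0x56   # intended high byte (next page)
    mem[0x1200] = 0x78   # what the NMOS part actually reads
    print(hex(jmp_indirect_target(mem, 0x12FF, nmos=True)))   # NMOS result
    print(hex(jmp_indirect_target(mem, 0x12FF, nmos=False)))  # 65C02 result
```

As long as a vector never straddles a page boundary, both versions give the same answer. That's why code written with the bug in mind, such as a Forth NEXT that keeps its pointer aligned, loses nothing on the NMOS part except the risk of forgetting the rule.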


