6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Sat Nov 23, 2024 11:23 am

All times are UTC




Post new topic Reply to topic  [ 130 posts ]  Go to page Previous  1 ... 5, 6, 7, 8, 9  Next
Author Message
PostPosted: Fri Aug 26, 2016 8:41 am 

Joined: Sun Apr 10, 2011 8:29 am
Posts: 597
Location: Norway/Japan
GARTHWILSON wrote:
I just checked what my 1985 Synertek book says about the NMOS '02 though, and it was still not updated to reflect the bug that was found. It was still saying, "The next memory location contains the high-order byte of the effective address which is loaded into the sixteen bits of the program counter." As we know however, it won't be the next one if the address record starts at xxFF.
Yesterday I went through a number of 6502 books (pre-65c02), including the Zaks book (fourth edition), and I couldn't find anything mentioning the issue. They all used drawings and descriptions similar to your Synertek book. In other words, the authors of those books did not expect (xxFF) to behave differently from (xxFE) or (xx00). So, if that behaviour was somewhat hinted at in parts of the MOS documentation, it was certainly not obvious to those authors.
I never had the Leventhal book back then btw.


PostPosted: Fri Aug 26, 2016 11:34 am 

Joined: Sat Mar 27, 2010 7:50 pm
Posts: 149
Location: Chexbres, VD, Switzerland
Quote:
Never let an indirect address cross a page boundary, as in JMP ($31FF). Although the high-order byte of the indirect address is in the first location of the next page (in this example, memory location 3200₁₆), the CPU will fetch the high-order byte from the first location of the same page (location 3100₁₆ in our example).

Unless you're writing code that must work on both NMOS 6502 and CMOS derivative devices, I see no reason never to do that - exploiting this bug is fine, in my opinion, if you are aware of it.
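
For anyone following along, a minimal sketch of the behaviour in question (addresses chosen for illustration only):

```asm
        ; NMOS 6502: JMP ($31FF) takes its low target byte from $31FF
        ; but its high byte from $3100 (the page wraps), not from $3200.
        LDA #$34
        STA $31FF       ; low byte of the target address
        LDA #$12
        STA $3100       ; an NMOS part fetches the high byte from here
        LDA #$56
        STA $3200       ; a 65C02 fetches it from here instead
        JMP ($31FF)     ; NMOS jumps to $1234; a 65C02 jumps to $5634
```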


PostPosted: Fri Aug 26, 2016 1:24 pm 

Joined: Sat Dec 07, 2013 4:32 pm
Posts: 246
Location: The Kettle Moraine
Even then, determining which uP it is should be fairly straightforward, and having two suitable routines, or at least raising an error condition, would be the way to go.
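
For instance (a sketch only; `cpu_type` is a hypothetical zero-page variable): opcode $1A is an undocumented one-byte NOP on the NMOS part but INC A on the 65C02, which makes run-time detection a three-instruction affair:

```asm
        ; Distinguish an NMOS 6502 from a 65C02 at run time.
        LDA #$00
        .BYTE $1A       ; NOP on NMOS parts; INC A on the 65C02
        STA cpu_type    ; $00 = NMOS 6502, $01 = 65C02
```

The code can then pick the appropriate routine, or bail out with an error, based on `cpu_type`.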


PostPosted: Fri Aug 26, 2016 7:25 pm 

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
Bregalad wrote:
Except the reason I encourage people not to use JMP () is not because of the bug, but because using pha/pha/rts is shorter (and in some cases, faster). I would still use JMP () if for some reason it were the optimal solution, but in all the cases where I had to use jump tables, rts happened to be the more efficient solution.

In the rare cases where using JMP () was actually a possibility, the vector has *always* sat in zero page for me, making this bug irrelevant.


JMP () is more efficient when you set the vector once and jump through it many times, as is done in ROM indirection for wedges and overrides. It also keeps the ROM code smaller (a 3-byte indirect JMP), putting the vector-manipulation code into RAM for when user software wants to change the vectors.
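
For anyone who hasn't seen it, the pha/pha/rts dispatch Bregalad refers to looks roughly like this (`table`, `routine0`, and `routine1` are illustrative names; note the table holds each target address minus one, because RTS increments the pulled address):

```asm
        ; Jump-table dispatch via RTS; X = routine index * 2.
        LDA table+1,X   ; high byte of (target - 1), pushed first
        PHA
        LDA table,X     ; low byte of (target - 1)
        PHA
        RTS             ; "returns" into the selected routine
table:
        .WORD routine0-1
        .WORD routine1-1
```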

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


PostPosted: Fri Aug 26, 2016 10:30 pm 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8509
Location: Midwestern USA
BigEd wrote:
Never let an indirect address cross a page boundary, as in JMP ($31FF). Although the high-order byte of the indirect address is in the first location of the next page (in this example, memory location 3200₁₆), the CPU will fetch the high-order byte from the first location of the same page (location 3100₁₆ in our example).

That is the exact wording in the second edition. I knew a warning to that effect was in the first edition, but could not recall exactly what it said.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Sat Aug 27, 2016 4:41 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Thanks for checking BDD. If you'd be so kind as to have a quick look at the PDF I linked and confirm that it's an earlier (shorter) edition than the one you have, that would be great.


PostPosted: Sat Aug 27, 2016 4:49 am 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8509
Location: Midwestern USA
BigEd wrote:
Thanks for checking BDD. If you'd be so kind as to have a quick look at the PDF I linked and confirm that it's an earlier (shorter) edition than the one you have, that would be great.

The PDF wording is identical to that in the actual book (page 3-13).

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Sat Aug 27, 2016 5:05 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8545
Location: Southern California
I just checked Leventhal & Saville's book "6502 Assembly-Language Subroutines" (© 1982) and its wording on p.151 is different, but still covers the problem:

Quote:
IMPLEMENTATION ERRORS

Occasionally, a microprocessor's instructions simply do not work the way the designers or anyone else would expect. The 6502 has one implementation error that is, fortunately, quite rare. The instruction JMP ($XXFF), where the Xs represent any page number, does not work correctly. One would expect this instruction to obtain the destination address from memory locations XXFF and (XX+1)00. Instead, it apparently does not increment the more significant byte of the indirect address; it therefore obtains the destination address from memory locations XXFF and XX00. For example, JMP ($1CFF) will jump to the address stored in memory locations 1CFF₁₆ (LSB) and 1C00₁₆ (MSB), surely a curious outcome. Most assemblers expect the programmer to ensure that no indirect jumps ever obtain their destination addresses across page boundaries.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Sat Aug 27, 2016 5:25 am 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8509
Location: Midwestern USA
GARTHWILSON wrote:
I just checked Leventhal & Saville's book "6502 Assembly-Language Subroutines" (© 1982) and its wording on p.151 is different, but still covers the problem...

All the more reason not to use the NMOS parts unless working with old hardware for which there is no alternative.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Sat Aug 27, 2016 10:01 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
BigDumbDinosaur wrote:
BigEd wrote:
Thanks for checking BDD. If you'd be so kind as to have a quick look at the PDF I linked and confirm that it's an earlier (shorter) edition than the one you have, that would be great.

The PDF wording is identical to that in the actual book (page 3-13).

Indeed, I just wanted to double-check that the PDF is indeed an earlier version - if the PDF is a second edition like your book, it doesn't give us the early indication that I think it probably does.


PostPosted: Sat Aug 27, 2016 5:56 pm 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8509
Location: Midwestern USA
BigEd wrote:
BigDumbDinosaur wrote:
BigEd wrote:
Thanks for checking BDD. If you'd be so kind as to have a quick look at the PDF I linked and confirm that it's an earlier (shorter) edition than the one you have, that would be great.

The PDF wording is identical to that in the actual book (page 3-13).

Indeed, I just wanted a double-check that the PDF is indeed an earlier version - if the PDF is a second edition like your book, it doesn't give us the early indication that I think it probably does.

Well, I do recall the warning was in Leventhal's first edition (1979). I can't vouch for the wording, however.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Tue Sep 06, 2016 12:41 am 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
Z-80s can take 3, 4, or 5 clock cycles per M-cycle, depending on the instruction executed, and an instruction can take several M-cycles on top of that.

The reason the average is between 3 and 5 is that the chip is microcoded underneath. That is, the Z-80 and 68000 CPUs actually contain a simpler, yet functionally independent, CPU (of sorts) whose job it is to interpret the programmer-visible instruction set. This is how, for example, the 68020 could get away with such ridiculously complicated addressing modes.

In the Z-80's case, the microcode is used to multiplex data over a finite number of internal buses, all of which were laid out by hand back then. It's also used to enforce the bus protocol:

1. Put address on the bus.
2. (While waiting for data to arrive, increment the PC.)
3w. Sample the RDY or similar signal, and if negated, wait here.
3. When RDY is asserted, accept the data and terminate the bus transaction.

E.g., if RDY is asserted during step 1 above, it means nothing.

The genius of the 6502 is that its bus was truly single-phase (phi1 and phi2 are conveniences to simplify the NMOS implementation; they're not at all a requirement for the CMOS process, which is why the 65816 doesn't have them and you drive phi2 externally). You drive the address bus and R/W lines during phase 1 and capture the data during phase 2. If the data wasn't ready, that's OK -- just repeat the current cycle. The 65816 is only marginally more sophisticated than this, due to multiplexing the bank-address byte on D7-D0.

As far as proof of 6502's performance relative to other CPUs with a normalized clock, all you need to do is count the cycles in your particular application. Yes, yes, the 68000 can pull off 2 MIPS performance at 8MHz, but know what? A 65816 at 8MHz will pull off 4 MIPS on average. At this point, the 65816 and 68000 compete head to head, with the former on average attaining close to 80% the performance of the 68000, despite having only an 8-bit wide bus. Proof: Sit an Apple IIgs on a table next to a classic Macintosh. Run a paint program on both of them. (Remember, Mac is monochrome, while IIgs is 16-color. While the Mac has more pixels, the IIgs actually has more bits to push around total). You'd totally expect the IIgs at 2.3MHz to be slower at video updates than the 8MHz Mac; however, this is not the case. Grab a brush from some picture and drag it around the screen. The Mac will have very noticeable rip and tear, while the IIgs will appear to be about as fast as a Commodore-Amiga using its blitter to draw.

As a final analysis, let's normalize bus widths and optimize our microarchitectures too (in fact, we have an existence proof: the 68008), and what you'll find is that the 68000 is abysmally sluggish compared to the 65816. The only reason the 68000 gets the performance that it does is because it has a true 16-bit wide bus. Slap a 16-bit wide bus on the 65816, changing nothing else, and I'm willing to put money that the 65816 will meet or even slightly exceed the 68000.

If we take this opportunity to really widen the data bus, then a single instruction fetch can grab a whole handful of instructions. This becomes quite useful if you augment the instruction decode logic to perform "macro-op fusion", in which case your instruction timings will look like this:

Code:
; Assuming native-mode, 16-bit accumulator
; For core that implements macro-op fusion, I further assume a 64-bit bus.
;
; CODE                  AS-IS           Macro-Op Fusion
  CLC           ;       2               1       [1, 3]
  LDA addend1L  ;       5               1       [2, 5]
  ADC addend2L  ;       5               1       [2, 5]

  STA resultL   ;       5               2       [3, 4, 5]
  LDA addend1H  ;       5               1       [2, 5]

  ADC addend2H  ;       5               2       [2, 3, 5]
  STA resultH   ;       5               1       [4, 5]

;               TOTAL   32 cycles       9 cycles (best case)

Notes:
 1  Out of context, CLC normally would take the usual 2 cycles; but, since it's recognized as part of a more complex code pattern, its behavior can be rolled into the mechanics of the surrounding code.

 2  This instruction takes 2 cycles to fetch a 16-bit word from memory.

 3  There is an additional cycle overhead for instruction fetch on this byte.

 4  This instruction takes 2 cycles to store a 16-bit word to memory.

 5  Add 1 cycle if 16-bit operand crosses an 8-byte boundary.


The CPU is now looking not just at individual instructions to determine what to do, but at the context surrounding them. clc, lda, adc becomes a single "macro-op" instruction. The sta, lda pair occurs far too frequently to pass up, and adc, sta occurs less frequently, but is strongly desirable for what I hope are obvious reasons.

According to http://oldwww.nvg.ntnu.no/amiga/MC680x0 ... ndard.HTML , a single ADD.L instruction takes 12 cycles. The code above fetches, adds, and stores a 32-bit quantity, and, assuming alignment with respect to the 64-bit bus, actually runs 3 cycles faster. Again, this is a hypothetical case; don't expect to see this technology become widespread in the 65xx community soon. All I'm saying is that it's relatively easily doable if you truly compare apples to apples.


PostPosted: Tue Sep 06, 2016 9:12 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
[placeholder - with luck, we can discuss points raised by Sam's post over here.]


PostPosted: Wed Sep 14, 2016 7:58 am 

Joined: Wed Oct 22, 2003 4:07 am
Posts: 51
Location: Norway
kc5tja wrote:
As far as proof of 6502's performance relative to other CPUs with a normalized clock, all you need to do is count the cycles in your particular application. Yes, yes, the 68000 can pull off 2 MIPS performance at 8MHz, but know what? A 65816 at 8MHz will pull off 4 MIPS on average. At this point, the 65816 and 68000 compete head to head, with the former on average attaining close to 80% the performance of the 68000, despite having only an 8-bit wide bus. Proof: Sit an Apple IIgs on a table next to a classic Macintosh. Run a paint program on both of them. (Remember, Mac is monochrome, while IIgs is 16-color. While the Mac has more pixels, the IIgs actually has more bits to push around total). You'd totally expect the IIgs at 2.3MHz to be slower at video updates than the 8MHz Mac; however, this is not the case. Grab a brush from some picture and drag it around the screen. The Mac will have very noticeable rip and tear, while the IIgs will appear to be about as fast as a Commodore-Amiga using its blitter to draw.

As a final analysis, let's normalize bus widths and optimize our microarchitectures too (in fact, we have an existence proof: the 68008), and what you'll find is that the 68000 is abysmally sluggish compared to the 65816. The only reason the 68000 gets the performance that it does is because it has a true 16-bit wide bus. Slap a 16-bit wide bus on the 65816, changing nothing else, and I'm willing to put money that the 65816 will meet or even slightly exceed the 68000.

You ignore memory speeds in your comparison. An 8MHz 65816 requires 8MHz RAM: a new memory access is made every cycle. An 8MHz 68000 requires only 2MHz RAM for full-speed operation, since every memory access takes 4 cycles. The 68000 thus has only half the memory bandwidth of the 65816, and the 68008 only one quarter.
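
The ratios follow directly from bus width and cycles per access:

```
65816 @ 8MHz: 1 access/cycle    x 1 byte  = 8 MB/s
68000 @ 8MHz: 1 access/4 cycles x 2 bytes = 4 MB/s  (half)
68008 @ 8MHz: 1 access/4 cycles x 1 byte  = 2 MB/s  (one quarter)
```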

You can also use even slower memory with the 68000: memory-bound instructions will slow down, but instructions that are not, like a series of muls, will still run at full speed.


PostPosted: Wed Sep 14, 2016 8:08 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
I agree - for historical performance comparisons, you have to normalise memory speeds. Memory was a major cost in a system as well as often being the performance limiter. It was a long time in the micro world before caches were seen - first off chip and then on chip.

The Z80 tactic of clocking the CPU faster than memory backfired here, because as memory sped up, the CPU would need to go uncomfortably fast to avoid being the bottleneck. Modern Z80s don't have that 4:1 ratio. Meanwhile, the 6502 tactic meant it was possible to run video or DMA in the half-cycles in which the CPU didn't need the memory, because the memory was faster than the CPU. Later, we could have a 4MHz 6502 in a system which didn't need to make the memory do double duty. (Acorn's Master Turbo had a second 6502 at 4MHz in 1986.)

Edit: of course, different considerations apply today. You can buy WDC parts which run at 14MHz, and then you have to find appropriate memory and peripherals. If you use an FPGA for your CPU you can run at 100MHz and more; you have on-chip RAM which can probably run at full speed, but it's synchronous. To use off-chip RAM you're back in the territory of choosing between simple SRAM, which runs slower than your CPU, or something which runs faster but is best used for short sequential transfers.

