Have any arbitrary-precision arithmetic implementations been written for the 6502? It offers the greatest numerical precision and stability, but at significant performance costs.
https://en.wikipedia.org/wiki/Arbitrary ... arithmetic
Let's Talk About Math Baby
Re: Let's Talk About Math Baby
Martin_H wrote:
Have any arbitrary-precision arithmetic implementations been written for the 6502? It offers the greatest numerical precision and stability, but at significant performance costs.
https://en.wikipedia.org/wiki/Arbitrary ... arithmetic
But also consider just how many significant digits you actually need?
I have a BCPL Mandelbrot program that has the following comment:
Code: Select all
This program plots a selected region of the Mandelbrot set using
arbitrary high precision arithmetic. Currently it uses numbers with 48
decimal digits after the decimal point, so it can accurately plot
regions much smaller than a proton assuming one corresponds to a
metre.
https://www.jpl.nasa.gov/edu/news/2016/ ... ally-need/
finally
Code: Select all
Captain James T. Kirk: What would you say the odds are on our getting out of here?
Mr. Spock: Difficult to be precise, Captain. I should say, approximately 7,824.7 to 1.
Captain James T. Kirk: Difficult to be precise? 7,824 to 1?
Mr. Spock: 7,824.7 to 1.
Captain James T. Kirk: That's a pretty close approximation.
Mr. Spock: I endeavor to be accurate.
Captain James T. Kirk: You do quite well.
-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
- BigDumbDinosaur
- Posts: 9428
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
- Contact:
Re: Let's Talk About Math Baby
I haven’t seen an arbitrary-precision package for any member of the 6502 family. However, there is Marco Granati’s floating-point implementation for the 65C816 (see attached; it was written for the WDC assembler). Marco’s code works with quad-precision IEEE-754 numbers and appears to be pretty complete; many transcendental functions are supported. Disclaimer: I have not tested it, so I can’t vouch for its speed or accuracy.
A long-term project of mine will be to translate Marco’s source so it can be assembled in the Kowalski assembler. In the process, I may look at arranging the source so it can be used as a relocatable library in other 65C816 projects. I don’t have anything planned for its usage at this time, but one never knows.
Speaking of Marco, he hasn’t visited here in several years. He lives in Italy and his last log-on to the forum was during the thick of the COVID mess. Many Italian cities were especially hard-hit. I’ve often wondered if he was a victim. If so, not knowing again highlights a general problem we have often discussed in the past. We have no means of knowing the status of our members should something unfortunate befall one of them and no one in their family is in contact with any of us.
x86? We ain't got no x86. We don't NEED no stinking x86!
- BigDumbDinosaur
Re: Let's Talk About Math Baby
drogon wrote:
Code: Select all
Captain James T. Kirk: What would you say the odds are on our getting out of here?
Mr. Spock: Difficult to be precise, Captain. I should say, approximately 7,824.7 to 1.
Captain James T. Kirk: Difficult to be precise? 7,824 to 1?
Mr. Spock: 7,824.7 to 1.
Captain James T. Kirk: That's a pretty close approximation.
Mr. Spock: I endeavor to be accurate.
Captain James T. Kirk: You do quite well.
Good ole Spock!
x86? We ain't got no x86. We don't NEED no stinking x86!
- BigDumbDinosaur
Re: Let's Talk About Math Baby
BTW, forgot to mention there is a topic on Granati’s floating-point package.
x86? We ain't got no x86. We don't NEED no stinking x86!
Re: Let's Talk About Math Baby
cjs wrote:
wayfarer wrote:
NZQRC
Integer, Signed, Rational, High-Precision 'point' numbers (or Scientific Notation (OR|| Symbolic Real Numbers)), Complex Numbers
https://www.google.com/search?q=NZQRC&r ... e&ie=UTF-8
and are accurately represented by e^(iπ)= -1
(Euler's Identity or variations thereof)
(not sure how to get superscript working here)
This is pretty fundamental to my understanding of maths in general. Perhaps some terms are different. I know when I took Differential Equations I bought some software for tutoring. In every case, it told me the problem 'backwards' from how it was presented in class. I later learned that there are 'two schools' of thought on this, or 'two camps', and different US universities will adopt one or the other. I took Diff EQ at Mizzou. Go Tigers!
establishing jargon/lingo is important.
to me, ANY good math library will offer the ability to represent these types of numbers and expressions cleanly.
Quote:
In math, integers are signed as they go on infinitely in both directions along the number line (except when using modular arithmetic).
ℕ is the "natural" numbers, which have a starting point and go infinitely in one direction. These may start at 0 or 1, if you're counting them using numerals, or might be counted without numerals at all, as in Peano numerals (Z, S(Z), S(S(Z)), ...) or Church encoding.
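A minimal sketch of the Peano idea in Python (the names Z and S follow the post; the tuple encoding is my own illustration): zero is a constant, every other natural is a nest of successors, and counting needs no numerals at all.

```python
# Peano numerals: Z is zero, S(n) is the successor of n.
Z = ()                  # zero, an arbitrary sentinel

def S(n):
    """Successor: wrap n one level deeper."""
    return (n,)

def to_int(n):
    """Count successor wrappers to recover an ordinary integer."""
    count = 0
    while n != Z:
        n = n[0]
        count += 1
    return count

three = S(S(S(Z)))      # Z, S(Z), S(S(Z)), S(S(S(Z))) stand for 0, 1, 2, 3
```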
Quote:
ℝ are the "real" numbers. These do not have inherent precision; the precision, and whether or not you use "scientific notation" (more on this below) are artifacts of your implementation.
Actually, it's simpler to use a signed integer for just one of the numerator or denominator (typically the numerator). If both are signed you then have to deal with calculating the sign based on the two signs in your representation: +/+ and -/- are positive, and +/- and -/+ are negative.
Quote:
...the 'Q' numbers are rational and the numerator and denominator should ideally be any other type of number however for simplicity I would say use either an Integer or Signed Int for both...
Quote:
Quote:
...and then you have scientific notation, which while basically a 'float', it is a fixed point number, with an exponent, and these can then be operated on in ways that might be faster than traditional floating point math...
the ability to construct that needs to be in place in any low level library, Scientific Notation should be 'trivial' to implement if all other aspects are in place, ie, floating or fixed point numbers, exponents, operands and multi term 'numbers'.
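A hedged sketch of the __SCINUM idea from the post: a number stored as a (mantissa, exponent) pair meaning mantissa × 10^exponent, with only integer arithmetic underneath. The function names here are illustrative, not an established API.

```python
# Scientific-notation pair: (mantissa, exponent) means mantissa * 10**exponent.
def sci(mantissa, exponent):
    return (mantissa, exponent)

def sci_mul(a, b):
    # Multiply mantissas, add exponents -- no floating point needed.
    return (a[0] * b[0], a[1] + b[1])

def sci_add(a, b):
    # Align to the smaller exponent before adding mantissas.
    shift = a[1] - b[1]
    if shift >= 0:
        return (a[0] * 10**shift + b[0], b[1])
    return (a[0] + b[0] * 10**(-shift), a[1])
```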
Quote:
Quote:
...however I think as you move into more symbolic maths, the limitations of storing this number is more obvious; moving to a system where representation is not 'fixed until needed' and 'always exactly predictable to a given place or level of precision' so if you want Pi, you calculate it out from a Unit circle or some other convenient meter...
Quote:
You may also find it useful to look at how numeric towers and number systems are dealt with in functional programming languages, which tend to be a little more rigorous about how they set these things up. The Spire package for Scala is a simpler example, and if you want to get really serious you can look at the various numeric towers available for Haskell, which usually work up to the various sets of operations from basic algebraic structures such as monoids, semigroups, and so on and are very careful (due to the nature of the language) about things like partial functions (such as division). The Numbers section of the Prelude is what most people use, but NumHask looks like a good alternative, though I've not investigated it.
C, C#, Web(HTML/CSS/PHP/MySQL), Python.. some Java, couple others here and there, BASIC a long time ago
learning Prolog, might pick up something like Forth, I dip my foot in ECMAscript
I have to do a lot of Math, and since I am learning 65xx ASM, it's a good place to work on these notions
I might grab some Functional Programming at some point, I have a lot on my plate right now. However, for a 65xx Maths library, making it 'support functional languages' and 'use symbolic maths' are kinda the same thing a little. A variable can be a function can be a term.
So I'd like to aim for that...
As others mentioned 'arbitrary precision', a lot of why I am doing this is to better understand how that might be accomplished.
How to work with 'any sized number' and declare that precision during runtime.
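For a working model of "declare that precision during runtime", Python's standard decimal module is one existing example: the precision is a context variable you set while the program runs, not a compile-time type.

```python
# Precision chosen at run time via the decimal context.
from decimal import Decimal, getcontext

getcontext().prec = 50               # 50 significant digits, set at run time
third = Decimal(1) / Decimal(3)      # fifty 3s after the point

getcontext().prec = 10               # change precision on the fly
short_third = Decimal(1) / Decimal(3)
```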
Re: Let's Talk About Math Baby
wayfarer wrote:
NZQRC are the 5 types of 'algebraic' numbers, as I listed in my earlier post; they have historical use and are widely accepted....
wayfarer wrote:
NZQRC Integer, Signed, Rational, High-Precision 'point' numbers (or Scientific Notation (OR|| Symbolic Real Numbers))....
And of course you need to remember that the terminology used for number representations in computers is often different, which may be where some of your confusion between mathematical natural numbers and integers is coming from. When talking about number representations in computer languages, in most contexts an "integer" is mathematically neither a natural number nor an integer, but a member of a modular arithmetic residue class.
So it's perfectly fine to call a 16-bit value holding a limited range of values from ℕ or ℤ an "integer," but it's not correct to say that it's of type ℕ or ℤ. (Calling it a "signed integer" or "unsigned integer" makes it clear that you're using the computing term, not the mathematical term, because "signed integer" would be redundant when describing ℤ and "unsigned integer" is not ℤ.)
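A quick illustration of the residue-class point (my own sketch): a 16-bit "integer" in the computing sense wraps around instead of growing, so it is arithmetic mod 2^16 rather than a mathematical integer.

```python
# 16-bit addition is addition in the residue class mod 2**16.
MASK = 0xFFFF

def add16(a, b):
    return (a + b) & MASK

wrapped = add16(0xFFFF, 1)   # mathematically 65536, but 0 mod 2**16
```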
Oh, and algebraic numbers are also a very different thing in standard math from the way you're using them.
And why do I go on about this? Because I agree with you that:
wayfarer wrote:
establishing jargon/lingo is important.
It's worth noting that in some situations the differences between the mathematical representations and the computer representations are even greater. For example, in 6502 assembly language an 8 bit value is not inherently signed or unsigned, and even some of the operations you use on it (`ADC`, `SBC`) are also not signed or unsigned. The signed/unsigned interpretation happens not in the byte itself, or when doing an `ADC` on the byte, but _when you check the flags._ Up to that point it's essentially both at the same time.
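A Python model of that point (a simplified sketch of binary-mode `ADC` semantics, not a cycle-accurate emulation): one add produces one result byte plus both the C and V flags, and only the flag you consult decides the interpretation.

```python
# One 8-bit add yields a byte plus carry (C) and signed-overflow (V) flags.
def adc(a, b, carry_in=0):
    r = a + b + carry_in
    result = r & 0xFF
    c = r > 0xFF                                  # unsigned overflow (carry out)
    v = bool((~(a ^ b) & (a ^ result)) & 0x80)    # signed overflow
    return result, c, v

# 0x80 + 0x80: as unsigned, 128 + 128 carries out (C set);
# as signed, -128 + -128 overflows (V set). Same byte, same add.
byte, c_flag, v_flag = adc(0x80, 0x80)
```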
Quote:
and are accurately represented by e^(iπ)= -1 (Euler's Identity or variations thereof) (not sure how to get superscript working here)
Quote:
I am really leaning to a symbolic maths/computer algebraic solver and want to say (X+Q) is a numerator. Or a denominator, so this would be in a way, a 'functional programming' idea to my understanding. IE, 'any term can be the output of an entire process'. That 'variable' can be entire programs in a Unix like environment with system calls and piping.
Code: Select all
add1 x = 1 + x -- function definition
apply f x = f x -- function to apply a function to one argument
y = apply add1 3 -- y = 4, but same as "y = add1 3"
add2 x = (+) 2 x -- using (+) function in prefix form
add3 = (+) 3 -- "point free" style:
-- • "(+)" is a function that takes two arguments.
-- • "(+) 3" is a partial application, evaluating to a
-- function that takes one argument.
-- • The assignment makes add3 the same function that
-- takes one argument.
Lisp works similarly, and might be an easier place to start on this sort of thing. Also, it is probably a lot easier to write a Lisp for 6502 than Haskell, if you wanted to go that direction.
Quote:
I think this is 'inline functions' as a math term or such and I would like to have the basis for this myself... never calculating a value and operating on logic and rules, only generating the actual 'digits' when required at run time when needed.
I don't know if there's a particular term of what you seem to be referring to, but I'd call it just "doing algebraic math, until you decide to do the calculations."
Quote:
at some level a data structure or struct, something like __SCINUM (operand)*10^(exponent)
the ability to construct that needs to be in place in any low level library, Scientific Notation should be 'trivial' to implement if all other aspects are in place, ie, floating or fixed point numbers, exponents, operands and multi term 'numbers'.
And again, scientific notation is orthogonal to this; you do not need it for floating point. A parser reading a floating point constant will produce the same value whether it reads "0.0000012" or "1.2e-6"; the same is true of "1200000000000000000000" and "1.2e21".
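This is easy to check: Python's float parser produces bit-identical values for both spellings, since both strings denote the same decimal value.

```python
# Scientific notation is spelling, not representation: same parsed value.
small = float("0.0000012") == float("1.2e-6")
large = float("1200000000000000000000") == float("1.2e21")
```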
Quote:
I might grab some Functional Programming at some point, I have a lot on my plate right now. However, for a 65xx Maths library, making it 'support functional languages' and 'use symbolic maths' are kinda the same thing a little. A variable can be a function can be a term.
Quote:
As other mentioned 'arbitrary precision', a lot of why I am doing this is to better understand how that might be accomplished.
How to work with 'any sized number' and declare that precision during runtime.
A while back I did a quick experiment with this, writing a routine to read a decimal number and convert it to a bigint up to around a hundred-odd bytes long. (That limit is due to the 255-byte length limit on the input string; routines manipulating native encodings are good up to 255 bytes.) Reading a decimal number is not only handy in and of itself, but also brings in the first native operation you need: a multiply by ten. You can find the code here and the tests here.
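The same idea sketched in Python (my own illustration, not the 6502 code referred to above): reading a decimal string needs exactly one primitive on the big number, multiply-by-ten, plus an add of the next digit.

```python
# Decimal string -> bigint: one multiply-by-ten and one digit add per character.
def read_decimal(s):
    n = 0
    for ch in s:
        assert "0" <= ch <= "9", "decimal digits only"
        n = n * 10 + (ord(ch) - ord("0"))   # the multiply-by-ten step
    return n
```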
Curt J. Sampson - github.com/0cjs
- Sheep64
- In Memoriam
- Posts: 311
- Joined: 11 Aug 2020
- Location: A magnetic field
Re: Let's Talk About Math Baby
I wondered where this topic would go, if anywhere. We've already considered multiplication, squaring, more multiplication, Maclaurin, Taylor, Newton-Raphson and CORDIC. I hoped that we might discuss matrix operations, neural networks, a hypothetical Commodore FPU or a polygon blitter which uses the same floating point format as BASIC, Forth and C. Like many others, I was particularly impressed by TobyLobster's exhaustive evaluation of more than 60 multiplication algorithms. I hoped there would be a definitive winner but "it depends" because the distribution of inputs affects the choice of fastest algorithm.
I've previously noted that ℤ ⊂ float ⊂ ℚ ⊂ ℝ. Floating point is a subset of fractions not a superset. Regardless, programmers struggle to understand that binary floating point mangles almost everything. This includes 1/3, 1/5 and everything derived from them. Most notably this includes 1/10, 1/100 and 1/1000. Our most noble, wise and symmetrical tri-quad-overlord, cjs, suggests Base 12 or Base 60 to avoid rounding error. Indeed, given my understanding of Chinese Remainder Theorem, my first thought was Base 360 encoding to hold 1/10 and 1/12 without error. I then downgraded that to Base 180 to fit into one byte and raised it to Base 240 to obtain better encoding density. So, yes, I strongly agree with cjs that Base 60 is a good encoding and - after multiple adjustments - humbly note that it is possible to hold two extra bits per byte, your most quadness. If you are in an obscure case where 1/7 is hugely important (in addition to 1/3 and 1/10), Base 210 should be considered. Otherwise, Base 240 is a good encoding which uses most of the available bit permutations.
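A quick sanity check of the base choice (my own check, standard library only): a fraction is exact in k digits of base b iff its denominator divides b^k. Since 240 = 2^4 · 3 · 5, the fractions 1/3, 1/5, 1/10 and 1/12 are all exact in a single base-240 digit; 1/7 is not, which is why base 210 = 2 · 3 · 5 · 7 comes up for that case.

```python
# A fraction is exactly representable in k base-b digits iff den | b**k.
from fractions import Fraction

def exact_in_base(frac, base, digits=1):
    return (base ** digits) % frac.denominator == 0

checks = {d: exact_in_base(Fraction(1, d), 240) for d in (3, 5, 7, 10, 12)}
```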
Two's complement is preferable for addition, one's complement is preferable for multiplication, either is moot for Taylor approximation, but monotonic representation is preferable for O(n log n) sort operations, which may overwhelm all other considerations. Indeed, I strongly recommend that Base 240 representation follows binary sort order, irrespective of the representation having discontiguous holes. There is no strict requirement for Base 240 representation to use the first, middle or last values of each byte. However, the last values may be preferable for software and hardware implementation; especially hardware counters. De-normalized Base 240 representation is highly suited to polygon blitting, where modulo 15*16 counters may increment faster than an 8 bit counter. Likewise, multiples of 240 are entirely compatible with high resolutions. Specifically, video resolution is often specified as an integer multiple of 2K (1920*1080 pixels). This is 8*4.5 multiples of 240 pixels.
One of the over-looked advantages of BCD representation and larger representations is that very few exponent bits are required to cover a broad range. With Base 2 representation, every successive value of exponent is a doubling. With Base 10 representation, every successive value of exponent is an order of magnitude. Therefore, a 6 bit exponent covers 64 leading decimal digits - and slightly more if a de-normalized form is allowed. Base 240 encoding with 6 bit exponent covers more than 152 leading decimal digits. If we follow the development of floating point from the Apollo Guidance Computer to MIL-STD-1750A to the laughable "Chinese wall" conflict of interest of IEEE-754 development, we find that range and precision is always compatible with common aeronautical formulae in the volume of space covering Earth and Moon. From this, we find there is never a reason to take the value 10^50 and cube it. Regardless, I find it ludicrous that it is possible to take an IEEE-754 single precision value, cast it to double precision, square it and obtain overflow. I also find 80 bit representation unwieldy. Therefore, I strongly recommend Base 240, 32 bit single precision with 6 bit exponent and 64 bit double precision with 14 bit exponent.
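The exponent-range arithmetic above can be checked on the back of an envelope: a 6-bit exponent gives 64 scale steps; each base-10 step is one order of magnitude, while each base-240 step is log10(240) ≈ 2.38 orders of magnitude.

```python
# Decimal span covered by a 6-bit exponent in base 10 vs base 240.
import math

steps = 2 ** 6                                    # 64 exponent values
decimal_span_base10 = steps * math.log10(10)      # 64 orders of magnitude
decimal_span_base240 = steps * math.log10(240)    # just over 152
```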
I have thought further about the interaction of a caching multiplication algorithm and interrupts. Specifically, this corner case should be avoided at all costs. One of the many barriers to general purpose operating system adoption on 6502 is that implementations (typically written in C) assume that multiplication is a computationally cheap operation which can be used with abandon within interrupt. Even a slow microcontroller, such as AVR ATMEGA328P, can perform 16 bit integer multiplication within 200ns - and operating system development typically targets hardware which can amortize 32 bit integer multiplication within 1ns by using register re-naming. 300 cycles or more at 40ns or more is considered perverse and 6502 exists in a niche where it is bad at multiplication and therefore doesn't get used for any purpose which requires intensive multiplication. Historically, this was false. Many early 6502 systems were purchased for the primary purpose of running spreadsheet software. Likewise, it does not remain entirely false. 6502 systems with SRAM offer considerable advantage with reliability, maintainability and energy consumption. A canonical example is GARTHWILSON's flight computer; a BCD 6502 system which reduced processor speed to save energy. It is possible that people using alternative systems have died in light aircraft crashes.
GARTHWILSON recommends against BCD due to the related problems of poor speed and poor encoding density. Indeed, Base 100 and Base 1000 offer advantages over BCD. I believe that Atari BASIC uses BCD for floating point and I presume that rationale is quite simple. Performance will be awful with any representation. However, decimal means trivial string conversion and no errors with decimal currency. We should remember that Nolan Bushnell, founder of Atari, said "Business is a good game - lots of competition and a minimum of rules. You keep score with money." While Atari failed to count pennies prior to bankruptcy, we should also remember that the world's former richest man didn't bother with decimal floats. Or accuracy.
Regarding the interaction of cached multiplication and matrix operations, it is apparent to me that the order of operations is significant. As an example, two dimensional matrix rotation typically uses three distinct values. However, poor evaluation order and unfortunate hashing may cause the same inputs to be evaluated repeatedly. This case can be entirely avoided by modifying the schedule of multiplication operations. In the general case, serpentine evaluation might be most efficient. For small matrix rotation operations, diagonal evaluation is recommended. Obviously, this requires the type of rigorous testing pioneered by TobyLobster. However, in the case of sparse matrix operations (magnify, project, rotate, shear), multiplication by zero is exceedingly common.
Regarding a Commodore FPU, a hypothetical 6587 required a patient and charitable understanding which didn't exist in 1980s computing. Although NASA believed that it was fruitful for 6502 to manage up to 10 FPUs, a dedicated 65xx peripheral chip would have been underwhelming. However, don't let that stop you from making something similar with an FPGA. Intel's 8087 was a "show-boating", ambitious over-reach. While 8086 initially ran at 5MHz, any 8087 dragged it down to 3MHz. It was a hugely ornate chip with 80 bit precision and the majority of its operations are completely useless; most notably all of the hyperbolic and arc-hyperbolic functions. It required two bits per cell to hold the micro-code. Yes, the micro-code was held in Base 4. Obviously, this is why manufacturing yield was poor and devices ran slowly. Commodore could have competed against this insanity with a minimal implementation. Specifically, addition, subtraction, multiplication and not much more. However, it would have been viewed less favorably than National Semiconductor's awful FPU.
A hypothetical 65xx FPU does not require many pins. It requires power, ground, 8 data lines, some address lines, clock, read/write and it is customary for 65xx peripheral chips to have multiple select lines with differing polarity. Alternatively, use of SYNC allows a co-processor to share the instruction stream. At most, this would require 20 pin narrow DIP. However, this assumes a contemporary level of chip packaging for a large chip. It also ignores heat dissipation of NMOS. 6587 would have been 24 pin ceramic. With suitable pin arrangement, one or more memory mapped 6587 chips could have been placed next to SID. However, it is unlikely that 6587 would have been given more than 16 or 32 memory locations. This is inadequate for 64 bit matrix operations of any size. To reduce register pressure, 6587 may have used 40 bit representation. With a strategically placed NC pin, this would have indicated future expansion to 80 bit representation. Given that Commodore had rights to Microsoft BASIC Version 2, it would be quite easy to make BASIC work with FPU. Indeed, it remains worthwhile to pursue this huge performance boost. It also reduces the size of the BASIC interpreter. This would reduce or eliminate the performance loss when running BASIC on Commodore 16 or similar. Either way, it remains worthwhile to maintain the addresses of historical entry points.
I've previously suggested to Rob Finch that VIC-II could have been extended with wider video bus. In this scenario, 320*200 text/bitmap/sprite system is retained as an overlay. Meanwhile, 640*400 or more can be pursued with minimal compatibility concerns. (Within this arrangement, I've found that 40*25 tile video benefits from 640*400 pixel resolution.) The migration path is a variation of Atari's upwardly compatible video architecture. However, this path allows SuperCPU to retain compatibility with the notoriously tight Commodore video timing while also allowing floating point compatibility across BASIC, FPU and polygon blitter. As we've seen from the success of the MEGA65 project, Commodore's VIC-III matches throughput of Amiga OCS graphics. A hypothetical VIC-IV could greatly exceed it, although it might be a 64 pin ceramic monster. When the legacy text/bitmap/sprite system is not required, surplus bandwidth on the 8 bit processor bus can be used to feed a polygon blitter. My doodles indicate that a minimal polygon blitter might take 192 clock cycles or so before the first byte is output. After that, each write may occur every 8, 16 or 32 cycles. However, such a design benefits from several iterations of Moore's law. It also benefits from wider bus. Therefore, it is trivial to double throughput every two years; possibly for a decade or more. At this point, all ROMs can be reduced to one chip, all peripherals can be reduced to one chip and the design is focused around the wider bus of the graphic system.
When the majority of transistors are allocated to a single core GPU, further doublings are possible by scaling GPU cores. In this arrangement, it isn't hugely important if the floating point format is 32 bit or 40 bit, Microsoft/Apple/Intel format, Base 240 or Base 256. Historically, Commodore followed Microsoft in preference to Intel. (See Vintage Computer Festival: Jack and the Machine for rationale.) However, a fresh, consistent implementation should investigate Base 240 representation.
I've previously noted that ℤ ⊂ float ⊂ ℚ ⊂ ℝ. Floating point is a subset of fractions not a superset. Regardless, programmers struggle to understand that binary floating point mangles almost everything. This includes 1/3, 1/5 and everything derived from them. Most notably this includes 1/10, 1/100 and 1/1000. Our most noble, wise and symmetrical tri-quad-overlord, cjs, suggests Base 12 or Base 60 to avoid rounding error. Indeed, given my understanding of Chinese Remainder Theorem, my first thought was Base 360 encoding to hold 1/10 and 1/12 without error. I then downgraded that to Base 180 to fit into one byte and raised it to Base 240 to obtain better encoding density. So, yes, I strongly agree with cjs that Base 60 is a good encoding and - after multiple adjustments - humbly note that it is possible to hold two extra bits per byte, your most quadness. If you are in an obscure case where 1/7 is hugely important (in addition to 1/3 and 1/10), Base 210 should be considered. Otherwise, Base 240 is a good encoding which uses most of the available bit permutations.
Two's compliment is preferable for addition, one's compliment is preferable for multiplication, either is moot for Taylor approximation but monotonic representation is preferable for O(n log n) sort operations which may overwhelm all other considerations. Indeed, I strongly recommend that Base 240 representation follows binary sort order, irrespective of the representation having dis-contiguous holes. There is no strict requirement for Base 240 representation to use the first, middle or last values of each byte. However, the last values may be preferable for software and hardware implementation; especially hardware counters. De-normalized Base 240 representation is highly suited to polygon blitting where modulo 15*16 counters may increment faster than 8 bit counter. Likewise, multiples of 240 are entirely compatible with high resolutions. Specifically, video resolution is often specified as an integer multiple of 2K (1920*1080 pixels). This is 8*4.5 multiples of 240 pixels.
One of the over-looked advantages of BCD representation and larger representations is that very few exponent bits are required to cover a broad range. With Base 2 representation, every successive value of exponent is a doubling. With Base 10 representation, every successive value of exponent is an order of magnitude. Therefore, 6 bit exponent covers 64 leading decimal digits - and slightly more if a de-normalized form is allowed. Base 240 encoding with 6 bit exponent covers more than 152 leading decimal digits. If we follow the development of floating point from the Apollo Guidance Computer to MIL-STD-1750A to the laughable "Chinese wall" conflict of interest of IEEE-754 development, we find that range and precision is always compatible with common aeronautical formula in the volume of space covering Earth and Moon. From this, we find there is never a reason to take the value 10^50 and cube it. Regardless, I find it ludicrous that it is possible to take a IEEE-754 single precision value, cast it to double precision, square it and obtain overflow. I also find 80 bit representation unwieldy. Therefore, I strongly recommend Base 240, 32 bit single precision with 6 bit exponent and 64 bit double precision with 14 bit exponent.
I have thought further about the interaction of a caching multiplication algorithm and interrupts. Specifically, this corner case should be avoided at all costs. One of the many barriers to general purpose operating system adoption on 6502 is that implementations (typically written in C) assume that multiplication is a computationally cheap operation which can be used with abandon within interrupt. Even a slow microcontroller, such as AVR ATMEGA328P, can perform 16 bit integer multiplication within 200ns - and operating system development typically targets hardware which can amortize 32 bit integer multiplication within 1ns by using register re-naming. 300 cycles or more at 40ns or more is considered perverse and 6502 exists in a niche where it is bad at multiplication and therefore doesn't get used for any purpose which requires intensive multiplication. Historically, this was false. Many early 6502 systems were purchased for the primary purpose of running spreadsheet software. Likewise, it does not remain entirely false. 6502 systems with SRAM offer considerable advantage with reliability, maintainability and energy consumption. A canonical example is GARTHWILSON's flight computer; a BCD 6502 system which reduced processor speed to save energy. It is possible that people using alternative systems have died in light aircraft crashes.
GARTHWILSON recommends against BCD due to the related problems of poor speed and poor encoding density. Indeed, Base 100 and Base 1000 offer advantages over BCD. I believe that Atari BASIC uses BCD for floating point and I presume that rationale is quite simple. Performance will be awful with any representation. However, decimal means trivial string conversion and no errors with decimal currency. We should remember that Nolan Bushnell, founder of Atari, said "Business is a good game - lots of competition and a minimum of rules. You keep score with money." While Atari failed to count pennies prior to bankruptcy, we should also remember that the world's former richest man didn't bother with decimal floats. Or accuracy.
Regarding the interaction of cached multiplication and matrix operations, it is apparent to me that the order of operations is significant. As an example, two dimensional matrix rotation typically uses three distinct values. However, poor evaluation order and unfortunate hashing may cause the same inputs to be evaluated repeatedly. This case can be entirely avoided by modifying the schedule of multiplication operations. In the general case, serpentine evaluation might be most efficient. For small matrix rotation operations, diagonal evaluation is recommended. Obviously, this requires the type of rigorous testing pioneered by TobyLobster. However, in the case of sparse matrix operations (magnify, project, rotate, shear), multiplication by zero is exceedingly common.
Regarding a Commodore FPU: a hypothetical 6587 would have required a patient and charitable understanding which didn't exist in 1980s computing. Although NASA believed it was fruitful for a 6502 to manage up to 10 FPUs, a dedicated 65xx peripheral chip would have been underwhelming. However, don't let that stop you from making something similar with an FPGA. Intel's 8087 was a "show-boating", ambitious over-reach. While the 8086 initially ran at 5MHz, any attached 8087 dragged it down to 3MHz. It was a hugely ornate chip with 80 bit precision, and the majority of its operations are completely useless - most notably all of the hyperbolic and arc-hyperbolic functions. It required two bits per cell to hold the micro-code; yes, the micro-code was held in Base 4. Obviously, this is why manufacturing yield was poor and devices ran slowly. Commodore could have competed against this insanity with a minimal implementation: addition, subtraction, multiplication and not much more. However, it would have been viewed less favorably than National Semiconductor's awful FPU.
A hypothetical 65xx FPU does not require many pins: power, ground, 8 data lines, some address lines, clock, read/write - and it is customary for 65xx peripheral chips to have multiple select lines with differing polarity. Alternatively, use of SYNC would allow a co-processor to share the instruction stream. At most, this would require a 20 pin narrow DIP. However, that assumes a contemporary level of chip packaging for a large chip, and it ignores the heat dissipation of NMOS; a real 6587 would have been a 24 pin ceramic part. With a suitable pin arrangement, one or more memory-mapped 6587 chips could have been placed next to SID. However, it is unlikely that a 6587 would have been given more than 16 or 32 memory locations, which is inadequate for 64 bit matrix operations of any size. To reduce register pressure, the 6587 might have used a 40 bit representation; with a strategically placed NC pin, this would have indicated future expansion to an 80 bit representation. Given that Commodore had rights to Microsoft BASIC Version 2, it would have been quite easy to make BASIC work with an FPU. Indeed, it remains worthwhile to pursue this huge performance boost, which would also reduce the size of the BASIC interpreter and thereby reduce or eliminate the performance loss when running BASIC on a Commodore 16 or similar. Either way, it remains worthwhile to maintain the addresses of historical entry points.
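To show how cramped 16 or 32 memory locations really is, here is an entirely hypothetical register map for the imagined 6587, written as a host-side C sketch (a fake 64 KiB array stands in for the real bus; every address, offset and command code below is invented for illustration, as is the placement next to SID):

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical 6587 register map: 40 bit operands eat 5 bytes each, so
 * two operands plus command, status and result already fill a 16-byte
 * window - there is no room left for matrix-sized register files. */
static uint8_t bus[65536];           /* stand-in for the 6502 address space */

enum {
    FPU_BASE   = 0xDE00,             /* invented mapping in the I/O area    */
    FPU_OPA    = FPU_BASE + 0x0,     /* operand A: 5 bytes (40 bit float)   */
    FPU_OPB    = FPU_BASE + 0x5,     /* operand B: 5 bytes                  */
    FPU_CMD    = FPU_BASE + 0xA,     /* write: 0 = add, 1 = sub, 2 = mul    */
    FPU_STATUS = FPU_BASE + 0xB      /* bit 7 high while the FPU is busy    */
};

void fpu_load_operand(uint16_t reg, const uint8_t bytes[5])
{
    for (int i = 0; i < 5; i++)
        bus[reg + i] = bytes[i];     /* one STA per byte on a real 6502     */
}

int fpu_busy(void)
{
    return (bus[FPU_STATUS] & 0x80) != 0;
}
```

The point of the sketch is the arithmetic of the window, not the details: 5 + 5 + 1 + 1 bytes is already 12 of 16 locations before a result register is even mapped.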
I've previously suggested to Rob Finch that VIC-II could have been extended with a wider video bus. In this scenario, the 320*200 text/bitmap/sprite system is retained as an overlay while 640*400 or more is pursued with minimal compatibility concerns. (Within this arrangement, I've found that 40*25 tile video benefits from 640*400 pixel resolution.) The migration path is a variation of Atari's upwardly compatible video architecture; however, this path allows SuperCPU to retain compatibility with the notoriously tight Commodore video timing while also allowing floating point compatibility across BASIC, FPU and polygon blitter. As we've seen from the success of the MEGA65 project, Commodore's VIC-III matches the throughput of Amiga OCS graphics. A hypothetical VIC-IV could greatly exceed it, although it might be a 64 pin ceramic monster. When the legacy text/bitmap/sprite system is not required, surplus bandwidth on the 8 bit processor bus can be used to feed a polygon blitter. My doodles indicate that a minimal polygon blitter might take 192 clock cycles or so before the first byte is output; after that, each write may occur every 8, 16 or 32 cycles. Such a design benefits from several iterations of Moore's law, and from a wider bus, so throughput could plausibly be doubled every two years - possibly for a decade or more. At that point, all ROMs can be reduced to one chip, all peripherals can be reduced to one chip, and the design is focused around the wider bus of the graphics system.
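The blitter figures above reduce to a one-line cost model - setup latency paid once, then a fixed cycle cost per byte - which is worth writing down because it shows how badly short spans amortize the 192-cycle setup (the formula is mine; the numbers are the ones quoted above):

```c
#include <stdint.h>
#include <assert.h>

/* Cost model for the sketched polygon blitter: the first byte pays the
 * setup latency, subsequent bytes stream at a fixed per-byte cost. */
uint32_t blit_cycles(uint32_t setup_cycles, uint32_t cycles_per_byte,
                     uint32_t bytes_out)
{
    if (bytes_out == 0)
        return 0;
    return setup_cycles + bytes_out * cycles_per_byte;
}
```

For a 40-byte span at 8 cycles per byte this gives 192 + 320 = 512 cycles, i.e. the setup is still well over a third of the total - one reason short, thin polygons are the worst case for such a design.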
When the majority of transistors are allocated to a single-core GPU, further doublings are possible by scaling GPU cores. In this arrangement, it isn't hugely important whether the floating point format is 32 bit or 40 bit, Microsoft/Apple/Intel format, Base 240 or Base 256. Historically, Commodore followed Microsoft in preference to Intel (see Vintage Computer Festival: Jack and the Machine for the rationale). However, a fresh, consistent implementation should investigate a Base 240 representation.
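For anyone wanting to play with the idea, here is a minimal Base 240 limb representation sketched in C (the round-trip functions and the observation about spare codes are mine, not claims from the discussion above): each byte holds a limb in 0..239, leaving 16 spare codes per byte that could carry flags or NaN-style markers.

```c
#include <stdint.h>
#include <assert.h>

/* Sketch of a base-240 integer representation, one limb per byte,
 * least significant limb first.  Round-tripping a value demonstrates
 * the encoding; a real float format would add sign and exponent. */
#define B240 240u

int to_base240(uint32_t v, uint8_t limbs[5])
{
    int n = 0;
    do {
        limbs[n++] = (uint8_t)(v % B240);
        v /= B240;
    } while (v != 0 && n < 5);
    return n;                        /* number of limbs used */
}

uint32_t from_base240(const uint8_t *limbs, int n)
{
    uint32_t v = 0;
    while (n-- > 0)
        v = v * B240 + limbs[n];     /* Horner evaluation    */
    return v;
}
```

Note 240 = 2^4 * 3 * 5, so halves, thirds and fifths divide a limb exactly - whether that was the intended attraction of Base 240 I can't say, but it is a property Base 256 lacks.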