WDC's W65T32 Terbium 32-Bit Processor
- GARTHWILSON
Quote:
I'm not aware of any microcontrollers which have separate instruction and data address spaces. I'm aware of quite a few DSPs which have this, but not any microcontrollers.
TMorita wrote:
From a compiler code generation point of view, it is indeed a mistake.
For example, in certain situations, gcc needs to dynamically generate code at runtime. On most processors, this can be done on the runtime stack, but if a processor has separate instruction/data address spaces, this can't be done.
From your description I see that, from a compiler builder's point of view, consistency and flexibility are important: flexibility in that many or all registers can be used for the same purposes (e.g. output values), and consistency in that all registers and operations behave the same or similarly (e.g. with addressing modes).
I guess this much improves the compiler's ability to schedule(?) operations on different registers, so as to avoid moving data around between registers and temporary variables.
BTW: How does cc65 work in this respect?
André
Quote:
For example, in certain situations, gcc needs to generate dynamically
generate code at runtime.
generate code at runtime.
What situations are these for? I cannot think of any reason why gcc would need to dynamically create code. This might be a feature of glibc (e.g., to optimize memory moves given a specific pattern of input parameters), but not gcc.
Quote:
I'm not aware of any microcontrollers which have separate instruction and data address spaces. I'm aware of quite a few DSPs which have this, but not any microcontrollers.
I might add that GCC for ATmega microcontrollers (Atmel's top-of-the-line units) produces amazingly good code, although I hardly exercised it when I tried it. Still, it was amazingly nice.
fachat wrote:
Does that not collide with the "no-execute" features of modern e.g. x86 CPUs?
The biggest hit comes from the fact that you cannot have a single cache line in the code and data caches at the same time (engineered specifically this way because, at the time the 80486 came about, self-modifying code was still an issue, apparently). Other CPUs, like the PowerPC and 68K series, had to execute special cache-control operations to flush a line to memory and invalidate it in the instruction cache before executing freshly written code (e.g., the PowerPC's dcbst/sync/icbi/isync sequence). The Amiga proved that this was not such a huge burden; when the 68030 became available, the impact on existing software was minimal, and within two months, all errant software had new releases that were 68030 compatible, and fully backward compatible with the 68000 through 68020. In short, it was a non-issue, a normal industry growing pain that was quickly resolved.
However, as usual, Intel favored increasing levels of complexity rather than simplicity.
Quote:
I guess this much improves the compiler's ability to schedule(?) operations on different registers, so as to avoid moving data around between registers and temporary variables.
Given a 65816 built on a modern semiconductor process, you should be able to realize a Commodore 64 that runs at several GHz, too. But the caches would need to be engineered differently to compensate for the 65816's heavy use of RAM (instead of registers). It's doable, but is the return on investment worth it?
Quote:
BTW: How does cc65 work in this respect?
However, I'm thinking more seriously now of implementing a MachineForth compiler for the 65816 (yes, I know it's not C, but not all languages have to be like C!) using GForth, and hopefully with any luck, I can speed up my Kestrel's firmware development by doing so.
kc5tja wrote:
What situations are these for? I cannot think of any reason why gcc would need to dynamically create code. This might be a feature of glibc (e.g., to optimize memory moves given a specific pattern of input parameters), but not gcc.
Anyway, when nested functions are used, gcc will create and execute code on the runtime stack when calling the nested function. I don't remember the exact reason why, unfortunately. I think it had something to do with massaging the arguments, but I'm not sure about this.
Toshi
I think TMorita might be referring to small blocks of code created at run-time that must be able to find associated data without any additional information. They would do this by storing the data with the code, relative to the program counter. I remember this kind of thing from the M68K days of the Mac, and the solution then was to store the data with the code, but not in an instruction, so no code was being modified when the data values were set. But on the Mac you weren't generating this at run-time the way it sounds like gcc does. GCC has lots of non-portable extensions to the language. I don't think plain C has any need for run-time code generation.
Indeed, that kind of runtime code generation sounds like a no-win situation to me. I've observed my C code output in disassembly form, and never observed dynamically generated code. But, I also don't use the nested functions solution.
However, I do occasionally use Oberon, and I'm well aware that that environment does not utilize dynamically generated code either. So GCC's solution to the problem seems to me very much like driving a finishing nail with a rock thrown from a ballista.
Where I do see dynamically generated code as being exceptionally helpful is in graphics blitter routines, where blit operations can be compiled given a set of constraints (kind of like compiling SQL queries prior to their use). I intend to use this approach for a more sophisticated graphics interface to the Kestrel, since the 65816 is so utterly horrible when it comes to general-purpose bit manipulation (single-bit shifts and rotates really, really, really suck).
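A sketch in C of what "compiling" a blit might look like, with runtime code generation replaced by up-front selection of a specialized inner loop (all names here are illustrative, not from any Kestrel source):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch: instead of emitting machine code at runtime,
   "compile" a blit by picking a specialized inner loop from the blit's
   constraints ahead of time -- much like preparing an SQL query before
   executing it. */
typedef void (*blit_fn)(uint8_t *dst, const uint8_t *src, size_t n);

static void blit_copy(uint8_t *d, const uint8_t *s, size_t n) {
    for (size_t i = 0; i < n; i++) d[i] = s[i];
}
static void blit_or(uint8_t *d, const uint8_t *s, size_t n) {
    for (size_t i = 0; i < n; i++) d[i] |= s[i];
}
static void blit_xor(uint8_t *d, const uint8_t *s, size_t n) {
    for (size_t i = 0; i < n; i++) d[i] ^= s[i];
}

enum blit_op { BLIT_COPY, BLIT_OR, BLIT_XOR };

/* the "compile" step: constraints in, specialized routine out */
static blit_fn compile_blit(enum blit_op op) {
    switch (op) {
    case BLIT_OR:  return blit_or;
    case BLIT_XOR: return blit_xor;
    default:       return blit_copy;
    }
}
```

A real runtime code generator would go further and stitch together shift, mask, and combine fragments into one loop, but the prepare-then-execute shape is the same.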
Terbium "the chip" didn't disappear; something which never existed cannot possibly disappear.
The 65T32 product announcement was for a 32-bit address space, 16-bit word-addressed processor (thus an 8GiB address space) that is a superset of the 65816. It is still in development. And, yes, it's still vaporware until we see real silicon or IP licenses being offered.
However, what started out as a project name has evolved into a trademark for the company. They're applying that trademark to all their products. When I asked for an update on the status, they were tight-lipped -- far more so than they ever have been. This suggests that they are relatively close to some kind of Terbium-related announcement. I cannot predict what kind of announcement it will be. It's definitely NOT the TIDE though, for it is already public knowledge.
This is not an unheard of thing for companies like WDC to do. It happens all the time. When I worked for Hifn, we did the same thing.
Though, I did suggest to WDC that they get rid of the 65T32 name -- it's not well liked by WDC or by me, because 6532 is already taken by another 65xx-class chip.
I personally would prefer 65000; however, I did humorously suggest that they resurrect the 65832 name, since that processor died while still on the drawing board. They got the market all hyped up about it, gave release dates, and even sample datasheets. And then . . . *poof*. Nothing. Which adds all the more to the humor of using the 65832 moniker.
I still prefer the name "65000" though, because that leaves plenty of space open for families of processors based on the architecture.
kc5tja wrote:
Indeed, that kind of runtime code generation sounds like a no-win situation to me. I've observed my C code output in disassembly form, and never observed dynamically generated code. But, I also don't use the nested functions solution.
...
"In the GCC compiler, trampoline refers to a technique for implementing pointers to nested functions. The trampoline is a small piece of code which is constructed on the fly on the stack when the address of a nested function is taken. The trampoline sets up the static link pointer, which allows the nested function to access local variables of the enclosing functions. The function pointer is then simply the address of the trampoline. This avoids having to use "fat" function pointers for nested functions which carry both the code address and the static link. See GCC internals: Trampolines for Nested Functions. "
Toshi
fachat wrote:
BTW: How does cc65 work in this respect?
Hampered, but not impossible. You can use dp,x as a substitute, for example. Recursion is limited, of course.
However, that being the case, the 6502 is better suited for languages which lack any concept of "activation frames" (e.g., Forth). The 65816 fares MUCH better than the 6502 for supporting stack frames. Generalized activation frames (e.g., Scheme's "continuations"), however, still give it quite a bit of trouble.
kc5tja wrote:
Hampered, but not impossible. You can use dp,x as a substitute, for example. Recursion is limited, of course.
However, that being the case, the 6502 is better suited for languages which lack any concept of "activation frames" (e.g., Forth).
Thus proving my point: mid-80s-era BASIC implementations (of which EhBASIC is an instance) simply lack activation frames, thus preventing effective use of structured programming and, in particular, modules. Nobody needs local variables!!
Of course, you can fake locals with arrays:
All you're doing here is hard-coding the activation frame (check it -- it even allows recursion, but I still won't recommend it) mechanisms by hand. It might even compile efficiently (it certainly won't interpret very fast!).
Code:
10 SP=0:REM My fake stack pointer
20 DIM IS(100), SS$(100):REM The stacks.
...
1000 REM Compute Celsius from Fahrenheit the HARD way.
1010 SP=SP+1
1020 IS(SP)=(IS(SP-1)-32)/180
1030 R=IS(SP)*100
1040 SP=SP-1:RETURN
...
This reminds me of a VisualBasic 3 programming project I once had to do, where I sorely lacked various features provided by proper activation frames. I ended up not only re-implementing them as above, but I also implemented "virtual CPU registers" as global variables too, to facilitate parameter passing. This was for a dentistry management application. The code was ugly, but the code ran correctly. We shipped on schedule.
However, the above approach simply doesn't give you the benefit of modular programming, and moreover, heavens help you if you attempt object oriented or functional programming.
Languages like Lisp or Scheme have activation records, not frames. That means, these things are actually data structures, allocated on the heap just like any other object a program would manipulate (this is taken advantage of in Scheme by exposing continuations, an entire chain of activation records). These provide all the same benefits of activation frames, but may be more suitable to the 6502 because they're normal data structures, and not based on a stack. As such, you can implement them much more freely (their semantics aren't so hard-set in concrete). Hence, I actually expect a language like Lisp/Scheme to outperform C on the 6502.
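A rough sketch in C of what a heap-allocated activation record looks like (illustrative only, not taken from any particular Lisp). Each call allocates a record and links it to its caller; a garbage-collected system would let records outlive their caller, which is exactly what makes continuations possible:

```c
#include <stdlib.h>

/* an activation record as an ordinary heap object, chained by a
   caller link instead of pushed on a hardware stack */
typedef struct Frame {
    struct Frame *caller;  /* dynamic link back to the calling record */
    int           n;       /* this activation's local variable */
} Frame;

static int fact(Frame *caller, int n) {
    Frame *f = malloc(sizeof *f);  /* allocate the activation record */
    if (!f) abort();
    f->caller = caller;
    f->n = n;
    int r = (f->n <= 1) ? 1 : f->n * fact(f, f->n - 1);
    free(f);  /* a real Lisp would leave this to the collector */
    return r;
}
```

On a 6502 this shape is attractive precisely because the records are plain data structures: you can place them anywhere, address them with ordinary indexed modes, and aren't tied to the tiny hardware stack.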
I should note that Forth on the 65816 relies heavily on stacks, and even in the best of cases (subroutine-threaded with primitive inlining), you're looking at an average of 10 clock cycles per Forth primitive. Colon definitions or their equivalents are a minimum of 12 cycles. So, as a general rule, if you have a 10MHz 65816, you're pulling about 1 Forth MIPS. It varies, and deeply nested colon definitions will drop that to about 0.5 Forth MIPS, but within that range it's amazingly consistent. I would expect all MIPS measurements to drop by a factor of 2 for the 6502.
Although it relies heavily on stacks, it gets its performance from not having activation frames -- it implicitly assumes the operands you're working with are always at the top of the stack, something even a 6502 can handle relatively easily.
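To make the "operands are always at the top of the stack" point concrete, here's a tiny frameless data stack sketched in C (names are mine, not from any particular Forth):

```c
/* a Forth-style data stack: primitives implicitly take their operands
   from the top of the stack, so there are no activation frames and no
   per-call operand addressing at all */
static int stack[64];
static int sp;  /* index of the next free slot */

static void push(int v) { stack[sp++] = v; }
static int  pop(void)   { return stack[--sp]; }

/* each primitive is a self-contained operation on the stack top */
static void f_dup(void)  { int t = stack[sp - 1]; push(t); }
static void f_add(void)  { int b = pop(); push(pop() + b); }
static void f_swap(void) { int b = pop(), a = pop(); push(b); push(a); }
```

On the 6502 the same idea maps nicely onto a zero-page stack indexed by X, which is why Forth fits the chip so well.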
Someday, I'll toy with the idea of making a Lisp-like language of some kind for the 65816, based on my experience with Haskell.