Improving the 6502, some ideas
ElEctric_EyE wrote:
I would like to see more speed. Even if no new instructions or add-ons. Just RAM/EEprom with the CPU on a single die, and 100+ MHz.
Once you look at the capability of the devices though, you might wonder why you're building a 6502 with them! A CPLD seems to me a better fit, but I don't believe you'd get the memory on-chip, and I'm not sure about 100MHz.
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Quote:
I was thinking it might be good if LDA toggled between A and B. As in:
lda #1 ; A = 1, B = ?
lda #2 ; A = 2, B = 1
lda #3 ; A = 3, B = 2
The advantage is that you don't need an LDB instruction.
I guess I can see some usefulness, but OTOH, an LDB would be the same as LDx where "x" is any of the other many registers, so it doesn't really add any complexity. "B" just has a different bit pattern in the three or four bits of the op code that indicate which register, probably %0001. (A is %0000, C is %0010, X is %0011, Y is %0100, etc.)
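The register-field idea is easy to picture in code. Here's a minimal C sketch of decoding such a field; the field position (bits 4..7 of the opcode) and the enum values are assumptions made purely for illustration, not any real 65xx encoding:

```c
#include <stdint.h>

/* Hypothetical register encoding per the post above: A=%0000, B=%0001,
   C=%0010, X=%0011, Y=%0100.  Placing the field in bits 4..7 of the
   opcode is an assumption made purely for illustration. */
enum reg { REG_A = 0x0, REG_B = 0x1, REG_C = 0x2, REG_X = 0x3, REG_Y = 0x4 };

unsigned decode_reg(uint8_t opcode)
{
    return (opcode >> 4) & 0x0F;   /* extract the 4-bit register field */
}
```

The point is simply that "LDB" costs the decoder nothing extra: the same field select serves every register.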
Quote:
An on-chip register stack? See the transputer and Forth Chips
Thanks for the link. Yes, I have definitely been interested in that. I should spend some time there and see if any of the 32-bit ones are currently in production, or explore the ones with VHDL source available. I kept an ad for a VME board from 20 years ago with a Harris RTX-2000, and another for a board with an SC32. As I remember, the first one did 16 Forth MIPS at 12MHz, with a maximum 4-clock interrupt latency and 0-clock RTI. Years before that, there was also a 4MHz MicroSpeed board for the Apple II which ran Forth in hardware and let the Naval labs simulate landing an aircraft on a carrier in real time. The software that came with it fit on a single floppy too. Very impressive performance! I would also like to pursue the idea of a 32-bit 6502 though.
Quote:
Garth: perhaps a new forum for the 65Org32 on here? That way, the various subjects can have their own threads, such as basic architecture, real world interfacing, opcodes, VHDL design, etc.
That's going further than I was thinking. Good idea. I should talk to Mike about that. Even if we don't pull this one off, the subject will occasionally keep coming up.
Quote:
Memory controller: maybe a small controller circuit portion that can be updated when required?
As in programmable logic, or a small board of some sort? It would be nice if we could just use commercially available DRAM modules (2GB anyone?), if they don't force us to use bursts at the clock rates that hobbyists can build for. If they require sub-nanosecond rise time in and out though, we won't be able to use them.
Quote:
PROBLEM: nearly ALL graphics chips shift MSB first, not LSB, and are big-endian, since it yields the simplest possible logical mapping of bits to pixels on the screen. However, the CPU's shift/rotate instructions assume the opposite bit and byte ordering -- thus, using ROR/ROL/LSL/ASR on a word-sized piece of byte-sized bitmap data will result in garbage graphics!!
Although this had crossed my mind earlier, I was hoping to not have to go to this complexity: How about having a few instructions to change byte order. That would accommodate Tony's byte-packing and -unpacking scenario too.
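What such byte-order (and, for the graphics case, bit-order) instructions would compute can be sketched in C; this is only a sketch of the operation, not a proposal for actual opcodes:

```c
#include <stdint.h>

/* What a "swap byte order" instruction would compute on a 32-bit word.
   A sketch of the operation only, not a real 65xx opcode. */
uint32_t swap32(uint32_t w)
{
    return ((w & 0x000000FFu) << 24) |
           ((w & 0x0000FF00u) <<  8) |
           ((w & 0x00FF0000u) >>  8) |
           ((w & 0xFF000000u) >> 24);
}

/* Reversing the bits of a byte is what the MSB-first graphics shifting
   really wants; again, just a sketch. */
uint8_t reverse_bits8(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4));
    b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2));
    b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1));
    return b;
}
```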
Quote:
Even with multitasking, the image stays fixed in RAM.
I'm not sure I'm understanding your whole paragraph (or pair of them) there, but what I had in mind is what my HP-71 hand-held computer does. You can have scores of program files (as long as they're not Forth!) and one of them can even use and then delete another one at a lower memory location and everything gets scooted down in memory so all the available memory is together at the end instead of ending up fragmented, and the program continues running-- just at a different address.
Quote:
Far more important to me are instructions that add (and/or subtract) immediate values to X, Y, and S.
Without this facility, even implementations of Forth spend unnecessary amounts of time tweaking the registers instead of executing the user's program.
If I weren't advocating an all-32-bit design, I'd want double-increment and double-decrement instructions, since the '02 and '816 do a lot of INX INX and DEX DEX in Forth and other high-level languages. The situation is sometimes further complicated if you need to branch on a page-crossing condition, since you either have to make sure you'll start and end on even addresses or test after both INX's (or DEX's). A DINX followed by a single branch-on-rollover instruction would streamline things.
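The semantics a hypothetical DINX plus branch-on-rollover pair would give can be sketched in C on an 8-bit index; the single wrap test is what would replace checking after each of the two INXs:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical DINX semantics: bump an 8-bit index by two and report
   whether the add rolled over, so one branch replaces testing after
   each of two INXs.  Illustration only. */
bool dinx(uint8_t *x)
{
    uint8_t before = *x;
    *x = (uint8_t)(before + 2u);
    return *x < before;        /* true exactly when the add wrapped */
}
```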
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
BigEd wrote:
ElEctric_EyE wrote:
I would like to see more speed. Even if no new instructions or add-ons. Just RAM/EEprom with the CPU on a single die, and 100+ MHz.
I like 6502 assembly, and the tools available for free to develop the software and program the EEPROM. If someone could develop a type of 65C02 "PIC" with different onboard RAM/EEPROM spec's I would pay at least $50 USD for a device that:
1: was a basic 65C02 device capable of addressing 16MB+, maybe even 1/2GIG of ondie RAM.
2: had EEPROM ondie space of 32K, maybe more or less...
3: major speed, 100+MHz to work basic (8 bit add, sub, asl, lsr) math functions on that large memory space, math that could be done on all registers, and also just a few more registers, besides a,x, and y would be nice...
man that would be a beefy 6502: I would call it the 65C002. mmmm, nice, I'm going to go to sleep now and dream it may be made by someone...
I might pay more, much more actually if this device existed.
GARTHWILSON wrote:
Quote:
Even with multitasking, the image stays fixed in RAM.
The 6502 lacks this feature completely, and the 65816 has lackluster support at best. How do you achieve that capability with the 6502? You cheat.
You write this in a text file FOO.A:
Code: Select all
LDA $BEEF
Code: Select all
0000 AD EF BE
You write this in a text file BAR.A:
Code: Select all
LDA myVariable
Code: Select all
0000 AD 00 00
But this raises a new problem: how does the loader know where to fix up addresses?
The assembler spits out an object file format that a linker and/or loader interprets (for this reason, loaders are often called interpreters too). This file is usually expressed as a recursive descent tree of objects, for that is the easiest way to parse these kinds of data. For example, here might be a high-level representation of such a file, expressed in a pseudo-LISP notation for ease:
Code: Select all
(object-file "BAR.O"
(cpu 6502)
(chunk "code"
(org 0)
(bytes 0xAD 0x00 0x00)
)
(xref "myVariable")
(chunk "code-fixups"
(absolute-address-at :section "code" :at 1 :to "myVariable")
)
)
Code: Select all
0000: 00 05 42 41 52 2E 4F
__ __ \_____________/
| | |
| | +------------- file name of the module (BAR.O)
| +---------------------- length of the filename
+------------------------- This byte stands for "module filename follows"
0007: 01 01 02
__ __ __
| | |
| | +------------------- Identifies the intended target as a 6502
| +---------------------- Only one byte for this section.
+------------------------- This byte stands for "CPU tag follows."
000A: 02 04 63 6F 64 65
__ __ \_________/
| | |
| | +--------------- Identifies the section name as "code"
| +---------------------- There are four letters in "code"
+------------------------- We're beginning a new section.
0010: 03 02 00 00
__ __ _____
| | |
| | +------------------ We place the "virtual" origin at $0000
| +---------------------- Two bytes for the origin
+------------------------- We're specifying an origin for this section.
0014: 04 03 AD 00 00
__ __ ________
| | |
| | +---------------- The actual data to be filled into the section when loaded.
| +---------------------- There are three bytes to load for this section.
+------------------------- Raw data follows.
0019: 05 0A "myVariable"
__ __ ____________
| | |
| | +--------------- The name of the symbol we MUST have defined
| | (courtesy of the loader, which means we must have
| | loaded another module first!) to load this module.
| +---------------------- The symbol name has 10 characters in it.
+------------------------- Reference a symbol.
0025: 02 0B "code-fixups"
__ __ _____________
| | |
| | +--------------- section name
| +----------------------- which has 11 characters in it
+-------------------------- A new section starts here!
0033: 06 04 "code" 01 00 0A "myVariable"
__ __ ______ _____ __ ____________
| | | | | |
| | | | | +-- the symbol we want to reference ...
| | | | +--------- which has 10 characters in its name.
| | | +-------------- The code at offset $0001 from the ...
| | +-------------------- "code" section, which
| +------------------------- has 4 characters in its name.
+---------------------------- Tell the loader to apply an absolute address fixup.
When you think things through, maintaining this kind of metadata about the code makes a lot of sense, because it allows your binary module loader to (a) determine how much memory to allocate for each module it's loading, (b) apply effective address computations at load-time, which in turn enables (c) assemblers, compilers and the like to not have to worry about explicitly making PC-relative code. Hence, you needn't even use BSR or its ilk (which incurs a runtime performance overhead anyway) if you don't want to.
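The fixup step the loader performs can be made concrete with a minimal C sketch of applying one absolute-address record of the kind shown in the "code-fixups" section above; the struct and function names are invented for illustration:

```c
#include <stdint.h>
#include <stddef.h>

/* One absolute-address fixup record, as the "code-fixups" section above
   describes it: an offset into the loaded section, plus the address the
   loader resolved for the referenced symbol.  Names are invented. */
struct fixup {
    size_t   offset;     /* where in the section the operand lives */
    uint16_t resolved;   /* loader-resolved address of the symbol  */
};

/* Patch the 16-bit little-endian operand in place. */
void apply_abs_fixup(uint8_t *section, const struct fixup *f)
{
    section[f->offset]     = (uint8_t)(f->resolved & 0xFF);  /* low byte  */
    section[f->offset + 1] = (uint8_t)(f->resolved >> 8);    /* high byte */
}
```

For BAR.O, the record says: patch the "code" section at offset 1 with wherever the loader put myVariable, turning AD 00 00 into a complete absolute LDA.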
About code relocation:
You know that this has already been done for the 6502 with a relocatable file format?
See http://www.6502.org/users/andre/o65/index.html
The problem with relocating running code (at least in the 6502 where code is not relative) is not that the relocation tables would have to be stored - but address values sneak into variables (fixed values like jump tables are in the relocation tables too) or into the stack. And it is basically impossible to determine which part of the stack is an address and what is normal data.
So if you don't have any support from the program itself (like going into a state where no address values are stored on stack or in other variables) you can only relocate at load time.
André
kc5tja wrote:
GARTHWILSON wrote:
The 6502 lacks this feature completely, and the 65816 has lackluster support at best. How do you achieve that capability with the 6502? You cheat.
...
How does the assembler know where myVariable exists in RAM? It doesn't, and it never will. The loader (that which loads the code into RAM, and is fully aware that it's loading code and not just raw data) is the ultimate decider of where myVariable sits, for IT is responsible for allocating memory to hold the binary images.
But this raises a new problem: how does the loader know where to fix up addresses?
...
See http://www.6502.org/users/andre/o65/index.html
Quote:
When you think things through, maintaining this kind of metadata about the code makes a lot of sense, because it allows your binary module loader to (a) determine how much memory to allocate for each module it's loading, (b) apply effective address computations at load-time, which in turn enables (c) assemblers, compilers and the like to not have to worry about explicitly making PC-relative code. Hence, you needn't even use BSR or its ilk (which incurs a runtime performance overhead anyway) if you don't want to.
So if you don't have any support from the program itself (like going into a state where no address values are stored on stack or in other variables) you can only relocate at load time.
André
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Quote:
How does the calculator know that the code will "just run?" The CPU must use 100% PC-relative addressing modes for all effective addresses into code-space.
Thanks for the explanation on the loader. I figured André's related system had to be doing something like that. [Edit: as he just mentioned in his last post]
My interest in moving already-loaded programs around comes from the times I've loaded up different applications on the workbench computer, then, after being done with one of the earlier ones, wanting to delete it and make room for something else; but that would leave memory fragmented, without enough contiguous room to load the next thing. In that case, I've had to also dump and re-load what was after the part I was finished with.
Quote:
The problem with relocating running code (at least in the 6502 where code is not relative) is not that the relocation tables would have to be stored - but address values sneak into variables (fixed values like jump tables are in the relocation tables too) or into the stack. And it is basically impossible to determine which part of the stack is an address and what is normal data.
I had forgotten about the stack; but as long as you don't use preëmptive multitasking and relocate the program and its data space while it has references to outside programs (including the fixed OS) on the stack, it seems like it should be ok to have the OS do the move and adjust the values to put in the program and data bank registers for the particular program (those registers now being 32-bit and able to hold any value, not being limited to 64K boundary lines). Coöperative multitasking, if multitasking is required, should avoid the problem by allowing the switching of tasks only when it's safe. Interrupts would have to have some pointers that the OS would fix, requiring temporarily disabling interrupts for the move.
This is new thinking for me, so I won't be one bit offended if you point out other things I'm leaving out. I'm sure there are many of them. At risk of looking ignorant, I'll say that the term "garbage collection" sounds related. How does that work (without getting too complicated)?
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
-
Nightmaretony
- In Memoriam
- Posts: 618
- Joined: 27 Jun 2003
- Location: Meadowbrook
- Contact:
Although this had crossed my mind earlier, I was hoping to not have to go to this complexity: How about having a few instructions to change byte order. That would accommodate Tony's byte-packing and -unpacking scenario too.
At my present job, I work with VME boards. They use hardware byte swap chips. The implementation in VHDL is not that hard. Older logic, you are looking at 245s and 374s, with an intermediary bus. Definitely feasible.
Onboard stack is limiting as heck, plus taking up logic gate resources. The 6502 stack concept is helluva valid. An onboard would be more equivalent to a metric (biblical beast of burden)load of internal registers.
Also Garth, are you able to add in code projects on here? I emailed Mike with my table lookup routine from the pinball program but have not had any reply. Wanted to add to the code knowledge base
"My biggest dream in life? Building black plywood Habitrails"
GARTHWILSON wrote:
very seamless and bug-free in that respect
Quote:
My interest in moving already-loaded programs around comes from the times I've loaded up different applications on the workbench computer, then, after being done with one of the earlier ones, wanting to delete it and make room for something else; but that would leave memory fragmented, without enough contiguous room to load the next thing. In that case, I've had to also dump and re-load what was after the part I was finished with.
There are basically several ways of working around this problem:
(1) Make your program event-driven, and mandate that your software handle EVENT_BEGIN_RELOCATE and EVENT_END_RELOCATE events, both issued by the kernel of your OS. The idea being EVENT_BEGIN_RELOCATE gives your code the chance to serialize your program's state into a form suitable for later deserialization during EVENT_END_RELOCATE. So, your control flow is the OS realizes it needs to collapse memory. It issues EVENT_BEGIN_RELOCATE to an application, which then uses this time to prepare for the move. The OS then collapses memory by relocating the program's code and data segments. EVENT_END_RELOCATE is then sent to the newly relocated program, which restores its previous state, thus giving it a chance to fix up addresses.
(2) Mandate your program NEVER uses absolute addresses for internal memory references. This is the method that PC/GEOS took, and it not only worked extremely well, but it also enabled them to utilize a "vm file" (virtual memory file) system where it could provide raw application state dumps as data files. The result was a user experience wherein loading and saving application state was effectively instantaneous, and this on a 4.77MHz CPU! To access a chunk of memory, you'd first have to explicitly "lock" it into physical memory. While locked, the chunk never moved, giving the application(s) the opportunity to touch the memory as needed. After you're done using it, you "unlock" the memory, giving the system the chance to move (or even dump back to disk) the memory.
(3) Use a memory management unit to page memory and establish multiple, unique address spaces for each program. This is what nearly all modern OSes do today. All programs literally run at address 0. Only shared libraries are loaded at non-zero addresses, and even then, nobody can guarantee where in the address space they appear, even amongst different address spaces! The kernel is responsible for managing memory in finite units called "pages" (typically 4K each), and updating hardware memory mapping configurations. This is the most difficult to implement, but by far the fastest and most convenient approach from an application programmer's point of view. No PC-relative code needed (except for "floating" shared libraries, as discussed above), and no need to lock memory manually, and no need for handling relocation events.
(4) Use a bytecode interpreter. This is the slowest possible method, and by what you described, is what your HP calculators are actually doing, especially with BASIC. Bytecodes can be implemented however you want, including mandating that all code be virtual-PC-relative.
(5) Make all code and data PC-relative. This is, for example, how Win32Forth works.
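Option (3), the MMU approach, boils down to splitting every virtual address into a page number and an offset. Here's a toy single-level page-table walk in C, assuming 4K pages; real MMUs add permission bits, multi-level tables, and TLBs:

```c
#include <stdint.h>

/* Toy single-level page table: each entry maps a virtual page number to
   a physical frame number.  4K pages, per the text above.  Real MMUs
   add permission bits, multi-level tables, and TLBs. */
#define PAGE_SHIFT 12
#define PAGE_MASK  0xFFFu

uint32_t translate(const uint32_t *page_table, uint32_t vaddr)
{
    uint32_t vpn    = vaddr >> PAGE_SHIFT;   /* virtual page number */
    uint32_t offset = vaddr & PAGE_MASK;     /* offset within page  */
    return (page_table[vpn] << PAGE_SHIFT) | offset;
}
```

Because every program's translation starts from its own table, every program can "run at address 0" while living anywhere in physical memory.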
Quote:
The problem with relocating running code (at least in the 6502 where code is not relative) is not that the relocation tables would have to be stored - but address values sneak into variables (fixed values like jump tables are in the relocation tables too) or into the stack. And it is basically impossible to determine which part of the stack is an address and what is normal data.
A handle, in its simplest possible incarnation, is merely a pointer to a pointer. The next most complex is an index into a table of pointers. It allows you to play all sorts of neat tricks with memory management without adversely affecting runtime performance (your inner-most loops will take the performance hit, but as discussed above, "locking" your used memory gets the performance back. Just remember to unlock your memory after you're done).
The "handle table" is a data structure known, at least in part, to both your application and your memory manager. When a chunk of memory is relocated, the memory manager is responsible for updating its corresponding handle pointer. This frees the application to refer to chunks of memory via the handle, which remains fixed throughout the entire lifetime of the program.
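The handle-table scheme can be sketched in a few lines of C. The names are illustrative; the point is that the application only ever stores the handle, so a relocation is one table update on the memory manager's side:

```c
#include <stddef.h>
#include <string.h>

/* A handle here is just an index into a table of pointers.  Only the
   memory manager writes the table; the application always goes through
   deref(), so a relocation is invisible to it.  Names are illustrative. */
enum { MAX_HANDLES = 16 };
void *handle_table[MAX_HANDLES];

void *deref(int h)
{
    return handle_table[h];
}

/* Memory-manager side: move a chunk and fix up its one table entry.
   The handle value itself never changes. */
void mm_move(int h, void *dst, size_t len)
{
    memmove(dst, handle_table[h], len);
    handle_table[h] = dst;
}
```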
Quote:
I had forgotten about the stack; but as long as you don't use preëmptive multitasking and relocate the program and its data space while it has references to outside programs
However, you can only get that with a strongly-typed language. Languages like Forth simply cannot handle this. Languages like Scheme, Smalltalk, and Lisp get around this problem either by using tagging, which I won't discuss here, or by simply not using activation frames as they're commonly understood.
Quote:
I'll say that the term "garbage collection" sounds related. How does that work (without getting too complicated)?
Code: Select all
void foo() {
char *myMemory = malloc(16); /* allocate 16 bytes of RAM */
UseItHere(myMemory); /* Do something with it */
free(myMemory); /* OK, we're done with it, so free it. */
}
Unfortunately, even without the specter of multitasking, in more complex applications it proves difficult to know when to actually release memory. For example, what actually happens in UseItHere() will determine whether or not the free(myMemory); call is actually a bug.
If UseItHere() adds that newly allocated piece of memory to a data store of some kind, then calling free() on it is a bug, because the memory might still be in use elsewhere in the program. (Let's ignore the fact that it actually is in use by the data store itself.)
So, the truly safest way to handle this is to rewrite the code as follows:
Code: Select all
void foo() {
char *myMemory = malloc(16); /* allocate 16 bytes of RAM */
UseItHere(myMemory); /* Do something with it */
}
This is called a "memory leak", and I'm positive you've heard that term before.
So, what is the solution? There are only two known solutions to this problem:
(1) Apply formal logic, problem solvers, and static program analysis to determine which chunks of memory get allocated, and when they're to be deallocated. This is the approach that functional languages like MLton use to help reduce garbage collection overhead. ML alone takes 22 code passes to perform all of its optimizations. MLton adds 13 more. Thankfully, ML compiles so fast that this hit in compile performance isn't that noticeable. Still, imagine trying to maintain a compiler with 35 passes, and compare that to your assembler or C compiler, which typically has only 2.
(2) Rely on garbage collection.
Garbage collection works because of a very simple observation: you can't use a chunk of memory if you don't have a pointer to it. Ignoring the case of synthesizing pointers (which is a type and safety violation, by the way), once you lose the last pointer to a region of memory, that's it -- it's gone forever!! You can't restore it, unless you have a pointer cached away somewhere accessible (like the data store UseItHere() might be using).
So, a garbage collector's job is to examine two things: the program state, and the memory state, and based on this mutual information, dispose of memory no longer referenced. It does this, for all intents and purposes, concurrently with the running of your program.
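That "examine program state and memory state" job can be shown with a toy mark-and-sweep pass. Here each object may reference at most one other object, a big simplification of real object graphs, and the roots stand for pointers the program still holds; everything unreachable from a root is reclaimed:

```c
#include <stdbool.h>

/* Toy mark-and-sweep over a fixed pool.  Each object may reference at
   most one other object (real collectors trace arbitrary graphs), and
   the roots stand for "pointers the program still holds". */
enum { POOL = 8 };

struct obj {
    int  ref;      /* index of the referenced object, or -1 */
    bool marked;
    bool live;     /* still allocated after the sweep */
};

void gc(struct obj *pool, const int *roots, int nroots)
{
    for (int i = 0; i < POOL; i++)
        pool[i].marked = false;

    /* Mark phase: chase references starting from each root. */
    for (int r = 0; r < nroots; r++)
        for (int i = roots[r]; i != -1 && !pool[i].marked; i = pool[i].ref)
            pool[i].marked = true;

    /* Sweep phase: anything unmarked is unreachable, hence garbage. */
    for (int i = 0; i < POOL; i++)
        pool[i].live = pool[i].marked;
}
```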
Notice that BASIC has a DIM statement to allocate memory for arrays. Notice it also can construct arbitrarily large strings. However, it completely lacks any means of disposing of allocated memory (some might argue the CLEAR command performs this job, but in reality, its purpose is more complex. Executing CLEAR inside a running program often results in catastrophe unless you know EXACTLY what you're doing). This is because BASIC is a garbage collected environment.
Notice how languages like Java or even Oberon allow you to create new objects, but have no syntactic means of getting rid of them. This is because they, too, utilize garbage collection.
The end results with garbage collected languages are:
(1) ever so slightly slower runtime performance. This is because the underlying CPU simply lacks hardware assistance for implementing "read barriers" and "write barriers", and so these things need to be implemented in software. Most of these barriers can actually be optimized away by advanced compilers, thankfully, which is why performance hits are "slight" and not "god-awful" like they were in the 60s and 70s.
(2) NEVER do you get the kind of crash that comes from following a stray pointer into freed memory. It's simply impossible, and provable mathematically.
You do get a different kind of resource leak with garbage collected languages though -- as long as you have a pointer to a chunk of memory, it'll never be discarded. Garbage collectors are not intention detectors, so when you are truly finished using a piece of data, the proper thing to do is assign its pointer to NULL (typically the value of zero, but it's best to use your language's symbolic representation for this; for Pascal/Modula-*/Oberon/Ada, use "nil" or "NIL" instead). That way, you eliminate that reference (without affecting others), giving the collector enough information to do its job.
Statistically speaking, however, comparing the memory leaks arising from not using garbage collection against the resource leaks from garbage collected systems, it is easy to see which is the lesser of the two evils. A brilliant example is the recent release of Internet Explorer, which will gladly crash itself after many hours of continual use due to leaked memory. Contrast this with Firefox, which will not leak memory nearly as fast unless you're using plugins (which Firefox can't control), such as the infamous Flash 9 or Flash 10 plugin under Linux.
Some languages, like Smalltalk or Lisp, won't even let you manipulate pointers of any kind, so it's impossible to casually leak resources in those environments. (It's doable, but you actually have to explicitly code for it!)
This raises a good point, too -- if you're planning on writing a program so that it's extensible with plugins, you can do all the static analysis and even dynamic analysis in the world, and you still won't have a reliable system. All because of those plugins, you might end up with an intractable memory leak that ultimately brings your system down.
This is why I advocate writing code in some dialect of Oberon instead of C: Oberon compiles to native code, just like C does, but has garbage collection. If you write extensible programs, then the extensions, too, are written in Oberon or an Oberon-compatible language. The end result is you get garbage collection and type-safety throughout the system. And, it compiles a hell of a lot faster too -- easily a factor of 15 on my box, and for comparable levels of optimization. For Enterprise-scale programs, that speed boost is quite significant.
I haven't learned Ada yet, but I understand Ada exhibits most, if not all, of the properties that make Oberon so appealing to me. I'll need to take that language up someday.
It's possible to use garbage collection with C, although it's a hack. Basically the garbage collector looks for anything that might be a pointer to allocated memory, although it has no way of knowing that it actually is a pointer, so you get some false positives. More dangerously, if a pointer is somehow disguised, or located somewhere the collector doesn't look, or placed at a different memory alignment than the collector expects, then the collector may free memory still in use. Still, some people swear they work.
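The "might be a pointer" scan is easy to sketch: walk a region of words (standing in for the stack) and keep alive any allocation whose address happens to appear there. A matching word that isn't really a pointer is exactly the false positive described above:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Conservative scan in miniature: any word in the scanned region whose
   value equals the address of an allocation keeps that allocation
   alive.  A word that merely happens to match is the false positive
   discussed above. */
bool possibly_referenced(const uintptr_t *words, size_t nwords, const void *alloc)
{
    for (size_t i = 0; i < nwords; i++)
        if (words[i] == (uintptr_t)alloc)  /* looks like a pointer to it */
            return true;
    return false;
}
```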
I definitely swear that they work as long as you write code like a normal human being. That is, no XOR-tricks to store two pointers in a single memory location, no bit-shifted addresses (I'm talking to you, Metacomco! What the HELL were you thinking when you decided to use BCPL pointers instead of normal CPU pointers for AmigaDOS data structures?!), etc.
However, they don't work as well as a fully confident garbage collector. Especially when the compiler has all the data it needs to fill out a few forms to allow the GC to do its job in a tag-free manner. C is a great language, but Oberon is better, if only because of GC (syntax is irrelevant to me for the most part, though others are utterly repulsed by its more easily readable syntax. That's always confused the heck out of me, but I digress). Thankfully, there's Java, which gives most of the advantages of Oberon with a syntax just C-like enough that people are overwhelmingly happy with it, despite the language's many flaws.
kc5tja wrote:
There are basically several ways of working around this problem:
are actually applicable to a 65xx system....
Quote:
(3) Use a memory management unit to page memory and establish multiple, unique address spaces for each program. This is what nearly all modern OSes do today. All programs literally run at address 0. Only shared libraries are loaded at non-zero addresses, and even then, nobody can guarantee where in the address space they appear, even amongst different address spaces! The kernel is responsible for managing memory in finite units called "pages" (typically 4K each), and updating hardware memory mapping configurations. This is the most difficult to implement, but by far the fastest and most convenient approach from an application programmer's point of view. No PC-relative code needed (except for "floating" shared libraries, as discussed above), and no need to lock memory manually, and no need for handling relocation events.
Quote:
Quote:
The problem with relocating running code (at least in the 6502 where code is not relative) is not that the relocation tables would have to be stored - but address values sneak into variables (fixed values like jump tables are in the relocation tables too) or into the stack. And it is basically impossible to determine which part of the stack is an address and what is normal data.
Relocating variables can still be done with a normal relocator, as long as it knows where that variable is. To access it, of course, as you write, the variable memory must be "locked" - be it with cooperative multitasking or specific lock/unlock calls around that critical section.
Quote:
Quote:
I had forgotten about the stack; but as long as you don't use preëmptive multitasking and relocate the program and its data space while it has references to outside programs
Quote:
Each of the activation frames on the stack has a corresponding "type descriptor" associated with it, which is required for its garbage collector to properly free unused memory. It can guarantee a piece of memory is
Quote:
However, you can only get that with a strongly-typed language.
Quote:
(2) Rely on garbage collection.
...
The end results with garbage collected languages are:
(1) ever so slightly slower runtime performance. This is because the underlying CPU simply lacks hardware assistance for implementing "read barriers" and "write barriers", and so these things need to be implemented in software. Most of these barriers can actually be optimized away by advanced compilers, thankfully, which is why performance hits are "slight" and not "god-awful" like they were in the 60s and 70s.
...
Read and write barriers have been greatly improved in Java as well, and are, on modern CPUs, supported by hardware! A write barrier flushes the dirty pages from cache to memory (everything written before the barrier is stored in actual memory, not in cache); a read barrier throws the cache away (everything read after the barrier comes from memory).
The Java memory model and the java.util.concurrent package provide greatly improved performance for synchronizing threads using these constructs (synchronized blocks, which are kind of thread-global, and volatile variables, which work in a local context - i.e. they might only flush the single page with the variable, not all dirty pages).
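The volatile behavior mentioned above can be demonstrated with a minimal sketch. Under the Java memory model, everything written before a volatile store is guaranteed visible after a subsequent volatile load of the same variable; the class and field names here are invented for illustration:

```java
// Minimal sketch of the happens-before guarantee a volatile variable gives:
// the plain write to 'payload' is published by the volatile write to 'ready',
// so a reader that sees ready == true is guaranteed to see payload == 42.
public class Publish {
    static int payload;              // plain field
    static volatile boolean ready;   // volatile flag acts as the "barrier"

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            payload = 42;            // (1) plain write
            ready = true;            // (2) volatile write "releases" (1)
        });
        writer.start();
        while (!ready) { Thread.yield(); }  // volatile read "acquires"
        System.out.println(payload);        // guaranteed to print 42
        writer.join();
    }
}
```

Without the volatile keyword on the flag, the reader could legally spin forever or observe a stale payload; the volatile access pair is what forces the ordering.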
Quote:
You do get a different kind of resource leak with garbage collected languages though -- as long as you have a pointer to a chunk of memory, it'll never be discarded. Garbage collectors are not intention detectors, so when you are truly finished using a piece of data, the proper thing to do is assign its pointer to NULL (typically the value of zero, but it's best to use your language's symbolic representation for this; for Pascal/Modula-*/Oberon/Ada, use "nil" or "NIL" instead). That way, you eliminate that reference (without affecting others), giving the collector enough information to do its job.
But yes, such memory leaks are a constant source of trouble. I need to remember your quote about the intention detector :-)
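The reachability rule behind that "assign it to NIL" advice can be illustrated with a toy mark-phase trace. This is not a real collector, just a hypothetical walk over a tiny object graph showing that clearing a root reference is exactly what makes memory collectable:

```java
// Toy sketch of a GC mark phase: everything reachable from the roots
// survives; the collector cannot know you are "done" with an object, it
// only knows whether a reference to it still exists.
import java.util.*;

public class ToyGc {
    static class Node {
        final String name;
        Node ref;                                  // one outgoing reference, for simplicity
        Node(String name) { this.name = name; }
    }

    // Returns the names of all nodes reachable from the given roots.
    static Set<String> reachable(List<Node> roots) {
        Set<String> marked = new TreeSet<>();      // sorted, for stable output
        Deque<Node> work = new ArrayDeque<>();
        for (Node r : roots) if (r != null) work.push(r);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (marked.add(n.name) && n.ref != null) work.push(n.ref);
        }
        return marked;
    }

    public static void main(String[] args) {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C");
        a.ref = b;                                 // A -> B; C is a separate root
        List<Node> roots = new ArrayList<>(Arrays.asList(a, c));
        System.out.println(reachable(roots));      // prints [A, B, C]
        roots.set(1, null);                        // the "assign it to NIL" advice
        System.out.println(reachable(roots));      // C is now garbage: prints [A, B]
    }
}
```

As long as the root slot still points at C, no collector may reclaim it, which is precisely the "leak" being discussed: reachable but unwanted.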
Quote:
This is why I advocate writing code in some dialect of Oberon instead of C: Oberon compiles to native code, just like C does, but has garbage collection. If you write extensible programs, then the extensions, too, are written in Oberon or an Oberon-compatible language. The end result is that you get garbage collection and type-safety throughout the system. And it compiles a hell of a lot faster too -- easily a factor of 15 on my box, at comparable levels of optimization. For Enterprise-scale programs, that speed boost is quite significant.
OTOH you can write bad / slow code in any language. A compiler also is not an intention detector!
André
P.S.: I'm a Java architect for a living, so I might be biased....
kc5tja wrote:
CPLDs are generally MUCH faster than FPGAs, pin loading permitting. A 66MHz CPU programmed into an FPGA can generally be expected to run close to 200MHz on a CPLD made with the same semiconductor process.
(Personally, I'd prefer a 6502 or close relative!)
- BigDumbDinosaur
- Posts: 9425
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
- Contact:
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Quote:
This would only work as long as your program doesn't store program-referenced variables. If it did, then you can only relocate at load-time.
I'm not sure exactly what you mean by "program-referenced," but there are global and local variables and arrays, and it all stays intact. Additionally:
Quote:
Notice that BASIC has a DIM statement to allocate memory for arrays. Notice it also can construct arbitrarily large strings. However, it completely lacks any means of disposing of allocated memory (some might argue the CLEAR command performs this job, but in reality, its purpose is more complex. Executing CLEAR inside a running program often results in catastrophe unless you know EXACTLY what you're doing). This is because BASIC is a garbage-collected environment.
HP-71 BASIC does have a DESTROY instruction which allows destroying variables and arrays and freeing up the memory they used. For example, you can
Code: Select all
DIM A, B ! Create real variables, and
COMPLEX C(8192) ! a complex floating-point array of 8192 elements,
DIM D$[96], E$[10] ! then a couple of string variables.
<do stuff>
DESTROY C ! Get rid of C and free up 128K of RAM.
<do more stuff, still using A, B, D$, and E$>
Quote:
(4) Use a bytecode interpreter. This is the slowest possible method, and by what you described, is what your HP calculators are actually doing, especially with BASIC. Bytecodes can be implemented however you want, including mandating that all code be virtual-PC-relative.
I'm sure that's what it's doing. I wouldn't call the 71 a calculator though. Although it's far more powerful than the 41, it's not as practical as a calculator.
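To make (4) concrete, here is a toy bytecode interpreter. It is a hypothetical sketch only, with invented opcodes; the point is that jump targets are offsets into the code array, so the virtual PC is inherently relative and the whole program can be loaded anywhere in memory without relocation:

```java
// Minimal bytecode-interpreter sketch: all "addresses" are offsets into the
// program array, so moving the array in memory never invalidates anything.
public class TinyVm {
    static final int PUSH = 0, ADD = 1, JMP = 2, HALT = 3;

    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = 0, pc = 0;                        // virtual PC, relative to code[0]
        while (true) {
            switch (code[pc++]) {
                case PUSH: stack[sp++] = code[pc++]; break;
                case ADD:  stack[sp - 2] += stack[sp - 1]; sp--; break;
                case JMP:  pc = code[pc]; break;   // target is a code offset
                case HALT: return stack[sp - 1];   // result is top of stack
            }
        }
    }

    public static void main(String[] args) {
        int[] prog = { PUSH, 2, PUSH, 3, ADD, HALT };
        System.out.println(run(prog));             // prints 5
    }
}
```

The slowness kc5tja mentions is visible in the inner loop: every virtual instruction costs a fetch, a dispatch, and an operand access before any real work happens.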
Quote:
There are basically several ways of working around this problem:
Thank you for the explanations. I probably wasn't able to read them as fast as you typed them (Samuel types nearly 100wpm and seems to be able to talk about something else at the same time!), but they were definitely helpful. I don't understand them well enough to be able to tell someone else yet, but someday...
Quote:
Quite some good explanations. I wonder which of them
are actually applicable to a 65xx system....
perhaps none yet, but if we come up with a suitably capable one...
Quote:
That just doesn't work on 6502 hardware level...
How 'bout on a 65Org32 hardware level?
Quote:
All of this discussion is well and good, especially ideas on memory management. But, how does it "improve" the 65xx?
VBR started the topic to discuss improvements to the hardware, especially the processor's instruction set. I took it further into extending bus and register widths to get more power while actually dropping some addressing modes. After apologizing for kind of hijacking the topic, I got no complaint from VBR but others showed interest. If this would, down the road, lead to developing a more powerful 65-family processor, it can only be good to discuss things we might want it to be capable of while still making it clearly a 65-family processor.
Quote:
This is a solved problem. The solution is to never use actual addresses. Instead, you store handles to data.
A handle, in its simplest possible incarnation, is merely a pointer to a pointer. The next most complex is an index into a table of pointers. It allows you play all sorts of neat tricks with memory management without adversely affecting runtime performance...
I believe the Apple IIgs ProDOS used that. Is that correct?
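For illustration, the handle idea can be sketched in a few lines (all names are invented, and a real memory manager would also track sizes, free lists, and locking): the client keeps only an index into a table, so the manager can move the underlying block and patch one table entry without the client noticing.

```java
// Sketch of a handle as an index into a table of pointers: double
// indirection lets the memory manager relocate a block while every
// outstanding handle remains valid.
import java.util.ArrayList;
import java.util.List;

public class HandleTable {
    private final List<byte[]> table = new ArrayList<>();

    int alloc(int size) {               // returns a handle, not an address
        table.add(new byte[size]);
        return table.size() - 1;
    }

    byte[] deref(int handle) {          // handle -> table entry -> block
        return table.get(handle);
    }

    void relocate(int handle) {         // "move" the block; only the table changes
        byte[] moved = table.get(handle).clone(); // stand-in for copying to a new region
        table.set(handle, moved);
    }

    public static void main(String[] args) {
        HandleTable mm = new HandleTable();
        int h = mm.alloc(16);
        mm.deref(h)[0] = 99;
        mm.relocate(h);                 // block moved behind the client's back
        System.out.println(mm.deref(h)[0]); // prints 99, same handle
    }
}
```

The cost is the extra dereference on every access, which is the trade-off the quoted post alludes to with "without adversely affecting runtime performance" when the table lookup is cheap.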
Quote:
you can't just relocate the stack of a 6502
How about just changing the stack pointer, S? Even on the 6502, the stack space can be divided into chunks of say 64 bytes, enough that four different tasks can have plenty of stack space so they won't overrun it and start overwriting a portion reserved for another task. When you switch tasks, you store the current stack pointer in a variable, and load S from the variable for the next task. Of course the '816 allows far more with its 16-bit stack pointer. We could also make the new processor capable of natively handling more stacks.
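As a rough illustration of that scheme, here is a hypothetical simulation in Java of a 256-byte stack page split into four 64-byte regions, with the S value saved and restored on each task switch (the initial S values are made up to mark each region's top):

```java
// Simulation of per-task 6502 stacks: one 256-byte page, four 64-byte
// regions, and a saved stack-pointer variable per task. Switching tasks
// just swaps the S value, exactly as described above.
public class StackSwitch {
    static final byte[] page1 = new byte[256];               // the 6502's stack page
    static final int[] savedS = { 0xFF, 0xBF, 0x7F, 0x3F };  // tops of 4 regions
    static int s;                                            // the "S register"

    static void switchTask(int from, int to) {
        savedS[from] = s;       // store current stack pointer in a variable
        s = savedS[to];         // load S for the next task
    }

    static void push(int v) { page1[s--] = (byte) v; }       // 6502-style: S post-decrements
    static int  pull()      { return page1[++s] & 0xFF; }    // and pre-increments on pull

    public static void main(String[] args) {
        s = savedS[0];
        push(0x12);             // task 0 pushes
        switchTask(0, 1);
        push(0x34);             // task 1 uses its own 64-byte region
        switchTask(1, 0);
        System.out.println(Integer.toHexString(pull()));     // prints 12: task 0's data intact
    }
}
```

Nothing here checks for a region overflowing into its neighbor; as the post says, the scheme relies on 64 bytes being "plenty" for each task.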
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?