6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Thu Nov 21, 2024 9:52 am

All times are UTC




Post new topic Reply to topic  [ 29 posts ]  Go to page Previous  1, 2
Author Message
 Post subject:
PostPosted: Sat Aug 28, 2004 8:12 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
Quote:
Quote:
In Forth, to insert code, you have to recompile everything after the insertion point, since the addresses of any subsequent words will change

in my experience, recompiling software in Forth is so fast that the application I just beckoned has already been compiled before my finger leaves the ENTER key (assuming your Forth compiler's dictionary lookups are sufficiently efficient). Therefore, the issue of addresses, literally, is a complete non-issue for me.

There may be situations where you don't want to recompile everything past that point. An example might be where there's an array of data you don't want to have to collect again that would get trashed. In that case, it's easy enough to write a new word that has the corrected behavior or expanded code, and make yourself some simple tools to make the beginning of the old word's parameter field refer to the new word, followed by unnest (or some variation thereof). The first system I started out on compiled slowly enough that it was sometimes worth doing it this way.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Sat Aug 28, 2004 4:48 pm 
Offline

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
GARTHWILSON wrote:
There may be situations where you don't want to recompile everything past that point. An example might be where there's an array of data you don't want to have to collect again that would get trashed. In that case, it's easy enough to write a new word that has the corrected behavior or expanded code, and make yourself some simple tools to make the beginning of the old word's parameter field refer to the new word, followed by unnest (or some variation thereof). The first system I started out on compiled slowly enough that it was sometimes worth doing it this way.


In this case, I would store the data in blocks. However, if even this isn't feasible, then you could always store the data at a well-known buffer location in unallocated space. Chuck Moore does this a lot with his ColorForth environment. Since Forth programs are so tiny, it's highly unlikely that a loaded or reloaded program will overwrite the data buffer. (BTW, this used to be used a lot in the old BASiC-on-8-bit era.)

However, if even that doesn't satisfy your requirements, I completely agree -- in this case, I would have to say that Forth itself may not be the right language (I never said it was the perfect language for all people). If this were really an issue, I would definitely investigate a language like Scheme or a proper subset of Common Lisp, both of which supports proper garbage collection, and can easily support in-situ code updates without having to recollect data. As I understand it, making a proper subset of CL is not very hard to do, and I was thinking about exploring it someday. It also has the benefit of hiding pointers from the programmer, so you just plain don't have to think about them (as an experienced programmer fluent in many languages, I consider this a good thing overall). However, I know for a fact as well that native Lisp environments still allow direct access to machine resources too. Here's some sample code:

Code:
; Print the set of integers from 1 to 10, inclusive.  Woohoo!
; Untested code, so I'm not sure if this will work right.  I used to have a CLisp
; interpreter on my box, but don't anymore.  I'm kinda ticked -- I want to know
; where it went.

(defun last (xs)
  "Returns very last element of a list as an atom"
  (cond
    ((= (length xs) 1) (car xs))
    (t  (last (cdr xs)))))

(defun all-but-last (xs)
  "Returns all but the last element of a list, obviously as a list."
  (cond
    ((< (length xs) 2) ())
    (t (cons (car xs) (all-but-last (cdr xs))))))

(defun reverse (xs)
  "Reverses the order of a list."
  (cons (last xs) (reverse (all-but-last xs)))

(defun append (x xs)
  "Appends x onto the list xs"
  (reverse (cons x (reverse xs))))

(defun range (x)
  "Returns a range of integers from 1 to x in a list."
  (cond
    ((= x 1) (list 1))
    (t  (append x (range (- x 1))))))

(display (range (10))


God, that's horrible. I know there is a more elegant way to append an atom to the end of a list, but for the life of me, I can't figure it out right now. So, I decided to brute force it. Some things to note, though: note that use of variables is typically "bind once, use many times." This is the hallmark of functional programming. I can tell you right now, though, because of my senior moment, the above code would run very, very slowly if it were to be executed (still too fast for the human to notice, since we're dealing only with 10 integers, but still...).

Another language I would definitely look to implement would be Smalltalk -- like Lisp, it is fully garbage collected, and can do in-place code updates. Note that you don't need the full-blown "Smalltalk-80 Environment" -- I'm talking about just the core language. The syntax may be a little bit more agreeable, but it requires a significantly more sophisticated language parser than either Forth or Lisp (still simpler than BASIC though, I think).

Code:
IntegerArray subclass: #IntegerRange
IntegerRange methods

display
  0 to: self length - 1 do: [ i | console print: self at: i ].
  ^ self

range: max
  0 to: max do: [ i | self append: i ].
  ^self


Here I'm cheating, because I'm pretty confident Smalltalk's array objects have an append: method built into them.

Anyway, you get the idea.

But now you might worry about the performance of garbage collection versus explicit memory management. As long as you do your homework on the subject, you won't have a problem with this. Generational collectors make collection virtually transparent in most cases, and if you do so incrementally, it has even less of a performance impact. One dirty trick that Oberon (a language derived from Modula-II, and written by Niklaus Wirth; highly recommended too!) uses is to perform garbage collection while waiting for an I/O request to complete, for example. There is no reason why you should not be able to do the same in a Lisp environment either.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Tue Sep 07, 2004 8:16 am 
Offline
User avatar

Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
GARTHWILSON wrote:
Interesting. Can you give some idea of how much space you're finding you need for a kernel, compared to what would be needed for the same thing in indirect-threadded?


Here is the current size. These numbers are probably not going to be particularly useful for a couple of reasons. First, it's not currently optimized for space or for speed. I decided to start with the philosophy "nothing too long, nothing too slow", so it's currently in that middle ground. At this point, I'm leaning toward optimizing the finished version for speed. (Another possible project would be to write a subroutine threaded Forth that is optimized for space, using this version as a starting point.)

Second, it was written from scratch and not derived from an extant indirect or direct threaded implemenation, so there isn't really anything with which to make a good comparison. There's FIG-Forth, but it might not be so easy to choose which pieces of FIG-Forth to use for comparison, since there are aspects which differ from the FIG-Forth model.

Currently, there are 86 words (that includes aliases, which only take up head space) that have been tested (several more untested words have been written). Basically, it's only the words necessary for making the QUIT (text interpreter) loop function. Not included in the 86 words would be words such as :, IF, and VARIABLE. Here is how the code is currently organized (it will definitely be reorganized in the finished version):

Heads: 628 bytes (all 86 words)
Primitives: 548 bytes (34 words)
Constants, variables, and deferred words: 66 bytes (16 words)
High-level words (with inline assembly intermixed): 851 bytes (34 words)
System-dependent (I/O) words: 37 bytes (2 words)

The total comes to just over 2k. For the purpose of debugging and experimentation (and partly my own curiousity) the Forth data stack does not sit on the zero page (unlike most Forth implementations). The total comes to 69 fewer bytes if the Forth data stack is moved to the zero page.

With no point of comparison, you may be wondering how I can claim that subroutine threaded isn't so bad in terms of space. The answer is that I started out writing the high-level words using only JSRs (and a final JMP of course). Basically, I just started with the colon definition and did simply wrote it using JSRs instead of DWs. As I looked at the code that resulted, I noticed a number of ways of optimizing it by replacing JSRs with (inline) assembly. In many instances, the code was not only faster (as you'd expect), but smaller (or at least the same length) too, so there wasn't even a decison to be made about whether to optimize or not. More on this below.

GARTHWILSON wrote:
You don't have to. The quick and dirty way to make a prefix Forth assembler which I used in my Forth was to just make mnemonics that had the address mode built in, like LDA_ABS, LDA(ZP,X), etc.. and comma-in the operand. The mnemonics just compile the appropriate machine-language op code byte. They don't need to even be in a separate vocabulary because they don't conflict with anything else in Forth. For example, AND is always in Forth but if you want the assembly AND, it'll have an address mode associated with it, like AND_ZP or AND#. The branches were a minor challenge, but still not complicated because the assembler in Forth is not normally used for large applications, only writing primitives, runtimes, maybe ISRs, and things like that. I am working on making my branching method more user-friendly though. When I thought of this, I was able to initially write the assembler in an evening. It made me wonder why I didn't do it years earlier. It has been so handy.


No, I don't have to. And you're right that a 6502 Forth assembler is quick and easy to write. It's very easy to convert a postfix assembler into a highly legible prefix assembler. The name conflicts are easy enough to handle; there are only 3 in my assembler: (, #, and AND. When the assembler is active I use AND to mean the 6502 instruction and & to mean the bit-wise operator. Of course, ( is optional -- in my assembler it does nothing -- but I think something like LDA ( 0 ,X) is the most legible. I use the usual prefix handling technique of not storing bytes until the next instruction, which means that the last instruction must be followed by something (usually an END-CODE) to store bytes of that final instruction. All in all, I've been very pleased with how my prefix assemblers have turned out.

However, with subroutine threaded code, inline assembly is available at any time with [ and ]. As I've gained experience, it has become clear to me that an ordinary postfix assembler is the more natural and elegant solution. With a prefix assembler, you need to finish the last instruction and I usually reset the assembler before the first instruction just in case (so it won't store bytes). When you first encounter Forth and postfix notation, you can either think "I'm not comfortable with postfix; I'm going to make Forth behave in a way that I'm more familiar with" or you can simply learn how to work with postfix. I think the decision to use a prefix or a postfix assembler with a subroutine-threaded Forth falls into a similar category.

kc5tja wrote:
APL, J, and K can be implemented with rather trivial interpreters, and still achieve excellent speeds.


But perception is going to be shaped by the most popular languages -- e.g. BASIC and Forth. Even the staunchest critics of Forth acknowledge that it is one of the fastest interpreted languages out there, so the evidence that interpretation can be fast is there. But not so many people realize (or want to admit) that these techniques can be applied to interpretation in general.

BASIC interpreters, on the other hand, tend to be slow. The emphasis of most seems to be the first word of their acronym -- Beginner's -- and I can't really fault slow interpretation when fast interpretation wasn't a goal in the first place. BASIC bears more of a resemblance to other programming languages -- superficially -- than Forth and its postfix nature. So the perception, unfair as it is, is that interpretation is slow, unless you use Forth, no matter how many exotic programming languages exist as counterexamples. A lot of people like to dismiss interpretation by saying that interpreters don't offer the benefits of a good compiler, even though it is equally true that compilers don't offer the benefits of a good interpreter.

kc5tja wrote:
Excellent! This is what I expected to see. I hadn't realized that this was already done. As far as I was aware, AppleBASIC for the Apple IIs didn't do this (at least, I don't remember them doing it), but then again, it was also floating point BASIC too, with the AppleDisk system. Not sure if that made any difference.


Applesoft, the later, and more well-known version of BASIC, was written by Microsoft (Apple Computer supplied the Apple-specific routines, though). It used floating point math, and its parser did little more than turn keywords into tokens. It was functional for the most part, but far from elegant. Applesoft was far more commonly used (due to the floating point math) and more extensively documented.

Integer BASIC, on the other hand, was replaced in ROM by Applesoft starting with the II+, the second machine in the Apple II line. The only official Integer BASIC documentation was a BASIC tutorial manual (which called it Apple BASIC, rather than Integer BASIC) that didn't even cover all of the commands! Interestingly, the only source code for Integer BASIC was handwritten and hand-assembled by Wozniak himself! (Evidence of this was apparent when the ROM was disassembled. For example, the code would jump from point A to point B back to point A just to insert a single instruction, whereas with an assembler, that instruction would be inserted in the intended place and the remaining code would be moved accordingly.) This was no small task, since size of the object code was about 5k. Ironically, Apple computer published a lot of their early source code, but the handwritten source code prevented a golden opportunity to share a unique approach. (A few adventurous types did some disassembly and published their findings in the form of magazine articles, so some of the internals of Integer BASIC eventually became known.)

As I alluded to in my earlier post, there were several places the efficiency of the Integer BASIC interpreter could be improved (and there were a handful of minor bugs), but the concept itself was quite elegant. I've done enough hand-assembly (nothing anywhere near that long, though) to cut anyone who assembles a 5k program by hand some slack in terms of efficiency.

Incidentally, both BASICs preceded the Apple disk drive. So the DOS took some fairly unconventional steps to be accessible from either of these ROM-based BASICs, neither of which was designed with existence of DOS in mind.

kc5tja wrote:
Bytecode/tokens is just "token-threaded" -- several Forth implementations have been implemented token-threaded. The most popular of which is OpenFirmware, actually.


I'm aware of token threaded Forths, though I haven't used one. On the processors I've worked with, address threading seems to be the more natural approach, so I haven't seen a compelling reason to implement a token threaded Forth.

kc5tja wrote:
However, with that said, in my experience, recompiling software in Forth is so fast that the application I just beckoned has already been compiled before my finger leaves the ENTER key (assuming your Forth compiler's dictionary lookups are sufficiently efficient). Therefore, the issue of addresses, literally, is a complete non-issue for me.


Compiler speed isn't the issue for me. On one Forth I've written, FIND is a high-level word. If I wanted blazing compilation speed, there are a lot of things I could do, but so far I haven't needed to. I've encountered some of the limitations of address threading, nothing terrible though. For BASIC, I just like the idea of move and insert more than edit and recompile. I wouldn't say addresses are a complete non-issue for me, but I would say they are a very minor issue.

kc5tja wrote:
What? Subroutine threading takes 3 bytes to store a reference to a word on the 6502, while both direct and indirect threading take 2. The numbers don't lie. Subroutine threading is smaller only for colon definitions with less than four words in the body. At 4 words, the overhead is the same, and beyond 4 words, ITC/DTC is smaller.


In rereading my earlier post, I probably wasn't very clear about what my main point was in terms of a size comparsion. Yes, a JSR (and a JMP) takes 3 bytes and a DW takes 2. There isn't anything you can do about that. However, just blindly substituting JSRs for DWs doesn't take full advantage of STC in terms of space or speed. There are ways to optimize STC that bring the total number of bytes in an STC Forth much closer to the total number of bytes in an ITC/DTC Forth than the 6/5 ratio (20 bytes vs. 24 bytes for an "average" high-level word) you suggest. That's was the point I was trying to make.

Here are some of the reason why STC is not as space inefficient as it seems, along with some potential space optimizations for the 6502. The code below assumes the typical 6502 data stack implementation (i.e. how FIG-Forth implements it), and (unlike FIG-Forth) separated heads (so that one STC -- and for that matter DTC -- primitive can fall through to another). All of the code is untested.

First, there are those things that take the same amount of space in STC as ITC/DTC. The size of a head doesn't depend on threading. ." String" will usually compile an inline counted string whose size doesn't depend on threading (the preceding JSR PDOTQ will of course be larger than a DW PDOTQ). This doesn't affect the difference in size between STC and ITC/DTC, but it does affect the ratio.

Second, at a speed penalty, you can implement constants and variables as a JSR followed by a DW which is the same as DTC and just 1 byte more than ITC (5 bytes vs. 4 bytes, rather than 3 vs. 2). Literals can also be implemented this way which is 5 bytes (STC) vs. 4 bytes (ITC/DTC). Again, this affects the ratio of sizes rather than the difference.

Third, there are high level STC constructs that take less space than their ITC/DTC counterparts. An unconditional branch (compiled by AHEAD or AGAIN) is 3 bytes (JMP destination) with STC, vs. 4 bytes (DW BRANCH,destination) with ITC/DTC. Likewise, a deferred word is 3 bytes (JMP address) with STC vs. 4 bytes (DW DO_DEFER,address) with ITC/DTC.

Fourth, since primitives take less space in STC than ITC/DTC, the ratio of the number of primitives to the number of high-level words affects the total size of STC vs. the total size of ITC. An STC primitive has 4 fewer bytes of overhead than an ITC primitive, so if an "average" high-level word is 4 bytes longer with STC than DTC, a simple 50/50 split of primitives and high level words makes the total size of DTC and ITC (roughly) equal (of course, there will be other factors).

Fifth, since STC has no NEXT, DO_COLON, or EXIT routine, there are that many fewer total bytes in an STC Forth. Counting the cold start code that prepares the zero page for the NEXT routine (i.e. storing a $6C at W-1), these routines take roughly 60 bytes in FIG-Forth (not counting any debugging code). That's 15 "average" STC high-level words just to break even.

Sixth, STC lends itself to optimizing primitives for space better than ITC/DTC. For example, you can implement the standard words 1-, ABS, NEGATE, and INVERT as follows:

Code:
ONEMINUS LDA 0,X
         BNE ONEM1
         DEC 1,X
ONEM1    DEC 0,X
         RTS
ABS      LDA 1,X
         BNE ABS1
NEGATE   JSR ONEMINUS
INVERT   JSR INV1
INV1     LDA #$FF
         EOR 0,X
         STA 0,X
         INX
ABS1     RTS


I can't see a way of doing this with ITC or DTC that doesn't take more than the usual 4 or 2 extra bytes of overhead. Also, note that even though these examples are optimized for space, in terms of speed, they still compare favorably to ITC and DTC that is specifically optimized for speed. Using self-modifying code also allows for some interesting space optimizations (though some of the space savings may be offset by having to copy the routine from ROM to RAM at cold start). For example, the words AND, OR, XOR, -, and + can be implemented as follows:

Code:
AND   LDA #$35
      BNE MATH
OR    LDA #$15
      BNE MATH
XOR   LDA #$55
      BNE MATH
MINUS JSR NEGATE
PLUS  CLC
      LDA #$75
MATH  STA MATH2
      JSR MATH1
MATH1 LDA 2,X
MATH2 ADC 0,X
      STA 2,X
      INX
      RTS


Again, note that these examples are not significantly slower than speed-optimized ITC or DTC.

Seventh, there are several relatively common high-level sequences that can be replaced by shorter inline assembly sequences. For example, with ITC/DTC, the sequence DUP 0< IF would typically compile to DW DUP,ZLT,QBRANCH,destination but with STC, it could be implemented in 4 fewer bytes with:

Code:
LDA 0,X
BPL label


Of course, this assumes that the branch destination is within half a page of the BPL instruction, but this will almost always be true in Forth. There are also sequences that can be converted to inline assembly that will take the same amount of space as the ITC/DTC high-level counterparts. For example, a DROP can compile to a pair of INX instructions, which takes only 2 bytes, just as an ITC/DTC DROP does. The sequence 0= IF which would ordinarily compile to DW ZEQ,QBRANCH,destination can be implemented in 6 bytes using STC also:

Code:
LDA 0,X
ORA 1,X
BNE label


Another example is ACCEPT. KEY is typically a deferred word, so that different input sources can easily be selected. Since KEY is deferred, I usually write ACCEPT as a high-level word (usually with very few editing features). In my subroutine threaded Forth, after KEY, I use inline assembly to put the keystroke in the accumulator and use a CMP immediate followed by a BEQ or BNE to handle keystrokes that require special action (and switching back to high level Forth when convenient). The CMP-branch is only 4 bytes which is fewer bytes than a literal (or constant) followed by = IF or by OF (i.e. a CASE structure), so several bytes are saved this way (plus it's much faster).

Eighth, one high-level word can fall through to another, which will save 3 bytes. For example, with ITC, the high level definitions : z a b ; : y c z ; compiles to:

Code:
z: DW DO_COLON ; JSR for DTC
   DW a
   DW b
   DW EXIT

y: DW DO_COLON ; JSR for DTC
   DW c
   DW z
   DW EXIT


Even if you rearrange the assembly so that y precedes z, there is no way for y to fall through to z. In many Forths, replacing DW z,EXIT with DW BRANCH,z+2 (z+3 for DTC) will be faster, but the size is still the same. With STC, the definitions compile to:

Code:
z: JSR a
   JMP b

y: JSR c
   JMP z


If y is moved before z, the assembly will be:

Code:
y: JSR c
   JMP z
z: JSR a
   JMP b


and the JMP z can be eliminated.

Ninth, the definition : z BEGIN a WHILE b REPEAT ; usually is not optimal because the unconditional branch (compiled by REPEAT) will be executed multiple times. By rearranging the definition to : z AHEAD BEGIN b [ 1 CS-ROLL ] THEN a UNTIL ; the unconditional branch is executed just once. Since there is no DO_COLON routine with STC, the AHEAD can be eliminated (saving 3 bytes) as follows:

Code:
zee0: JSR b
zee:  JSR a
      JSR QBRANCH ; prepare Z flag and pop data stack
      BEQ zee0
      RTS


In other words, the address returned by ' z is zee, not zee0. In my subroutine threaded Forth, the >NUMBER definition uses this technique. Once again, it's also faster.

Finally, I should point out that using these optimizations does not necessary mean that the compiler must be more sophisticated (and therefore larger). The alternative is to optimize manually by inserting the optimizations yourself. However, adding a few short definitions helps and also makes the code more readable. Most Forth compilers do very little optimization, leaving it up to the programmer to know the most efficient way of doing things. For example, it is up to the programmer to use the single word 1- instead of the two word sequence 1 -.

Of course, there are things that will take more space with STC (manipulating the return stack is one example), but the bottom line is with STC, there are a lot of options available if you wish to optimize for space.

kc5tja wrote:
I don't think anyone was contesting this. I know I certainly wasn't. But it nonetheless is true that STC will consume more space (for a 16-bit Forth implementation on a 6502) than a DTC implementation for the average program size. The average number of words in a colon definition turns out to be 8 in most Forth programs, so each definition, one can estimate, will have four bytes of "overhead" to them compared to other methods. Times the number of definitions, and you can see that it adds up...slightly.


Replacing DWs with a JSR (or JMP) is simply translating from ITC/DTC to STC, and I don't think you'll get as much out of an STC by translating ITC/DTC as you would by rewriting it specifically for STC. The usual response as to why there aren't any programs that translate C to Forth and/or vice versa is that translating a well-written C program to Forth tends to produce a poor Forth code, and translating a well-written Forth program to C tends to produce poor C code. I believe this is also true of ITC/DTC and STC, albeit to a far lesser degree (I would say replace "poor" with "not as good"). The differences between colorForth and ANS Forth suggest to me that a slightly different approach is necessary (and I believe Chuck Moore realized this) to produce the best possible STC, regardless of whether it's space or speed you're after.

Based on some of the optimizations listed above, I think there is even a chance that it's possible to write a STC Forth that is actually smaller than a ITC or DTC Forth on the 6502, given some reasonable ground rules, and assuming that both have the same functionality and both were optimized strictly for space. It'd be quite a challenge, though.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Wed Sep 08, 2004 5:18 am 
Offline

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
dclxvi wrote:
kc5tja wrote:
Bytecode/tokens is just "token-threaded" -- several Forth implementations have been implemented token-threaded. The most popular of which is OpenFirmware, actually.


I'm aware of token threaded Forths, though I haven't used one. On the processors I've worked with, address threading seems to be the more natural approach, so I haven't seen a compelling reason to implement a token threaded Forth.


The point is compactness and uniformity of representation. When expressing the original source code is not feasible (e.g., as with OpenFirmware drivers, which exists in relatively small ROMs on the expansion card they accompany), but binary independence and economy are requirements, then token-threading is ideal. Like I said, this is why OpenFirmware uses it.

For what it's worth, most OpenFirmware environments will run-time recompile token-threaded representation into a more native and faster executing form.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Wed Sep 08, 2004 7:23 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
Sheesh! And I thought I and Samuel were long-winded! That was great though. Thanks for the detailed examples and points of clarification. I guess I can quit saying STC is less memory-efficient. I hope you'll have opportunities to recycle your "book" and get more uses out of it for your time. Although I'm not going to start over on my 65816 Forth because of it, it does give some ideas for future improvements.

I did "cheat" a little though and do some optimizations in my '816 ITC Forth that are similar to what you're talking about. For example, nested structures sometimes result in two or more consecutive branches. Why not just branch to the intended destination the first time. The only disadvantage I could see for some of these things is that if the user wants to use SEE to examine the insides, there could be some confusing results. SEE terminates when it comes to the unnest at the end of a secondary; but there won't necessarily be an unnest in some cases, especially if one word branches right into another without going through unnest-nest or other such overhead. I could probably also expand my inventory of (probably headerless) words that replace occurrences of things like the DUP 0< IF that you mention. I know it's taboo in Forth, but there's nothing that says the code is supposed to be readable after being de-compiled, and we are after all talking about internal optimizations. If the user wants it readable, he'll have to look at the source code I provide that shows what it looks like before these optimizations are carried out.

Something I found so helpful with the '816 is that since it can handle 16 bits at a time, a lot of words that would have taken so much extra space if defined as primitives on the '02 became easy with the '816, resulting not just in faster code but even shorter and easier to write in many cases—the best of all worlds. Because of this, my '816 Forth has around 350 primitives. Having so much higher percentage of the kernel in primitives dramatically reduces the number of times NEXT, nest, and unnest get executed to carry out a particular job, further speeding things up.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Wed Sep 08, 2004 7:35 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
Since we've hijacked pbug's "How to make a BASIC compiler?" long enough now and quickly turned it into Forth material that's way too advanced for those who are new to Forth, why don't we continue this thread under the Forth section of the forum. I'll start a new subject and call it "cont. from BASIC compiler". I am looking forward to more.

<later> Ok, it's at http://www.6502.org/forum/viewtopic.php?p=3336#3336
pbug, you can go back to talking about BASIC compilers. I for one will still be interested—even if I do think you'd be better off going to Forth ;)

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Wed Sep 29, 2004 5:04 am 
Offline
User avatar

Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
It occured to me that no one has mentioned the BASIC compiler for FIG-Forth in Forth Dimensions. Early issues of Forth Dimensions are online at www.forth.org in the form of .pdf files. The article appears on p. 178 (the .pdf page number, p. 175 is printed at the bottom of the page itself) of the Volume 3 .pdf file (~12MB!). There is also a related article on converting infix to postfix on p. 14 of the Volume 4 Number 6 .pdf file (~3MB). There are also source code files around, so you don't have to type it in from the article.

Anyway, you can use the FIG-Forth source code in the (6502.org) Repository with the Forth Dimensions article to get an example of a BASIC compiler for the 6502. It's the same concept as converting infix to postfix at parse time that I mentioned in an earlier post. The BASIC program (to be compiled) will seem to be spaced a little strangely as compared to a traditional BASIC, because the compiler uses the ordinary FIG-Forth parsing method, i.e. space delimited words. You could write your own parser if you wish to change this. Also, it will compile BASIC to (indirect threaded) Forth, but it could be altered to compile to assembly (or to translate the Forth into assembly).


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Sat Aug 21, 2010 8:40 pm 
Offline

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
Just an update here: it seems the forth.org website no longer has direct links to the FD scans.

You can still find them at the direct URL:

http://www.forth.org/fd/contents.html


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Aug 23, 2010 10:46 pm 
Offline

Joined: Sun Nov 08, 2009 1:56 am
Posts: 411
Location: Minnesota
Ruud wrote:

I'm working on a Pascal compiler but one written in Pascal. The idea is that, once compiled on the PC, the user can improve and compile it on the native machine.

But I must confess there is a little snag in it: the executable so far is over 200 KB and I don't think it will ever be small enough to run it on a C64, 8032, Apple ][ or any other 6502 machine. The Apple2GS or a C64/128 with SuperCPU (=65816) could cope with it.


Have you looked at the P4 compiler?
http://homepages.cwi.nl/~steven/pascal/

It was used as the basis of the UCSD Pascal compiler, which certainly ran on the 6502.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Aug 23, 2010 10:50 pm 
Offline

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
Also, don't forget that early compilers couldn't fit into available core in the early days even on "big machines" like the PDP series. C is an example, where you had a master driver program, cc, which in turn invoked c1, c2, and as in that order, to lex and parse, then to generate code for, then to assemble (respectively) the binary you're creating.

If you can split your one executable into multiple components, or even somehow implement an overlay system, that will allow you to run your compiler on the C64 natively.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Mon Aug 23, 2010 10:59 pm 
Offline

Joined: Sun Nov 08, 2009 1:56 am
Posts: 411
Location: Minnesota
dclxvi wrote:
My BASIC was partly inspired by Apple (Integer) BASIC, which was written by Steve Wozniak. Wozinak used ideas from HP BASICs (he had worked at HP), and of course HP was well-known for their stack-oriented RPN calculators. "Integer" BASIC is really not all that far away from the address interpreter you suggest. It uses a sophisticated parser which does all of it's syntax checking at parse time rather than execution time. It also converts constants at parse time to their twos complement reprepresentation rather than storing them as ASCII. Therefore, unlike many BASICs (e.g. EhBASIC and Applesoft) you never got a SYNTAX ERROR at run time, and you weren't converting constants (e.g. the statment PRINT 99) or line numbers (e.g. the statement IF A < B THEN 10) each and every time they were encountered.

There were other things that could have been handled at parse time, but weren't, such as checking for too many parentheses in an expression, and determining whether you needed the address of a variable so you could store a value in it (e.g. a LET or an INPUT statement) or if only the contents of the variable were needed (e.g. PRINT A). (Determining the latter at parse time would have sped up expression evaluation considerably.)


There is also Atari BASIC, which contains a lot of these same ideas:
http://users.telenet.be/kim1-6502/6502/absb.html

Although it isn't perfect either:
http://www3.sympatico.ca/maury/other_stuff/atari_basic.html


Last edited by teamtempest on Tue Aug 24, 2010 12:16 am, edited 1 time in total.

Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Tue Aug 24, 2010 12:07 am 
Offline

Joined: Wed May 20, 2009 1:06 pm
Posts: 491
teamtempest wrote:
There is also Atari BASIC, which contains a lot of these same ideas:
http://users.telenet.be/kim1-6502/6502/absb.html

Although it is perfect either:
http://www3.sympatico.ca/maury/other_stuff/atari_basic.html


The Atari Basic Source Book

http://users.telenet.be/kim1-6502/6502/absb.html


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Fri Aug 27, 2010 12:29 pm 
Offline
User avatar

Joined: Fri Dec 12, 2003 7:22 am
Posts: 259
Location: Heerlen, NL
teamtempest wrote:
Have you looked at the P4 compiler?

The original post was 6 years ago. In the mean time I have been pointed to this compiler various times :)
But thanks anyway!

_________________
Code:
    ___
   / __|__
  / /  |_/     Groetjes, Ruud
  \ \__|_\
   \___|       URL: www.baltissen.org



Top
 Profile  
Reply with quote  
PostPosted: Fri Aug 31, 2018 1:29 pm 
Offline

Joined: Sun Aug 26, 2018 7:33 pm
Posts: 4
i was trying to help extending Boriel’s ZX-Basic Compiler to 6502 - my humble efforts are at https://gitlab.com/nitrofurano/6502basiccompiler - the official Boriel’s ZX-Basic Compiler git repository is at https://bitbucket.org/zxbasic/zxbasic - whatever help on this is indeed hugely appreciated! :) (just imagine, an ansi-basic cross-compiler coded in Python, able to run on whatever actual operating system, like GNU/Linux, BSD or MacOS-X)


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 29 posts ]  Go to page Previous  1, 2

All times are UTC


Who is online

Users browsing this forum: Google [Bot] and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: