6502.org • View topic - question on shunting yard algorithm in expresion parsing

View unanswered posts | View active topics

Board index » 6502.org Users Forum » Programming

All times are UTC

question on shunting yard algorithm in expresion parsing

Page 1 of 1

[ 11 posts ]

Previous topic | Next topic

Author

Message

Jeff_Birt

Post subject: question on shunting yard algorithm in expresion parsing

Posted: Sat Sep 22, 2018 9:43 pm

Joined: Wed Jul 18, 2018 12:12 pm
Posts: 96

I had done a bit of research on the shunting yard algorithm and a few days later it occurred to me that a BASIC interpreter must, each time an expression is evaluated, go through this process, i.e. it is not saving the post fix version of the expression it builds (and actually operates on.)

I just did a bit more looking tonight and did not see this explicitly stated so either I'm wrong or it is plainly obvious to everyone but me

I edited the above after it was pointed out my first attempt was nonsensical. For reference the shunting yard algorithm is a method of parsing a infix expression and driving a postfix expression to operate on. https://en.wikipedia.org/wiki/Shunting-yard_algorithm

Top

drogon

Post subject: Re: question on shunting yard algorithm in expresion parsing

Posted: Sun Sep 23, 2018 9:16 am

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland

It's when the lines between being a "pure" interpreter and a compiler/virtual machine start to blur.

Traditionally, interpreted languages parse every time at run-time. This is easy, if somewhat (relatively) slow, but speed and memory usage can be improved with a tokenising system - at the expense of a slightly more complex interpreter (which then becomes both tokeniser/syntax checker and "virtual machine" interpreter).

Fast-forward to today and some interpreted languages have "just in time" - so tokenise at entry/load time, then execute but store the shunted yard RPN somewhere, so the next time that expression is executed, you just pull the RPN out of some storage area and interpret that. You take the shunting hit the first time, but after that you get the advantage - of what's essentially a smart cache.

Go back in-time again and remember that the machines running BASIC then were short of memory, cpu cycles, operating systems, etc. and while things like shunting yard existed, having the luxury of storing/caching every expression just wasn't possible. Dijkstra wrote shunting yard in '61 - BASIC came to us in '64 ... (and for the record, I'm a kid of '62)

I don't think many BASICs use shunting yard either, however the net-effect might be the same - cache the evaluated expression in some way and use it next time - if you can afford the memory.

Fast forward to this decade and some years back I wrote what I considered to be my ideal BASIC interpreter and the current iteration does that, and many other tricks to speed up execution of the tokenised code. So for example, I turn numbers into their binary representation, keep a symbol table of variable names (so name length becomes irrelevant to execution speed), cache all procedure and function start line addresses (so you can goto or call a function anywhere without any speed penalty) and I save the shunted yard output of expressions, so the next time they're executed it comes out of a cache. My BASIC was written on a 32-bit Linux machine in C but I am sketching plans to port it (or a subset of it) to a 6502 system, so there is a little bit of relevance.

Doing all this does make the LIST command somewhat interesting though...

And one thing to note - I didn't initially write the code to do the "JIT"/RPN Cache, (premature optimisation and all that), but it was always in the back of my mind - I only implemented it about 18 months ago and after extensive profiling it didn't perform as well as expected - my profiling showed it was spending some 45-55% of it's execution time doing the shunting - the net speed-up after caching was barely 40% - still significant but there is room for improvement.

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/

Top

Jeff_Birt

Post subject: Re: question on shunting yard algorithm in expresion parsing

Posted: Sun Sep 23, 2018 1:31 pm

Joined: Wed Jul 18, 2018 12:12 pm
Posts: 96

Thanks Gordon. I was thinking of interpreted languages in general and specifically thinking about older, i.e. 1980's tech. I had read that the BBC micro used a different approach than the shunting yard algorithm but thought it might be the most well known for comparison sake.

I guess that even on older systems they could have 'compiled' the BSIC source to pre evaluate the expressions and save that as a separate executable file but then trying to debug that on such resource constrained machines might have been difficult. If the run-time error checking came back with 'error in line 100' you would have to open your source again and work on it and then recompile. I guess though that one could have run the 'non-compiled' version for debugging and then compiled it at the end of the process. Still, that would have required more ROM code to understand both systems and it might not have been practical. It is interesting to see how the design of Forth bypasses some of these issues.

Top

dmsc

Post subject: Re: question on shunting yard algorithm in expresion parsing

Posted: Sun Sep 23, 2018 2:35 pm

Joined: Mon Sep 17, 2018 2:39 am
Posts: 138

Hi!

drogon wrote:

At least all Atari BASICs and derivatives tokenize on input (including numbers and variables) and only store full tokenized source and use the shunting yard algorithm on execution, I think this is pretty common.

Quote:

Fast forward to this decade and some years back I wrote what I considered to be my ideal BASIC interpreter and the current iteration does that, and many other tricks to speed up execution of the tokenised code. So for example, I turn numbers into their binary representation, keep a symbol table of variable names (so name length becomes irrelevant to execution speed), cache all procedure and function start line addresses (so you can goto or call a function anywhere without any speed penalty) and I save the shunted yard output of expressions, so the next time they're executed it comes out of a cache. My BASIC was written on a 32-bit Linux machine in C but I am sketching plans to port it (or a subset of it) to a 6502 system, so there is a little bit of relevance.

In my FastBasic (see https://github.com/dmsc/fastbasic#readme ) I use a PEG parser (not shunting yard), the compiler is embedded into the parser, and all the expressions are compiled to a stack-based byte-code on run. For space reasons (the editor+compiler+interpreter is 8K), the parser is also stored as (different) byte-code, so the 6502 compiler is really an interpreter for the parser byte-code.

This also means that the cross-compiler is generated from the same parsing byte-code, this guarantees that the native compiler and cross-compiler produce the same code.

AS for "LIST", in FastBasic the full text is stored in the Editor, and completely parsed on each RUN, so it is not needed. As full parsing can be slow, the editor also supports writing a compiled program to disk.

Top

resman

Post subject: Re: question on shunting yard algorithm in expresion parsing

Posted: Sun Sep 23, 2018 5:29 pm

Joined: Sat Dec 12, 2015 7:48 pm
Posts: 145
Location: Lake Tahoe

The HP-85 tokenized the BASIC expressions into an RPN format. It was able to list them back out in algebraic format. Check out the "Enhanced BASIC Language for a Personal Computer" section on the HP Journal: http://www.hpl.hp.com/hpjournal/pdfs/Is ... 980-07.pdf

Dave...

Top

BigDumbDinosaur

Post subject: Re: question on shunting yard algorithm in expresion parsing

Posted: Mon Sep 24, 2018 2:53 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8507
Location: Midwestern USA

drogon wrote:

What you have described has existed since the early 1970s. The dialect of timesharing BASIC called "Business BASIC" does all that, and then some. Two modern examples of Business BASIC are BBx and Thoroughbred. Both evolved from MAI Basic 4, which had all of the features you mentioned back in the days of bit-slice processors, such as the AMD AM2900 series, and removable disk packs.

Amazingly, 16 concurrent users could be supported on the MAI minis of the mid-seventies, with only 8KB per user available—and a whole lot of disk swapping going on. :shock:

_________________
x86? We ain't got no x86. We don't NEED no stinking x86!

Top

whartung

Post subject: Re: question on shunting yard algorithm in expresion parsing

Posted: Mon Sep 24, 2018 5:31 pm

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004

resman wrote:

This is the fundamental conflict surrounding expression evaluation.

When you can have different representations of the same code, things get much easier. But when the sole representation of the code is the tokenized stream, thats where the problems begin.

The Shunting Yard algorithm is nice because of its constrained use of stacks in contrast to recursive systems which capture not just expression state, but also stack frames of the callers (if nothing else, they have to capture the return address of the JSR). But, as has been noted, recursive algorithms have been applied in other BASICs.

So, it's straight forward to apply the SY algorithm to the literal tokenized expression in place without having to rewrite at tokenization time. It's also straightforward to parrot the tokenized stream back out when LISTing the program lines, vs rebuilding an infix expression from the RPN.

Conservation of RAM seemed to be a key driver in the early BASICs. Which is one reason it's pretty surprising that on the PDP and it's BASIC-PLUS, they DID NOT do that. They indeed kept the text as well as the compiled form (they did compile their language into a stack based p-code, vs simple tokenization). But the early PDPs suffered from similar memory constraints that early micros did.

Top

MichaelM

Post subject: Re: question on shunting yard algorithm in expresion parsing

Posted: Mon Sep 24, 2018 6:27 pm

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL

Just asking, did the PDP Basic p-code get implemented as a threaded code interpreter? The initial Fortran IV compiler relied on threaded code to conserve memory at a respectable performance penalty.

_________________
Michael A.

Top

whartung

Post subject: Re: question on shunting yard algorithm in expresion parsing

Posted: Mon Sep 24, 2018 11:26 pm

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004

According to Wikipedia, the original version of BASIC-PLUS just used a P-Code, but BASIC-PLUS 2 compiled in to a threaded type of system.

I didn't know that about Fortran IV. My only experience with Fortran under RSTS was as a driver for a simple Assembly language routine that we had to write for my assembly class.

But, honestly, however much that decision punished performance, I simply have to say "don't look a gift horse in the mouth". Back then, be amazed things worked at all. I have to assume that the DEC engineers that wrote the implementation got the drive for memory conservation from their user base (directly or indirectly) in making those design decisions.

Top

MichaelM

Post subject: Re: question on shunting yard algorithm in expresion parsing

Posted: Mon Sep 24, 2018 11:55 pm

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL

Thanks for that. I don't know why I didn't think to go to Wikipedia myself. :oops:

But your response did bring back some repressed memories about RSTS and Basic-Plus-2. To think that little roll of yellow paper tape with a copy of my simulation of the van der Pol equation in Basic-Plus-2 was my first threaded code program; that little roll of tape is now 36+ years old and still has a machine oil smell.

_________________
Michael A.

Top

dmsc

Post subject: Re: question on shunting yard algorithm in expresion parsing

Posted: Tue Sep 25, 2018 3:25 am

Joined: Mon Sep 17, 2018 2:39 am
Posts: 138

Hi!

whartung wrote:

resman wrote:

IMHO, it is the reverse: it is easier to rebuild the infix expression from the RPN than to evaluate the infix: you simply concatenate the strings instead of avaluating the results, adding parenthesis as needed.

An example:

Code:

 OP       STACK
             ""
  1          "1"
  A          "1"   "A"
  +          "1 + A"
  3          "1 + A"  "3"
  *          "(1 + A) * 3"

Note that the size of the stack is always shorter or equal to the stack needed for the shunting-yard, as you are exactly reversing the algorithm, but removing extra parenthesis (for example, "3 * (((((1+2)))))" uses stack slots for all the "(", those are removed in the RPN representation).

Quote:

Conservation of RAM seemed to be a key driver in the early BASICs. Which is one reason it's pretty surprising that on the PDP and it's BASIC-PLUS, they DID NOT do that. They indeed kept the text as well as the compiled form (they did compile their language into a stack based p-code, vs simple tokenization). But the early PDPs suffered from similar memory constraints that early micros did.

And this is the reason Atari BASIC tokenized on input: tokenized binaries are smaller than the full text, so you can enter longer programs in limited RAM.

Top

Page 1 of 1

[ 11 posts ]

Board index » 6502.org Users Forum » Programming

All times are UTC

Who is online

Users browsing this forum: No registered users and 16 guests

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum