Turing Tarpit Programming Challenge

Dr Jefyll · Post by **Dr Jefyll** » Thu Aug 11, 2022 2:37 pm

Martin_H wrote:

@Dr Jefyll, I thought at the circuit level there were only binary adders, and subtraction with borrow required inverters for one of the inputs.

Right. Binary adders (74283) are what're used here. And yes one of the input data paths is inverted but it happens further upstream, not right at the '283 inputs. Here's a cropped, lores version in which I've highlighted the relevant sections. For an overall block diagram see the web page.

-- Jeff

pjdennis · Post by **pjdennis** » Sat Aug 13, 2022 2:49 am

Martin_H wrote:

Here's a link to the [subroutine threaded] code: https://github.com/Martin-H1/6502/blob/ ... infstc.asm

The following are my timings for printing the Sierpinski triangle with my three versions of the BF interpreters:

Code: Select all

299 seconds - text based interpreter that scans the source.
222 seconds - direct threaded code interpreter similar to many Forth implementations.
148 seconds - subroutine threaded code for the win!

I was curious how much of an improvement could be had by making the BF compiler generate inline code in place of the subroutine calls. Here's a link to the resulting code which is a fairly straightforward translation of Martin's subroutine threaded version: https://github.com/pjdennis/6502/blob/m ... infitc.asm

Surprisingly (to me at least) this new version is approximately 4 times faster than the subroutine threaded version (2.5 seconds vs 10 seconds on my machine, or 3.37 million cycles vs. 13.23 million cycles in the emulator.) This goes to show how much overhead there is for subroutine calls, especially relative to short routines. A trade-off is the size of the generated code which is around double, although the code size of the compiler itself is smaller (623 vs. 760 bytes.)

A next step could be introducing optimizations, e.g. consolidating repeated BF instructions. How much faster could it go?

Martin_H · Post by **Martin_H** » Sat Aug 13, 2022 4:14 pm

@pjdennis, that's an awesome improvement and I wish I had thought of it! I may add your version to my repository. At this point it's so fast I wonder if it merits a better user interface to allow an interactive mode. At the very least I should run the BF version of game of life and see how fast that is.

pjdennis · Post by **pjdennis** » Mon Aug 15, 2022 3:55 pm

Martin_H wrote:

@pjdennis, that's an awesome improvement and I wish I had thought of it! I may add your version to my repository. At this point it's so fast I wonder if it merits a better user interface to allow an interactive mode. At the very least I should run the BF version of game of life and see how fast that is.

Feel free; I sent a 'pull request'. I also made another version which consolidates >1 consecutive occurrences of the <>+- commands into a single addition, resulting in an additional ~50% increase in speed for the Sierpinski triangle example along with a reduction in the size of the generated code: https://github.com/pjdennis/6502/blob/b ... infotc.asm.

Martin_H · Post by **Martin_H** » Mon Aug 15, 2022 4:15 pm

pjdennis wrote:

Feel free; I sent a 'pull request'. I also made another version which consolidates >1 consecutive occurrences of the <>+- commands into a single addition, resulting in an additional ~50% increase in speed for the Sierpinski triangle example along with a reduction in the size of the generated code: https://github.com/pjdennis/6502/blob/b ... infotc.asm.

Great. I merged your pull request and will take a look.

Also, when you mentioned optimization I immediate thought of the same thing, namely consolidating multiple increment or decrement commands into their higher level operations. Out of curiosity I did a frequency analysis of some BF programs and fond the following:

Code: Select all

[('+', 193), ('>', 131), ('<', 103), ('-', 91), ('[', 65), (']', 65), ('\n', 26), ('.', 19), (',', 0)]

The big four are exactly the ones you optimized. Consecutive branch optimization probably isn't worth it.

Martin_H · Post by **Martin_H** » Tue Aug 16, 2022 4:13 pm

I built a REPL for the first brainf--- compiler and uploaded the first draft to GitHubin its own directory:
https://github.com/Martin-H1/6502/tree/master/Brainfast
I called it Brainfast because it's the fastest 6502 BF compiler I've seen. I separated the Py65 code and plan to add a module for real hardware.

While pasting text into it I found I had to keep hitting the return key until all of it was processed. A Google search shows this is a known issue on Py65:
https://github.com/mnaberez/py65/issues/47

I'm on a recent Linux Mint build and installed Py65 from the package management. So it looks like that bug is still present. My Windows machine's WiFi died, so I can't download and test on it. I am pretty sure that Py65 bug isn't present on Windows.

In any event, I plan to port the optimization build over and create a main module for real hardware. But it is good wear right now so I am going to enjoy the outdoors.

Sheep64 · Post by **Sheep64** » Mon Nov 21, 2022 12:25 pm

Martin_H on Tue 9 Aug 2022 wrote:

disallowing the X and Y registers allowed from more complexity than a machine like the PDP-8.

This prompts an idea for a small virtual machine. NAND2Tetris builds an ALU with seven(?) inputs. Many of the permutations are useless or quirky. If this process is applied to a virtual machine, it would greatly compact the typical case statement within a loop implementation. Test one bit to determine special instruction. Special instructions use a conventional case statement. The remainder would be a series of tests to implement fragments of ALU operation. The can be achieved with a series of ASL zp // BCC sequences. Performance would be awful but it may be possible to make a virtual machine which is 128 bytes or less. My motive for a compact virtual machine is to make cross-platform binaries. People have previously complained that 1KB virtual machine per processor architecture is too large.

Given that performance is already a lost cause and size is minimal, a variant without RegX or RegY is also feasible. However, the loss of JMP (abs,X) is dearly missed.

Martin_H · Post by **Martin_H** » Tue Nov 22, 2022 9:11 pm

Thanks for mentioning NAND2Tetris. I googled it and found this online course.
https://www.nand2tetris.org/
I am busy with the holiday, but it is definitely something I should look into next week.

As far as your VM and performance is concerned. The lesson the Brainf--- compiler taught me is that a simple VM with a JIT is much easier than I thought. Granted it has to be written in assembly to pull off the code generation games we played. But it really was as simple as copy and pasting object code coupled with fixing up jump target addresses. You would not meet your size target, but the performance should be near native code speed.

pjdennis · Post by **pjdennis** » Tue Nov 22, 2022 9:51 pm

Martin_H wrote:

Thanks for mentioning NAND2Tetris. I googled it and found this online course.
https://www.nand2tetris.org/

There's also a derivative, https://nandgame.com/ which takes a graphical approach to the hardware design within the web browser. I found both of them interesting.

Martin_H · Post by **Martin_H** » Wed Nov 23, 2022 2:33 pm

pjdennis wrote:

There's also a derivative, https://nandgame.com/ which takes a graphical approach to the hardware design within the web browser. I found both of them interesting.

Thanks for the link, but now I am addicted to it.

pjdennis · Post by **pjdennis** » Wed Nov 23, 2022 4:30 pm

Martin_H wrote:

Thanks for the link, but now I am addicted to it.

You're welcome

Martin_H · Post by **Martin_H** » Sun Nov 27, 2022 9:33 pm

In the nand game I solved all the hardware puzzles and I am now on their assembly puzzles. I have to say this machine is one of the crudest architectures I have programmed, and definitely fits the description of a Turning Tarpit. It has only two registers, and only one of them can address memory, or use load immediate. There's no JSR or stack. The ALU only supports add, subtract, increment, decrement, and, or, complement. It doesn't support bit shifting which is a notable omission.

One interesting feature is that any instruction can branch either absolutely or conditionally. That's kinda useful given how much else is missing.

Turing Tarpit Programming Challenge

Re: Turing Tarpit Programming Challenge

Re: Turing Tarpit Programming Challenge

Re: Turing Tarpit Programming Challenge

Re: Turing Tarpit Programming Challenge

Re: Turing Tarpit Programming Challenge

Re: Turing Tarpit Programming Challenge

Re: Turing Tarpit Programming Challenge

Re: Turing Tarpit Programming Challenge

Re: Turing Tarpit Programming Challenge

Re: Turing Tarpit Programming Challenge

Re: Turing Tarpit Programming Challenge

Re: Turing Tarpit Programming Challenge