6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Thu May 09, 2024 4:32 am

All times are UTC




Post new topic Reply to topic  [ 27 posts ]  Go to page Previous  1, 2
Author Message
PostPosted: Thu Aug 11, 2022 2:37 pm 
Offline
User avatar

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3354
Location: Ontario, Canada
Martin_H wrote:
@Dr Jefyll, I thought at the circuit level there were only binary adders, and subtraction with borrow required inverters for one of the inputs.
Right. Binary adders (74283) are what're used here. And yes one of the input data paths is inverted but it happens further upstream, not right at the '283 inputs. Here's a cropped, lores version in which I've highlighted the relevant sections. For an overall block diagram see the web page.

-- Jeff


Attachments:
DiabCPU schem 5clr annotated lores.png
DiabCPU schem 5clr annotated lores.png [ 133.32 KiB | Viewed 1114 times ]

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
Top
 Profile  
Reply with quote  
PostPosted: Sat Aug 13, 2022 2:49 am 
Offline

Joined: Fri Apr 15, 2022 1:56 pm
Posts: 45
Location: San Antonio, TX, USA
Martin_H wrote:
Here's a link to the [subroutine threaded] code: https://github.com/Martin-H1/6502/blob/master/FunStuff/brainfstc.asm

The following are my timings for printing the Sierpinski triangle with my three versions of the BF interpreters:
Code:
299 seconds - text based interpreter that scans the source.
222 seconds - direct threaded code interpreter similar to many Forth implementations.
148 seconds - subroutine threaded code for the win!
I was curious how much of an improvement could be had by making the BF compiler generate inline code in place of the subroutine calls. Here's a link to the resulting code which is a fairly straightforward translation of Martin's subroutine threaded version: https://github.com/pjdennis/6502/blob/master/FunStuff/brainfitc.asm

Surprisingly (to me at least) this new version is approximately 4 times faster than the subroutine threaded version (2.5 seconds vs 10 seconds on my machine, or 3.37 million cycles vs. 13.23 million cycles in the emulator.) This goes to show how much overhead there is for subroutine calls, especially relative to short routines. A trade-off is the size of the generated code which is around double, although the code size of the compiler itself is smaller (623 vs. 760 bytes.)

A next step could be introducing optimizations, e.g. consolidating repeated BF instructions. How much faster could it go?


Top
 Profile  
Reply with quote  
PostPosted: Sat Aug 13, 2022 4:14 pm 
Offline

Joined: Wed Jan 08, 2014 3:31 pm
Posts: 575
@pjdennis, that's an awesome improvement and I wish I had thought of it! I may add your version to my repository. At this point it's so fast I wonder if it merits a better user interface to allow an interactive mode. At the very least I should run the BF version of game of life and see how fast that is.


Top
 Profile  
Reply with quote  
PostPosted: Mon Aug 15, 2022 3:55 pm 
Offline

Joined: Fri Apr 15, 2022 1:56 pm
Posts: 45
Location: San Antonio, TX, USA
Martin_H wrote:
@pjdennis, that's an awesome improvement and I wish I had thought of it! I may add your version to my repository. At this point it's so fast I wonder if it merits a better user interface to allow an interactive mode. At the very least I should run the BF version of game of life and see how fast that is.
Feel free; I sent a 'pull request'. I also made another version which consolidates >1 consecutive occurrences of the <>+- commands into a single addition, resulting in an additional ~50% increase in speed for the Sierpinski triangle example along with a reduction in the size of the generated code: https://github.com/pjdennis/6502/blob/bf-enhancements/FunStuff/brainfotc.asm.


Top
 Profile  
Reply with quote  
PostPosted: Mon Aug 15, 2022 4:15 pm 
Offline

Joined: Wed Jan 08, 2014 3:31 pm
Posts: 575
pjdennis wrote:
Feel free; I sent a 'pull request'. I also made another version which consolidates >1 consecutive occurrences of the <>+- commands into a single addition, resulting in an additional ~50% increase in speed for the Sierpinski triangle example along with a reduction in the size of the generated code: https://github.com/pjdennis/6502/blob/bf-enhancements/FunStuff/brainfotc.asm.

Great. I merged your pull request and will take a look.

Also, when you mentioned optimization I immediate thought of the same thing, namely consolidating multiple increment or decrement commands into their higher level operations. Out of curiosity I did a frequency analysis of some BF programs and fond the following:
Code:
[('+', 193), ('>', 131), ('<', 103), ('-', 91), ('[', 65), (']', 65), ('\n', 26), ('.', 19), (',', 0)]

The big four are exactly the ones you optimized. Consecutive branch optimization probably isn't worth it.


Top
 Profile  
Reply with quote  
PostPosted: Tue Aug 16, 2022 4:13 pm 
Offline

Joined: Wed Jan 08, 2014 3:31 pm
Posts: 575
I built a REPL for the first brainf--- compiler and uploaded the first draft to GitHubin its own directory:
https://github.com/Martin-H1/6502/tree/master/Brainfast
I called it Brainfast because it's the fastest 6502 BF compiler I've seen. I separated the Py65 code and plan to add a module for real hardware.

While pasting text into it I found I had to keep hitting the return key until all of it was processed. A Google search shows this is a known issue on Py65:
https://github.com/mnaberez/py65/issues/47

I'm on a recent Linux Mint build and installed Py65 from the package management. So it looks like that bug is still present. My Windows machine's WiFi died, so I can't download and test on it. I am pretty sure that Py65 bug isn't present on Windows.

In any event, I plan to port the optimization build over and create a main module for real hardware. But it is good wear right now so I am going to enjoy the outdoors.


Top
 Profile  
Reply with quote  
PostPosted: Mon Nov 21, 2022 12:25 pm 
Offline
User avatar

Joined: Tue Aug 11, 2020 3:45 am
Posts: 311
Location: A magnetic field
Martin_H on Tue 9 Aug 2022 wrote:
disallowing the X and Y registers allowed from more complexity than a machine like the PDP-8.


This prompts an idea for a small virtual machine. NAND2Tetris builds an ALU with seven(?) inputs. Many of the permutations are useless or quirky. If this process is applied to a virtual machine, it would greatly compact the typical case statement within a loop implementation. Test one bit to determine special instruction. Special instructions use a conventional case statement. The remainder would be a series of tests to implement fragments of ALU operation. The can be achieved with a series of ASL zp // BCC sequences. Performance would be awful but it may be possible to make a virtual machine which is 128 bytes or less. My motive for a compact virtual machine is to make cross-platform binaries. People have previously complained that 1KB virtual machine per processor architecture is too large.

Given that performance is already a lost cause and size is minimal, a variant without RegX or RegY is also feasible. However, the loss of JMP (abs,X) is dearly missed.

_________________
Modules | Processors | Boards | Boxes | Beep, Beep! I'm a sheep!


Top
 Profile  
Reply with quote  
PostPosted: Tue Nov 22, 2022 9:11 pm 
Offline

Joined: Wed Jan 08, 2014 3:31 pm
Posts: 575
Thanks for mentioning NAND2Tetris. I googled it and found this online course.
https://www.nand2tetris.org/
I am busy with the holiday, but it is definitely something I should look into next week.

As far as your VM and performance is concerned. The lesson the Brainf--- compiler taught me is that a simple VM with a JIT is much easier than I thought. Granted it has to be written in assembly to pull off the code generation games we played. But it really was as simple as copy and pasting object code coupled with fixing up jump target addresses. You would not meet your size target, but the performance should be near native code speed.


Top
 Profile  
Reply with quote  
PostPosted: Tue Nov 22, 2022 9:51 pm 
Offline

Joined: Fri Apr 15, 2022 1:56 pm
Posts: 45
Location: San Antonio, TX, USA
Martin_H wrote:
Thanks for mentioning NAND2Tetris. I googled it and found this online course.
https://www.nand2tetris.org/
There's also a derivative, https://nandgame.com/ which takes a graphical approach to the hardware design within the web browser. I found both of them interesting.


Top
 Profile  
Reply with quote  
PostPosted: Wed Nov 23, 2022 2:33 pm 
Offline

Joined: Wed Jan 08, 2014 3:31 pm
Posts: 575
pjdennis wrote:
There's also a derivative, https://nandgame.com/ which takes a graphical approach to the hardware design within the web browser. I found both of them interesting.

Thanks for the link, but now I am addicted to it.


Top
 Profile  
Reply with quote  
PostPosted: Wed Nov 23, 2022 4:30 pm 
Offline

Joined: Fri Apr 15, 2022 1:56 pm
Posts: 45
Location: San Antonio, TX, USA
Martin_H wrote:
Thanks for the link, but now I am addicted to it.
You're welcome :)


Top
 Profile  
Reply with quote  
PostPosted: Sun Nov 27, 2022 9:33 pm 
Offline

Joined: Wed Jan 08, 2014 3:31 pm
Posts: 575
In the nand game I solved all the hardware puzzles and I am now on their assembly puzzles. I have to say this machine is one of the crudest architectures I have programmed, and definitely fits the description of a Turning Tarpit. It has only two registers, and only one of them can address memory, or use load immediate. There's no JSR or stack. The ALU only supports add, subtract, increment, decrement, and, or, complement. It doesn't support bit shifting which is a notable omission.

One interesting feature is that any instruction can branch either absolutely or conditionally. That's kinda useful given how much else is missing.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 27 posts ]  Go to page Previous  1, 2

All times are UTC


Who is online

Users browsing this forum: No registered users and 11 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: