Posted by Martin_H on Sat Jul 23, 2022 6:59 pm

In another thread commodorejohn and I were discussing how the 6502's zero page addressing is vaguely similar to the PDP-8 and CDC-160. Both were single-accumulator machines that relied on page zero indirect addressing to serve as additional "registers". Neither had any index registers, nor could they load an address into the accumulator.

This got me thinking about what 6502 assembly would look like if Chuck Peddle had taken that approach for the 6502 and omitted the X and Y registers. One way to find out was to write code that explicitly avoids them, and so my Turing Tarpit challenge was born.

After nearly completing Conway's "Game of Life" I had a brainstorm: what if I wrote a Turing-complete interpreter using a minimal number of instructions? Then I could "cheat" and write all programs using that set of instructions. The obvious candidate is the esoteric programming language Brain F--k.

So here's my no-X-or-Y-register BF interpreter, which includes some samples to test it:
https://github.com/Martin-H1/6502/blob/master/FunStuff/brainf.asm

Note that it's slow, but that's because it brute-force scans the source text forward and backward during loops rather than tokenizing the program with references to instruction pointers. I also used a JSR and RTS in two places; inlining could eliminate those as well.
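
To make the scanning concrete, here is a minimal sketch (my own code, not the linked brainf.asm; sptr and depth are hypothetical zero page locations, and the 65C02's lda (zp) mode is assumed) of the brute-force forward scan: on '[' with the current cell zero, walk forward until the matching ']' is found, counting nesting depth.
Code:
         ; Hypothetical sketch of scanning forward for the matching ']'.
         ; Assumes sptr already points at the '['; no X or Y used.
scan:    lda #1
         sta depth          ; bracket nesting depth
step:    inc sptr           ; advance the 16-bit source pointer
         bne fetch
         inc sptr+1
fetch:   lda (sptr)         ; 65C02 zero page indirect, no Y needed
         cmp #'['
         bne notopen
         inc depth          ; a nested loop opens
         jmp step
notopen: cmp #']'
         bne step           ; ignore every other character
         dec depth          ; a loop closes
         bne step           ; not the matching bracket yet
         ; fall through: sptr now points at the matching ']'

The backward scan to a matching '[' is analogous, stepping the pointer the other way.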

Update 1: I removed the link to my GitHub repo and embedded the original code directly. That's because I plan to delete the original and replace it with the threaded version.

Update 2: I went back to the link as I plan to keep both versions, and this will clutter the post less.


Last edited by Martin_H on Sat Aug 06, 2022 1:46 pm, edited 2 times in total.

Posted by BigEd on Sat Jul 23, 2022 7:21 pm

Congratulations on the achievement... but is BF really the obvious candidate? Perhaps it is, but it shouldn't be!

(You probably know that Game of Life is Turing complete, so if you'd managed that, it would be job done! And enormously less usable, even, than your language of choice.)


Posted by Martin_H on Sat Jul 23, 2022 7:39 pm

BigEd wrote:
(You probably know that Game of Life is Turing complete, so if you'd managed that, it would be job done!

Yeah, that's why I started with the Game of Life. I plan to finish it after this digression.

BF has the advantage that C-to-BF compilers exist, so there's a crazy number of samples available for more typical CS problems.


Posted by BigEd on Sat Jul 23, 2022 8:15 pm

I wonder... OPC6 supports a BCPL port, so it could possibly be an interesting little virtual machine. Not a tarpit as such. It's a small machine, but it might be a bit of work to write an emulation in XY-free 6502. I would think writing the emulation would be the main task; being X- and Y-free is probably not too much of an extra burden.


Posted by BigEd on Sun Jul 24, 2022 10:49 am

(Sorry, didn't mean to derail your announcement and challenge!)


Posted by Martin_H on Sun Jul 24, 2022 11:54 am

No worries.

I would be curious to see other people's take on programming a 6502 using a reduced instruction set, partly because of how the PDP-8 was able to get away with so few instructions.


Posted by BigEd on Sun Jul 24, 2022 12:17 pm

It feels almost like a few macros would sort out the missing X and Y. I haven't tried it, and neither have I read your tarpit code, but with the single trick of feeling free to modify code which will shortly be executed I think we can do the job. Something like the following perhaps:

  • X and Y as counters - we can just use zero page
  • abs,X addressing mode - we can add up two numbers and construct an absolute instruction (sketched below)
  • indirect addressing - again, construct an absolute instruction then execute it
  • indexing and indirection combined - perform both of the above, with a possible cost saving if we can construct a zero page instruction.

There is of course a cost, in speed and density. (Or to put it another way, including these addressing modes offers improvements to speed and density.)
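
For instance, a minimal sketch of the abs,X case (my own labels: index is a hypothetical zero page byte and table a hypothetical data table) would patch the operand of an LDA absolute just before it runs:
Code:
        ; Hypothetical sketch: emulate "lda table,X" with self-modifying code.
        clc
        lda #<table         ; low byte of the base address
        adc index           ; add the 8-bit index
        sta fetch+1         ; patch the operand low byte
        lda #>table         ; high byte of the base address
        adc #0              ; propagate any carry
        sta fetch+2         ; patch the operand high byte
fetch:  lda $FFFF           ; the operand is rewritten before this executes

This of course assumes the patched instruction lives in RAM.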


Posted by Martin_H on Thu Aug 04, 2022 4:29 pm

I've fallen into a digression on a digression. I really didn't like how my BF interpreter processed the program as text and scanned to find the branch locations at runtime. Such scanning was constantly redoing effort that could be done once!

As a complete digression I rewrote the BF interpreter to work more like Forth. It compiles the source text into a list of subroutine addresses and optional operands to those subroutines. Using a data stack, I store the branch target iptr and pass it as an operand at runtime. The execute phase then cycles through this table of addresses doing an indirect jump. This makes execution much simpler and hopefully faster.
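
For illustration, the dispatch might look something like this minimal sketch (not the actual brainfdtc.asm; iptr and vector are hypothetical zero page pointers, and the 65C02's lda (zp) mode is assumed):
Code:
        ; Hypothetical NEXT-style dispatch, X- and Y-free.
        ; iptr points into the compiled table of handler addresses.
next:   lda (iptr)          ; low byte of the next handler address
        sta vector
        inc iptr
        bne gethi
        inc iptr+1
gethi:  lda (iptr)          ; high byte of the handler address
        sta vector+1
        inc iptr            ; leave iptr at the following table cell
        bne go
        inc iptr+1
go:     jmp (vector)        ; the handler jumps back to next when done

Handlers that take inline operands would also need to step iptr past them before returning to next.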

Using the Sierpinski triangle as a benchmark I've concluded that a threaded code approach is about 17% faster than processing the program textually. I was honestly hoping for something better, but I imagine loading words (rather than bytes) coupled with indirect jumps (rather than branches) eats into the gains from pre-computing branch targets.

In any event here's the source to the threaded code BF interpreter:
https://github.com/Martin-H1/6502/blob/master/FunStuff/brainfdtc.asm

Update: I did some code tweaking and the code is now about 20% faster.


Posted by Martin_H on Sat Aug 06, 2022 8:23 pm

My digression on a digression continues.

I was unhappy with the performance boost from direct threaded code, and I realized the overhead of the NEXT function was costing me cycles. I looked at the FIG Forth source, and cycle counting showed my version was in the ballpark. This is a common problem in Forth interpreters, and some go to great lengths to optimize NEXT. But it's possible to eliminate NEXT entirely by switching to subroutine threaded code. At that point the hardware PC becomes your instruction pointer. Plus, it's easy to do: emit a JSR in the code stream followed by the thread address, and use RTS instead of JMP NEXT in the threaded code. There was some minor but solvable difficulty in getting a pointer to the parameters so that the PC could be adjusted for branching.
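
To illustrate what the compile phase emits under subroutine threading (hypothetical handler names, not the ones in brainfstc.asm), a BF fragment like "+-." simply becomes a run of JSRs:
Code:
        ; Hypothetical compiled output for the BF fragment "+-."
        jsr inc_cell        ; '+'  (each handler ends with rts)
        jsr dec_cell        ; '-'
        jsr emit_cell       ; '.'
        rts                 ; end of the compiled program

The return address that each JSR pushes is what gives a branching handler its pointer to the inline parameters mentioned above.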

Since it's 90 degrees outside, I'm inside doing some coding, and it worked like a champ. Here's a link to the code:
https://github.com/Martin-H1/6502/blob/master/FunStuff/brainfstc.asm

The following are my timings for printing the Sierpinski triangle with my three versions of the BF interpreters:
Code:
299 seconds - text based interpreter that scans the source.
222 seconds - direct threaded code interpreter similar to many Forth implementations.
148 seconds - subroutine threaded code for the win!

So how does this relate to my original challenge? Well, none of these implementations uses the X or Y register, and they all work fine. Also, my NEXT function was only slightly slower than the FIG Forth version, even though FIG Forth uses the Y register. So it supports my thesis that the index registers are nice to have, but not critical. It's page zero indirect addressing that's critical! Although self-modifying code might allow you to avoid even that.

Plus, changing the algorithm produced far greater gains in performance than I could have achieved by not restricting my register usage. But that's old wisdom when it comes to programming.


Last edited by Martin_H on Sat Aug 06, 2022 8:31 pm, edited 1 time in total.

Posted by BigEd on Sat Aug 06, 2022 8:25 pm

Nice result! And it's good to have a quantification of the effects of these design decisions.


Posted by Martin_H on Tue Aug 09, 2022 8:02 pm

Thanks BigEd.

I returned to my Game of Life program and decided to sink deeper into the tarpit. Disallowing only the X and Y registers still allowed for more complexity than a machine like the PDP-8 offered. To get a better feel for what it would be like to program an early discrete-component machine, I needed to disallow even more instructions.

The PDP-8 had six memory reference instructions, an I/O instruction, and a set of microcoded operate instructions (NOP plus accumulator and link manipulations). A PDP-8 instruction wasn't wide enough to hold a full address, so, much as the 6502 leans on zero page, it used pages with addressing modes like page zero and indirect. The link register (carry bit) was the only processor status bit, so there were no zero, negative, or overflow flags and no associated branch instructions. There was, however, a skip instruction that tested the accumulator for zero.

So, to increase the difficulty of the Turing Tarpit challenge, I restricted myself to the following 65C02 instructions and addressing modes.
Code:
Accumulator operations: and, ora, eor, adc, clc, inc, ror, and rol
Memory operations: lda #, lda absolute, lda (), sta absolute, and sta ()
Flow control instructions: bcc, bcs, beq, bne, jmp absolute, jmp (), jsr absolute, rts, and nop

That's 16 instructions, which is a notable reduction from the 6502's 56, and matches the PDP-8. I'm allowing absolute addressing because the 6502 is a byte machine and one- or two-byte operands are intrinsic to such a design. I'm also allowing the zero flag because the PDP-8 had a skip-on-zero instruction.

Notable missing instructions are CMP, DEC, PHA, PLA, and SBC. Subtraction is done by adding the two's complement, using precomputed negative quantities. CMP is a nondestructive subtract, so instead you have to do the subtraction and test for zero. DEC is achieved by adding negative one. This was actually common on early machines with a discrete-logic ALU. The PDP-8 didn't have a stack, so I won't allow its use for anything other than subroutine calls.
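
As a sketch of those idioms under the restricted set (my own labels, not code from life.asm):
Code:
        ; CMP #3 / BEQ becomes "add the precomputed two's complement of 3":
        clc
        lda cell            ; lda absolute
        adc #$FD            ; add -3; Z is set only if cell held 3
        beq cell_is_three

        ; DEC becomes "add -1":
        clc
        lda counter
        adc #$FF            ; add -1
        sta counter         ; sta doesn't touch the flags,
        bne keep_counting   ; so this still tests the decremented value
cell_is_three:
keep_counting:
        ; ...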

I rewrote my Game of Life using these rules, and it works, but it is really slow. I imagine that's because comparison requires doing the math twice: once to add the negative for the test, and again to produce the result you actually want. In any case, here's the 6502 calculating like it's 1965:
https://github.com/Martin-H1/6502/blob/master/FunStuff/life.asm

Update: BigEd correctly pointed out that SEC is unnecessary without subtract, and ASL can be done via add. So that's sixteen instructions.


Last edited by Martin_H on Wed Aug 10, 2022 12:41 pm, edited 1 time in total.

Posted by BigEd on Wed Aug 10, 2022 6:34 am

I'm relatively sure some very early machines offered subtract only, rather than add only - it's easy, if you have subtract, to do an add, but not vice versa.

With subtract, the SEC is handy, whereas with add, the CLC is handy. I'm not sure you need both.

ROR is handy, but I think ROL and ASL can be done by doubling, more or less, and your add, or subtract, can manage that. Maybe keep ROL but ditch ASL - that would give you a round number of 16 instructions!
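
A quick sketch of the doubling trick (temp is a spare byte of my own invention, not something from the thread):
Code:
        ; ASL A by adding the accumulator to a copy of itself:
        sta temp
        clc
        adc temp            ; A = 2*A, old bit 7 lands in the carry

        ; ROL A is the same thing without the clc, so whatever was in the
        ; carry is added into bit 0:
        sta temp
        adc temp            ; A = 2*A + C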


Posted by Martin_H on Thu Aug 11, 2022 11:25 am

BigEd wrote:
I'm relatively sure some very early machines offered subtract only, rather than add only - it's easy, if you have subtract, to do an add, but not vice versa.

I'm not sure about early machines in general, but I do know that the PDP-8 only had a TAD (two's complement add) instruction. It also had CIA (complement and increment accumulator), which two's complemented the contents of the accumulator. By omitting subtract they saved hardware complexity and cost, shifting the burden into software.
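
On the 65C02 side the same complement-and-increment negate fits the restricted list (a two-line sketch, not code from the repo):
Code:
        eor #$FF            ; one's complement of the accumulator
        inc a               ; plus one: A now holds the two's complement (65C02 INC A)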

BigEd wrote:
With subtract, the SEC is handy, whereas with add, the CLC is handy. I'm not sure you need both.

ROR is handy, but I think ROL and ASL can be done by doubling, more or less, and your add, or subtract, can manage that. Maybe keep ROL but ditch ASL - that would give you a round number of 16 instructions!

You're right, and I updated my list of instructions. I actually didn't need any bit operations for this project, but I included them on the list because eventually you would need them.


Posted by Dr Jefyll on Thu Aug 11, 2022 1:27 pm

BigEd wrote:
I'm relatively sure some very early machines offered subtract only, rather than add only
Yes, and the list of "very early machines" even includes some embedded applications! Diablo daisy-wheel printers had a processor built entirely from TTL (ie, no microprocessor), and I had a client that used a fleet of Diablo model 1345s in their manufacturing operation from the late 1980s well into the 1990s.

Based on the schematic diagram (below) I was able to reverse engineer the instruction set, and it is painfully limited (ie, only ten instructions). In that respect it seems relevant to this thread.

And Ed's remark is applicable, as there is no "plain vanilla" ADD instruction. Only two approximations are available.
  • There's an instruction that'll fetch a register, add-with-carry an immediate constant hard-coded in the program, and then store the result back to the register; and,
  • there's an instruction that'll fetch a register, fetch a second register and invert the data, add-with-carry, and then store the result back to the first register. Because of the inversion, this instruction is essentially a subtract (with borrow, if carry is clear). To accomplish a register-to-register addition you first need to create an inverted copy of one of the operands and then subtract (see the sketch below).
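
Not Diablo code, but the same identity can be seen on the 6502, whose SBC computes A + (~M) + C: subtracting an inverted copy with the carry clear performs an addition (operand1, operand2 and temp are hypothetical locations).
Code:
        lda operand2
        eor #$FF            ; make the inverted copy
        sta temp
        lda operand1
        clc                 ; carry clear, so no extra +1
        sbc temp            ; operand1 + ~(~operand2) + 0 = operand1 + operand2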


Martin_H wrote:
I actually didn't need any bit operations for this project, but I included them on the list because eventually you would need them.
The firmware for the Diablo relied heavily on bit operations, and there were several registers which each contained only a collection of unrelated bit fields. But (as with addition), things weren't entirely straightforward. :roll:

There was no logical OR instruction, so the only way to set a bit in a register was to use the instruction (noted above) which adds a hard-coded constant to the register. But of course that's inappropriate if the bit happens to already be set! :shock:

So, if you wanted a certain bit set, you'd first test that bit to see if it's already set... and then do a conditional branch! (ie, take no action if it's already set; otherwise add $01, $02, $04, $08 or whatever you need in order to set that particular bit!) :lol:

I got completely sucked down the rabbit hole, and eventually ended up reverse engineering the firmware and then hacking it. And yes it'd be fair to call this a Tarpit programming challenge! (Why hack the original firmware? The second half of this page describes the goals (although not the actual code). And there's a page about the processor itself here.)

-- Jeff


Attachment: DiabCPU schem 5clr.gif (the schematic diagram referred to above)

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Last edited by Dr Jefyll on Thu Aug 11, 2022 2:16 pm, edited 1 time in total.

Posted by Martin_H on Thu Aug 11, 2022 2:16 pm

@Dr Jefyll, I thought that at the circuit level there were only binary adders, and that subtraction with borrow required inverters on one of the inputs. I recall that when I looked at the 6502's logic diagram, that's how it was arranged.

BTW, thanks for posting that information about the IC-based processor. Microprocessors used to be magic to me until I started reading stuff like this. Then you see what's going on under the hood.

Someday I may try building my own CPU, but I would definitely keep it simple and put all the burden in software. So a Turing Tarpit would likely be the result.

