Claude and I Vibe Coded a Forth Interpreter

Martin_H · Post by **Martin_H** » Sun Mar 15, 2026 3:37 am

Slight change of approach. There's a lot of cut-and-pasted code throughout primitives.s for working with the UART. For example, if a function needs to print two characters, it pastes the UART code twice. One function had it five times! I need to fix this duplication.

So, I created hal.inc and hal_mench.s and copied my serial I/O functions into the new HAL. I also created stubs for several other useful functions. I then removed all UART code from primitives.s and replaced it with HAL calls. The good news is the module shrank by several hundred lines. The bad news is it still doesn't assemble.

My next move is to move the code that doesn't compile out of primitives.s to a temporary file and leave stubs in primitives.s to maintain the dictionary links. That way I will have a skeleton of a Forth system. I will also be able to create a binary and see how far it will execute before falling over. I will then target that function for a fix.

barnacle · Post by **barnacle** » Sun Mar 15, 2026 6:52 am

Asking as a complete 'AI' skeptic: might it not have been easier to have started from scratch? You seem to be rewriting it all anyway...

Neil

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sun Mar 15, 2026 11:26 am

barnacle wrote:

Asking as a complete 'AI' skeptic: might it not have been easier to have started from scratch? You seem to be rewriting it all anyway...

Funny how warped minds think alike.

I was tempted to ask the same thing. With the amount of messing around that has already been done to fix Claude’s clumsy coding calamities, scratch-writing likely would have taken no more time, and would have resulted in a functional program...or at least less-illogical code.

However, this exercise isn’t one of total futility. It is a graphic demonstration of the limitations of artificial “intelligence.” BTW, good thing Claude isn’t flying an airplane; he’d have you landing in a forest. Oh wait! That’s already happened. Eh, Airbus?

Martin_H · Post by **Martin_H** » Sun Mar 15, 2026 11:59 am

barnacle wrote:

Asking as a complete 'AI' skeptic: might it not have been easier to have started from scratch? You seem to be rewriting it all anyway...

Answering that question is why I decided to try this. I am a skeptic of LLM AI technology, but there's so much hype I needed to see if I was being closed-minded.

BigDumbDinosaur wrote:

However, this exercise isn’t one of total futility. It is a graphic demonstration of the limitations of artificial “intelligence.” BTW, good thing Claude isn’t flying an airplane; he’d have you landing in a forest. Oh wait! That’s already happened. Eh, Airbus?

That's why I'm documenting it. I'm not sure if my little adventure will reach a big enough audience though.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sun Mar 15, 2026 12:37 pm

Martin_H wrote:

BigDumbDinosaur wrote:

However, this exercise isn’t one of total futility. It is a graphic demonstration of the limitations of artificial “intelligence.” BTW, good thing Claude isn’t flying an airplane; he’d have you landing in a forest. Oh wait! That’s already happened. Eh, Airbus?

That's why I'm documenting it. I'm not sure if my little adventure will reach a big enough audience though.

You might be surprised. References to 6502.org seem to turn up more often than not in a lot of computer-oriented search engine results, especially when current topics such as LLM AI are involved.

For me, your adventures with Claude are a source of amusement; some of the ridiculous code Claude conjures would be comical to anyone who knows the 65C816 assembly language. Mostly what Claude seems to be offering is superannuated 6502 code. He doesn’t seem to know the 65C816 programming idiom.

Martin_H · Post by **Martin_H** » Sun Mar 15, 2026 4:52 pm

BigDumbDinosaur wrote:

For me, your adventures with Claude are a source of amusement; some of the ridiculous code Claude conjures would be comical to anyone who knows the 65C816 assembly language. Mostly what Claude seems to be offering is superannuated 6502 code. He doesn’t seem to know the 65C816 programming idiom.

Amusement of myself and others is definitely another goal. I agree, Claude does not understand the 65816. Claude's comments complained about register starvation, but stack relative addressing would solve that. Stack relative addressing is a property of most instruction set architectures that Claude could have generalized on and used. But even its 65c02 knowledge is dodgy as I keep finding this illegal addressing mode error everywhere:

Code: Select all

	stz (SCRATCH0)      ; STATE = 0

Martin_H · Post by **Martin_H** » Sun Mar 15, 2026 5:54 pm

I just found this Claudism:

Quote:

HEADER '.""', DOTQUOTE_CFA, F_IMMEDIATE, DOTHEX_CFA

It's the ." word, and it's trying to escape the quote by using single quotes like this is Perl. According to ca65 docs this should work:

Quote:

HEADER ".\"", DOTQUOTE_CFA, F_IMMEDIATE, DOTHEX_CFA

Unfortunately, that runs into a ca65 error: "stubs.s:166: Error: Newline in string constant". This means .", s", and abort" words will need their headers hand crafted. Not a big problem as the macro is otherwise useful.

Claude also has a problem maintaining consistency between files. The file dictionary.inc defines all the code word CFAs as global symbols so they can be referenced from other modules. But DOT_PROMPT_CFA is missing from the list. No big deal, except DOT_PROMPT_CFA is also the last word in the dictionary list, so it's the one that matters most.

Good news! I'm down to 17 errors.

Martin_H · Post by **Martin_H** » Sun Mar 15, 2026 8:17 pm

Claude finally realized that the 65816 has stack relative addressing and even has indirection. Woot woot!

Code: Select all

	lda (1,S)           ; peek

Unfortunately, that's not quite right, as that's not a supported addressing mode. The correct way to do it is:

Code: Select all

	phy
	ldy #0
	lda (3,S),Y           ; peek: PHY (16-bit index) moved the pointer from 1,S to 3,S
	ply

Claude also invented the useful JSR indirect instruction:

Code: Select all

	jsr    (SCRATCH0)      ; Call primitive (it will NEXT)

Never mind, that's not on the silicon either. The correct way to do that is:

Code: Select all

	jsr @jsri
	bra    @next_word
@jsri:	jmp (SCRATCH0)      ; Call primitive (it will NEXT)
@next_word:

This branch covers everything:

Code: Select all

	beq    @next_word      ; Interpreting: number on stack, done
	; Compiling: compile LIT + value
	; ... compile steps here
	bra    @next_word

Claude also invented the new BLE and "CMP Y" instructions. WDC missed out as an instruction that compares the accumulator and Y register would be useful.

With these fixes the whole thing assembles, it might even link. But there's no way it would run.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sun Mar 15, 2026 8:31 pm

Martin_H wrote:

Claude finally realized that the 65816 has stack relative addressing and even has indirection. Woot woot!

Code: Select all

	lda (1,S)           ; peek

Unfortunately, that's not quite right, as that's not a supported addressing mode.

As I said, be glad Claude isn’t flying the plane on which you are traveling.

Quote:

Claude also invented the useful JSR indirect instruction:

Code: Select all

	jsr    (SCRATCH0)      ; Call primitive (it will NEXT)

That would be useful to some extent, but, in my opinion, not enough to justify gobbling up an opcode.

Quote:

Claude also invented the new BLE and "CMP Y" instructions. WDC missed out as an instruction that compares the accumulator and Y register would be useful.

BLE? And here I thought I ordered a BLT.

Martin_H · Post by **Martin_H** » Sun Mar 15, 2026 9:55 pm

Let's take time to reflect on what we've seen and think about next steps. The original Claude-generated source consisted of three files, one of which was over 3300 lines of code.

Code: Select all

forth.s,       300 lines
macros.inc,    121 lines
primitives.s, 3343 lines

This is likely an artifact of Claude using Fig-Forth as a model to generate source. I'm not a fan of Fig-Forth's run-on structure, but it has the advantage of actually working. Plus, it was developed in a different era where development tools weren't as polished. There's no way to proceed with such a large block of untested code. So, I refactored it into smaller modules and gave each a score on code quality:

Code: Select all

filename,      size,      score
compare.s,     296 lines, B - untested but looks good after revision/review
forth.s,       216 lines, B - untested but looks good after revision/review
hal_mench.s,   171 lines, A - reused code from another project.
interpreter.s, 414 lines, B - untested but looks good after revision/review
io.s,          149 lines, B - untested but looks good after revision/review
macros.inc,    182 lines, B - untested but looks good after revision/review
math.s,        387 lines, B - untested but looks good after revision/review
memory.s,      214 lines, B - untested but looks good after revision/review
primitives.s,  316 lines, F - Unsalvageable code, needs a rewrite.
print.s,       151 lines, A - reused code from another project.
stack.s,       285 lines, B - untested but looks good after revision/review
stubs.s,       208 lines, F - Claude generated place holders no implementation
system.s,      748 lines, D - Possibly salvageable code, needs work.

So how did Claude do?

If I graded Claude's code before code review it would be an F. If Claude was a junior engineer working with me, it would require a lot of mentorship. But some of it was a good starting point and looks good after code review. So, I will be generous and grade the revised code. But I will only grade Claude's modules in the weighted average constructed as follows: weighted average = SUM (lines * score) / total lines where B = 3, C = 2, D = 1, F = 0 which yields:

(296*3 + 216*3 + 414*3 + 149*3 + 182*3 + 387*3 + 214*3 + 316*0 + 285*3 + 208*0 + 748*1) / (296 + 216 + 414 + 149 + 182 + 387 + 214 + 316 + 285 + 208 + 748)

Which evaluates to ... Wait a second, this is a Forth sub-forum, it should be:

Code: Select all

296 3 * 216 3 * 414 3 * 149 3 * 182 3 * 387 3 * 214 3 * 316 0 * 285 3 * 208 0 * 748 1 *  ok 12
+ + + + + + + + + +  ok 2
296 216 414 149 182 387 214 316 285 208 748 + + + + + + + + + +  ok 3
.s <3> 2 7177 3415  ok 3
/  ok 2
. 2  ok 1

That's a C, which is not a good score. On the one hand, it's amazing a Markov chain on steroids can code at all. But if I had known going in that the code would be C quality, I wouldn't have spent my time on this. However, I now have some solid B modules to build on, so I might continue with those.
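For anyone without a Forth prompt handy, the same arithmetic can be cross-checked with a small Python sketch. The line counts and grades are copied from the table above; the names `GRADE_POINTS` and `modules` are made up for illustration and are not part of the project.

```python
# Line counts and letter grades copied from the module table in this post.
# Only Claude's modules are included; hal_mench.s and print.s are left out.
GRADE_POINTS = {"B": 3, "C": 2, "D": 1, "F": 0}

modules = [
    ("compare.s", 296, "B"),
    ("forth.s", 216, "B"),
    ("interpreter.s", 414, "B"),
    ("io.s", 149, "B"),
    ("macros.inc", 182, "B"),
    ("math.s", 387, "B"),
    ("memory.s", 214, "B"),
    ("primitives.s", 316, "F"),
    ("stack.s", 285, "B"),
    ("stubs.s", 208, "F"),
    ("system.s", 748, "D"),
]

weighted_sum = sum(lines * GRADE_POINTS[grade] for _, lines, grade in modules)
total_lines = sum(lines for _, lines, _ in modules)

print(weighted_sum, total_lines)             # 7177 3415
print(weighted_sum // total_lines)           # 2, same as Forth's integer divide
print(round(weighted_sum / total_lines, 2))  # 2.1
```

The Forth `/` truncates, so the 2 it prints is really about 2.1: a C, just barely above the line.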

barnacle · Post by **barnacle** » Mon Mar 16, 2026 8:14 am

The fundamental point here, I think, is that it's not generating code at all. It's generating a string of symbols which look like code, based on the weighted averages of everything it's been trained on. Most of the code on the internet appears to be 'why doesn't this work' (including my own!) rather than 'here is an empirically correct method to...', which rather suggests that you're going to get code built out of queries.

It doesn't in _any_ way 'know' anything. It can't; it's not intelligent. The only thing it knows is that certain symbols are often found in close proximity to others. It's producing something that looks like code because that's all the algorithm _can_ do. It doesn't have a concept of the structures and algorithms which a human might use, even to the extent of checking whether a particular instruction is in the instruction set of the desired processor, or whether it has the desired effect. As witnessed by the number of invented instructions... it's seen things that look like them, so it spits them out again. Badly.

Seen on the BBC site this morning: https://www.bbc.com/news/articles/cj0d6el50ppo

Neil

BigDumbDinosaur · Post by **BigDumbDinosaur** » Mon Mar 16, 2026 2:10 pm

barnacle wrote:

The fundamental point here, I think, is that it's not generating code at all.

That, in a nutshell, is exactly what is going on. What is being produced is a bunch of random rubbish.

Quote:

It doesn't in _any_ way 'know' anything. It can't; it's not intelligent. The only thing it knows is that certain symbols are often found in close proximity to others.

The best way I can describe what is being seen is Claude is emitting the results of a series of pattern matches, pattern-matching being a feature of computers (more specifically, a feature of the work of human programmers) that has long predated current technology.

At best, AI is pouring old wine into new bottles, all-the-while spilling some of it, getting different vintages mixed up, and making a mess. The significant “improvement” is the computers doing the re-bottling are far faster than the mightiest mainframes of ye olden days. That said, it’s still a dumb machine that is offering gibberish to Martin.

Quote:

Seen on the BBC site this morning: https://www.bbc.com/news/articles/cj0d6el50ppo

Declarations like "Proudly Human"...

If someone has to declare that, there is something drastically wrong with our society.

Martin_H · Post by **Martin_H** » Mon Mar 16, 2026 4:41 pm

barnacle wrote:

Most of the code on the internet appears to be 'why doesn't this work' (including my own!) rather than 'here is an empirically correct method to...', which rather suggests that you're going to get code built out of queries.

Claude's training data consisting of broken code from Stack Overflow explains a lot.

Martin_H · Post by **Martin_H** » Tue Mar 17, 2026 11:57 am

I compiled and linked the interpreter, but I know it won't work. Moreover, the soundness of it is so poor that testing code in place is impossible. The solution to this problem is unit testing, so let's start alphabetically with compare.s! Like all of Claude's code, the compare.s module is 300 lines of untested code. So, I created a tests subdirectory and wrote compareTest.s which is 490 lines! Running the test, I find two bugs I missed during code review. Both were caused by Claude doing an operation, dropping an item off the stack, and then testing the condition code!

Code: Select all

	sbc 0,X			; a - b
	inx			; drop b
	inx
	bvs @overflow		; Overflow-aware signed compare

After fixing these bugs I now have a working compare.s module. But Claude's contribution to that success is now much less than 50%!
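To see why the ordering matters, here is a rough Python model of the flag behaviour. It is a sketch, not the actual routine: `sbc16`, `inx`, and `signed_less` are hypothetical helpers. INX leaves V and C alone but rewrites N and Z from the new X value, so presumably the damage shows up in whatever N or Z test follows the BVS. That part is an assumption, since the rest of the routine isn't shown.

```python
MASK = 0xFFFF

def sbc16(a, b, carry=1):
    """Model a 16-bit binary-mode SBC; returns (result, flags)."""
    raw = a - b - (1 - carry)
    result = raw & MASK
    flags = {
        "N": bool(result & 0x8000),
        "Z": result == 0,
        "C": raw >= 0,  # carry set means no borrow
        # signed overflow: a and b differ in sign, and the result's sign differs from a
        "V": bool((a ^ b) & (a ^ result) & 0x8000),
    }
    return result, flags

def inx(x, flags):
    """Model INX: N and Z now describe X; C and V are untouched."""
    x = (x + 1) & MASK
    return x, dict(flags, N=bool(x & 0x8000), Z=(x == 0))

def signed_less(flags):
    """After a subtract, a < b (signed) exactly when N differs from V."""
    return flags["N"] != flags["V"]

# 1 - 2: the subtract itself says "less than".
_, flags_after_sbc = sbc16(1, 2)
print(signed_less(flags_after_sbc))   # True

# Drop one cell (two INX) with the stack pointer somewhere in page 3.
x, flags_after_drop = inx(0x03F0, flags_after_sbc)
x, flags_after_drop = inx(x, flags_after_drop)
print(signed_less(flags_after_drop))  # False: N now reflects X, not a - b
```

In this model the fix is simply to branch on the flags before adjusting X, or to drop the cell on each branch target.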

The stack.s module (285 lines) seems like low hanging fruit because it's operations like dup, swap, drop, rot, etc. So, I just finished writing stackTest.s (360 lines) and plan to repeat this success.

Martin_H · Post by **Martin_H** » Tue Mar 17, 2026 2:26 pm

Claude, you disappointed me, but more importantly, I disappointed myself for not catching this.
This code has an issue that is either minor or serious, depending on your POV.

Code: Select all

PUBLIC DEPTH_CODE
	stx SCRATCH0		; compute (PSP_INIT - x) / 2
	lda #$03FF
	sec
	sbc SCRATCH0
	lsr			; Divide by 2 (cells)
	dex
	dex
	sta 0,X			; push to TOS
	NEXT
ENDPUBLIC

A magic number for the PSP_INIT value is an attractive nuisance if you ever change it. Unfortunately, Claude uses magic numbers, even after defining constants for them.
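A small Python sketch of the depth calculation shows how a stale literal goes wrong. `CELL_SIZE`, `depth`, and the relocated address `NEW_PSP_INIT` are hypothetical names and values chosen for illustration; only `$03FF` comes from the listing. In the assembly itself the fix is presumably just to use the symbol in place of the literal, assuming the constant is named PSP_INIT as the comment suggests.

```python
PSP_INIT = 0x03FF   # initial parameter stack pointer, value taken from the listing
CELL_SIZE = 2       # bytes per cell on a 16-bit Forth

def depth(x, psp_init=PSP_INIT):
    """Number of cells on the stack: (PSP_INIT - X) / 2."""
    return (psp_init - x) // CELL_SIZE

# Push three cells; the stack grows downward.
x = PSP_INIT - 3 * CELL_SIZE
print(depth(x))                          # 3

# Relocate the stack (hypothetical address) but leave the literal $03FF behind.
NEW_PSP_INIT = 0x02FF
x = NEW_PSP_INIT - 3 * CELL_SIZE
print(depth(x, psp_init=0x03FF))         # 131: the stale literal gives nonsense
print(depth(x, psp_init=NEW_PSP_INIT))   # 3
```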
