Claude and I Vibe Coded a Forth Interpreter
Re: Claude and I Vibe Coded a Forth Interpreter
Slight change of approach. There's a lot of cut-and-pasted UART code throughout primitives.s. For example, if a single function needs to print two characters, it pastes the UART code in twice. One function had it five times! I need to fix this duplication.
So, I created hal.inc and hal_mench.s and copied my serial I/O functions into them. I also created stubs for several other useful functions. I then removed all UART code from primitives.s and replaced it with HAL calls. The good news is the module shrank by several hundred lines. The bad news is it still doesn't assemble.
My next move is to move the code that doesn't assemble out of primitives.s into a temporary file and leave stubs in primitives.s to maintain the dictionary links. That way I will have a skeleton of a Forth system. I will also be able to create a binary and see how far it executes before falling over. I will then target that function for a fix.
Re: Claude and I Vibe Coded a Forth Interpreter
Asking as a complete 'AI' skeptic: might it not have been easier to have started from scratch? You seem to be rewriting it all anyway...
Neil
- BigDumbDinosaur
- Posts: 9425
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
- Contact:
Re: Claude and I Vibe Coded a Forth Interpreter
barnacle wrote:
Asking as a complete 'AI' skeptic: might it not have been easier to have started from scratch? You seem to be rewriting it all anyway...
However, this exercise isn’t one of total futility. It is a graphic demonstration of the limitations of artificial “intelligence.” BTW, good thing Claude isn’t flying an airplane; he’d have you landing in a forest. Oh wait! That’s already happened. Eh, Airbus?
x86? We ain't got no x86. We don't NEED no stinking x86!
Re: Claude and I Vibe Coded a Forth Interpreter
barnacle wrote:
Asking as a complete 'AI' skeptic: might it not have been easier to have started from scratch? You seem to be rewriting it all anyway...
BigDumbDinosaur wrote:
However, this exercise isn’t one of total futility. It is a graphic demonstration of the limitations of artificial “intelligence.” BTW, good thing Claude isn’t flying an airplane; he’d have you landing in a forest. Oh wait! That’s already happened. Eh, Airbus? 
- BigDumbDinosaur
Re: Claude and I Vibe Coded a Forth Interpreter
Martin_H wrote:
BigDumbDinosaur wrote:
However, this exercise isn’t one of total futility. It is a graphic demonstration of the limitations of artificial “intelligence.” BTW, good thing Claude isn’t flying an airplane; he’d have you landing in a forest. Oh wait! That’s already happened. Eh, Airbus? 
For me, your adventures with Claude are a source of amusement; some of the ridiculous code Claude conjures would be comical to anyone who knows the 65C816 assembly language. Mostly what Claude seems to be offering is superannuated 6502 code. He doesn’t seem to know the 65C816 programming idiom.
Re: Claude and I Vibe Coded a Forth Interpreter
BigDumbDinosaur wrote:
For me, your adventures with Claude are a source of amusement; some of the ridiculous code Claude conjures would be comical to anyone who knows the 65C816 assembly language. Mostly what Claude seems to be offering is superannuated 6502 code. He doesn’t seem to know the 65C816 programming idiom.
Code:
stz (SCRATCH0) ; STATE = 0
Re: Claude and I Vibe Coded a Forth Interpreter
I just found this Claudism:
Quote:
HEADER '.""', DOTQUOTE_CFA, F_IMMEDIATE, DOTHEX_CFA
It's the ." word, and it's trying to escape the quote by using single quotes like this is Perl. According to ca65 docs this should work:
Quote:
HEADER ".\"", DOTQUOTE_CFA, F_IMMEDIATE, DOTHEX_CFA
Unfortunately, that runs into a ca65 error: "stubs.s:166: Error: Newline in string constant". This means the .", s", and abort" words will need their headers handcrafted. Not a big problem, as the macro is otherwise useful.
Claude also has a problem maintaining consistency between files. The file dictionary.inc defines all the code word CFAs as global symbols so they can be referenced from other modules. But DOT_PROMPT_CFA is missing from the list. No big deal, except that DOT_PROMPT_CFA is the last word in the dictionary list, so it's the one symbol that really matters.
Good news! I'm down to 17 errors.
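For the handcrafted headers, one option is to skip the string literal for the name and emit its bytes individually. The sketch below covers only the name field; the length-prefixed layout and the label DOTQUOTE_NAME are assumptions, since the HEADER macro's field layout isn't shown in this thread. Newer ca65 releases also document a `.feature string_escapes` switch for backslash escapes in strings, which may be what the docs describe, but that is untested against this macro.

```asm
; Name field for the ." word, built without a quoted string.
; Assumed layout: a length byte followed by the name characters.
DOTQUOTE_NAME:
        .byte 2             ; name length: two characters
        .byte '.', $22      ; '.' then '"' ($22 is the ASCII double quote)
```

The same approach would apply to s" and abort", with the length byte adjusted to match.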
Re: Claude and I Vibe Coded a Forth Interpreter
Claude finally realized that the 65816 has stack relative addressing and even has indirection. Woot woot!
Code:
lda (1,S) ; peek
Unfortunately, that's not quite right, as that's not a supported addressing mode. The correct way to do it is:
Code:
phy
ldy #0
lda (3,S),Y ; peek (offset 3 skips the two bytes PHY just pushed; 16-bit Y assumed)
ply ; restores Y, but also resets N/Z from Y
Claude also invented the useful JSR indirect instruction:
Code:
jsr (SCRATCH0) ; Call primitive (it will NEXT)
Never mind, that's not on the silicon either. The correct way to do that is:
Code:
jsr @jsri
bra @next_word
@jsri: jmp (SCRATCH0) ; Call primitive (it will NEXT)
@next_word:
This branch covers everything:
Code:
beq @next_word ; Interpreting: number on stack, done
; Compiling: compile LIT + value
; ... compile steps here
bra @next_word
Claude also invented the new BLE and "CMP Y" instructions. WDC missed out, as an instruction that compares the accumulator and Y register would be useful.
With these fixes the whole thing assembles; it might even link. But there's no way it would run.
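Neither invented instruction is hard to replace. A minimal sketch, assuming 16-bit A and Y, a free direct-page cell (SCRATCH0 is placed at $00 here purely for illustration), and that an unsigned comparison was intended; CMP_Y_SKETCH is a hypothetical label:

```asm
.p816
.a16
.i16

SCRATCH0 = $00          ; hypothetical direct-page scratch cell

CMP_Y_SKETCH:
        sty SCRATCH0    ; park Y in memory
        cmp SCRATCH0    ; compare A with the saved Y: sets N, Z, C
        beq @le         ; Z set: A = Y
        bcc @le         ; C clear: A < Y (unsigned)
        ; ... A > Y path goes here ...
        rts
@le:
        ; ... A <= Y path goes here ...
        rts
```

A signed "less than or equal" needs the overflow flag, which CMP does not update, so that case would have to use SEC/SBC instead.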
- BigDumbDinosaur
Re: Claude and I Vibe Coded a Forth Interpreter
Martin_H wrote:
Claude finally realized that the 65816 has stack relative addressing and even has indirection. Woot woot!
Code:
lda (1,S) ; peek
Unfortunately, that's not quite right, as that's not a supported addressing mode.
Quote:
Claude also invented the useful JSR indirect instruction:
Code:
jsr (SCRATCH0) ; Call primitive (it will NEXT)
Quote:
Claude also invented the new BLE and "CMP Y" instructions. WDC missed out as an instruction that compares the accumulator and Y register would be useful.
Re: Claude and I Vibe Coded a Forth Interpreter
Let's take time to reflect on what we've seen and think about next steps. The original Claude-generated source consisted of three files, one of which was over 3300 lines of code.
Code:
forth.s, 300 lines
macros.inc, 121 lines
primitives.s, 3343 lines
This is likely an artifact of Claude using Fig-Forth as a model to generate source. I'm not a fan of Fig-Forth's run-on structure, but it has the advantage of actually working. Plus, it was developed in a different era when development tools weren't as polished. There's no way to proceed with such a large block of untested code. So, I refactored it into smaller modules and gave each a score on code quality:
Code:
filename, size, score
compare.s, 296 lines, B - untested but looks good after revision/review
forth.s, 216 lines, B - untested but looks good after revision/review
hal_mench.s, 171 lines, A - reused code from another project.
interpreter.s, 414 lines, B - untested but looks good after revision/review
io.s, 149 lines, B - untested but looks good after revision/review
macros.inc, 182 lines, B - untested but looks good after revision/review
math.s, 387 lines, B - untested but looks good after revision/review
memory.s, 214 lines, B - untested but looks good after revision/review
primitives.s, 316 lines, F - Unsalvageable code, needs a rewrite.
print.s, 151 lines, A - reused code from another project.
stack.s, 285 lines, B - untested but looks good after revision/review
stubs.s, 208 lines, F - Claude-generated placeholders, no implementation
system.s, 748 lines, D - Possibly salvageable code, needs work.
So how did Claude do?
If I graded Claude's code before code review it would be an F. If Claude were a junior engineer working with me, it would require a lot of mentorship. But some of it was a good starting point and looks good after code review. So, I will be generous and grade the revised code. But I will only include Claude's modules in the weighted average, which is constructed as follows: weighted average = SUM (lines * score) / total lines, where B = 3, C = 2, D = 1, F = 0, which yields:
(296*3 + 216*3 + 414*3 + 149*3 + 182*3 + 387*3 + 214*3 + 316*0 + 285*3 + 208*0 + 748*1) / (296 + 216 + 414 + 149 + 182 + 387 + 214 + 316 + 285 + 208 + 748)
Which evaluates to ... Wait a second, this is a Forth sub-forum, it should be:
Code:
296 3 * 216 3 * 414 3 * 149 3 * 182 3 * 387 3 * 214 3 * 316 0 * 285 3 * 208 0 * 748 1 * ok 12
+ + + + + + + + + + ok 2
296 216 414 149 182 387 214 316 285 208 748 + + + + + + + + + + ok 3
.s <3> 2 7177 3415 ok 3
/ ok 2
. 2 ok 1
That's 7177 / 3415, or about 2.1, which is a C and not a good score. On the one hand it's amazing a Markov chain on steroids can code at all. But if I had known going in that the code would be C quality, I wouldn't have spent my time on this. However, I now have some solid B modules to build on, so I might continue with those.
Re: Claude and I Vibe Coded a Forth Interpreter
The fundamental point here, I think, is that it's not generating code at all. It's generating a string of symbols which look like code, based on the weighted averages of everything it's been trained on. Most of the code on the internet appears to be 'why doesn't this work' (including my own!) rather than 'here is an empirically correct method to...', which rather suggests that you're going to get code built out of queries.
It doesn't in _any_ way 'know' anything. It can't; it's not intelligent. The only thing it knows is that certain symbols are often found in close proximity to others. It's producing something that looks like code because that's all the algorithm _can_ do. It doesn't have a concept of the structures and algorithms which a human might use, even to the extent of checking whether a particular instruction is in the instruction set of the desired processor, or whether it has the desired effect. As witnessed by the number of invented instructions... it's seen things that look like them, so it spits them out again. Badly.
Seen on the BBC site this morning: https://www.bbc.com/news/articles/cj0d6el50ppo
Neil
- BigDumbDinosaur
Re: Claude and I Vibe Coded a Forth Interpreter
barnacle wrote:
The fundamental point here, I think, is that it's not generating code at all.
Quote:
It doesn't in _any_ way 'know' anything. It can't; it's not intelligent. The only thing it knows is that certain symbols are often found close proximity to others.
At best, AI is pouring old wine into new bottles, all the while spilling some of it, getting different vintages mixed up, and making a mess. The significant “improvement” is that the computers doing the re-bottling are far faster than the mightiest mainframes of ye olden days. That said, it’s still a dumb machine that is offering gibberish to Martin.
Quote:
Seen on the BBC site this morning: https://www.bbc.com/news/articles/cj0d6el50ppo
Declarations like "Proudly Human"...
Re: Claude and I Vibe Coded a Forth Interpreter
barnacle wrote:
Most of the code on the internet appears to be 'why doesn't this work' (including my own!) rather than 'here is an empirically correct method to...', which rather suggests that you're going to get code built out of queries.
Re: Claude and I Vibe Coded a Forth Interpreter
I assembled and linked the interpreter, but I know it won't work. Moreover, it is so unsound that testing code in place is impossible. The solution to this problem is unit testing, so let's start alphabetically with compare.s! Like all of Claude's code, the compare.s module is roughly 300 lines of untested code. So, I created a tests subdirectory and wrote compareTest.s, which is 490 lines! Running the test, I found two bugs I missed during code review. Both were caused by Claude doing an operation, dropping an item off the stack, and then testing the condition code!
Code:
sbc 0,X ; a - b
inx ; drop b
inx
bvs @overflow ; Overflow-aware signed compare
After fixing these bugs I now have a working compare.s module. But Claude's contribution to that success is now much less than 50%!
The stack.s module (285 lines) seems like low-hanging fruit because it's operations like dup, swap, drop, rot, etc. So, I just finished writing stackTest.s (360 lines) and plan to repeat this success.
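On the compare bug: INX updates the N and Z flags (it leaves V and C alone), so any BMI, BPL, or BEQ that follows the drop is testing the stack pointer rather than the subtraction. One way around that is to finish every flag test before touching X. A sketch of a signed less-than, assuming 16-bit registers, X as the data stack pointer, and two-byte cells; LESS_SKETCH is a hypothetical label, and the RTS stands in for the NEXT macro:

```asm
.p816
.a16
.i16

LESS_SKETCH:
        lda 2,X         ; a (second item on the data stack)
        sec
        sbc 0,X         ; a - b: sets N, V, Z, C
        bvc @no_ovf     ; V clear: N already means "a < b"
        eor #$8000      ; V set: flip the sign bit so N is correct
@no_ovf:
        bmi @true       ; tested while N still reflects the compare
        lda #$0000      ; false flag
        bra @store
@true:
        lda #$FFFF      ; true flag
@store:
        inx             ; only now drop b
        inx
        sta 0,X         ; replace a with the flag (STA leaves flags alone)
        rts             ; placeholder for NEXT
```

The result is already in A before the drop, so nothing after the INX pair depends on the condition codes.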
Re: Claude and I Vibe Coded a Forth Interpreter
Claude, you disappointed me, but more importantly, I disappointed myself for not catching this.
This code has an issue which is either minor or serious, depending upon your POV.
Code:
PUBLIC DEPTH_CODE
stx SCRATCH0 ; compute (PSP_INIT - x) / 2
lda #$03FF
sec
sbc SCRATCH0
lsr ; Divide by 2 (cells)
dex
dex
sta 0,X ; push to TOS
NEXT
ENDPUBLIC
A magic number for the PSP_INIT value is an attractive nuisance if you ever change it. Unfortunately, Claude uses magic numbers, even after defining constants for them.
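Using the named constant in place of the literal removes the nuisance. A sketch, assuming PSP_INIT equals the $03FF literal in the listing and that SCRATCH0 is a direct-page cell (its address here is made up); the PUBLIC/ENDPUBLIC and NEXT macros are replaced with a plain label and RTS so the fragment stands alone:

```asm
.p816
.a16
.i16

PSP_INIT = $03FF        ; assumed to match the literal in the listing
SCRATCH0 = $00          ; hypothetical direct-page scratch cell

DEPTH_SKETCH:
        stx SCRATCH0    ; save the current data stack pointer
        lda #PSP_INIT   ; named constant instead of #$03FF
        sec
        sbc SCRATCH0    ; bytes in use = PSP_INIT - X
        lsr             ; bytes -> cells
        dex             ; make room for one cell
        dex
        sta 0,X         ; push the depth
        rts             ; placeholder for NEXT
```

If the stack base ever moves, only the PSP_INIT definition has to change.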