
All times are UTC




PostPosted: Tue Aug 03, 2021 4:32 pm 
Offline

Joined: Sun Oct 07, 2018 6:04 pm
Posts: 30
I have just released C compilers that may interest this community.

Highlights:
  • ISO C99 compiler for either the 6502/65C02 or the WDC 65816. This is a freestanding implementation with many of the features you would typically find in a hosted compiler.
  • Fully reentrant code model.
  • Support for all integer types up to 64-bit `long long`.
  • Floating point supported (32-bit IEEE 754).
  • Full support for struct, union, typedef and what you expect to find in C.
  • Support for (stack allocated) variable-length arrays.
  • Values of type `long long` are passed by reference rather than by value. The run-time keeps track of values and will create temporaries and copy data as needed for correct handling.
  • Optimizing compiler that can output source level debugging information.
  • Source code debugger included.
  • Support for ELF/DWARF, hex output as well as raw and `.pgz` (Foenix) style application files.

There are installers available for macOS, Ubuntu/Debian and Arch/Manjaro on 64-bit x86/amd64 systems.

6502 links
User guide: tinyurl.com/y5wpxjb2
Arch installer tinyurl.com/nre46dym
Debian installer tinyurl.com/yp6c4cav
macOS installer tinyurl.com/az47vkdh

65816 links
User guide: tinyurl.com/sz566xb3
Arch installer: tinyurl.com/8c3m5s4
Debian installer: tinyurl.com/tba54cyu
macOS installer: tinyurl.com/46rjpk6v

These products are not open source, but can be used freely for non-commercial purposes (hobby, personal education, etc).

The 6502 compiler was previously released as NutStudio, but has been renamed from this release.


PostPosted: Tue Aug 03, 2021 5:04 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Sounds great - thanks for the announcement!


PostPosted: Wed Aug 04, 2021 2:13 pm 
Offline

Joined: Sun Nov 08, 2009 1:56 am
Posts: 411
Location: Minnesota
Looks like a quality project!


PostPosted: Mon Aug 30, 2021 5:57 pm 
Offline

Joined: Sun Oct 07, 2018 6:04 pm
Posts: 30
Here is a bug fix release version 3.3.2 of my C compiler for the 65816 and the 6502.

Links for 65816
User guide https://tinyurl.com/4ws689bh
Installer Arch Linux https://tinyurl.com/4krtbxxy
Installer Debian https://tinyurl.com/yysf79c5
Installer macOs https://tinyurl.com/88hu5r3c

Links for 6502
User guide https://tinyurl.com/f3vcwdy2
Installer Arch Linux https://tinyurl.com/yedpfp54
Installer Debian https://tinyurl.com/caw68y6t
Installer macOs https://tinyurl.com/469mjeaa


PostPosted: Tue Aug 31, 2021 1:25 am 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8504
Location: Midwestern USA
hth313 wrote:
Here is a bug fix release version 3.3.2 of my C compiler for the 65816 and the 6502.

Links for 65816
User guide https://tinyurl.com/4ws689bh
Installer Arch Linux https://tinyurl.com/4krtbxxy
Installer Debian https://tinyurl.com/yysf79c5
Installer macOs https://tinyurl.com/88hu5r3c

Links for 6502
User guide https://tinyurl.com/f3vcwdy2
Installer Arch Linux https://tinyurl.com/yedpfp54
Installer Debian https://tinyurl.com/caw68y6t
Installer macOs https://tinyurl.com/469mjeaa

Although I've done virtually nothing with C in a 6502-family application, I've been looking at this compiler with some interest, as there may well come a time when I'd do some C development for the 816. The compiler is clearly a real labor of love and appears to have been logically designed. I also like that it appears to be usable on Linux without regard to the particular distro being used (I'm a SuSE advocate, FWIW). And, unusually for software of this type, the user guide is well-organized.

Speaking of the user guide, I did see some information that was presented about the 65C816's memory model that I'd like to discuss in the interest of clarity. Also, I have some curiosity about how the compiler resolves some things that are more-or-less 65C816-specific. Please note that the following is not criticism, just an attempt to reconcile what the guide says with what I know about the 816's behavior.

On page 13 of the guide, there is:


    Even though the address space 16MB, it can be seen as a sequence of 64K address ranges.

That statement may be confusing to someone who is not well-acquainted with the 65C816 and its addressing model.

The sequence of 64KB address ranges is of concern with program execution, direct page and stack access, MPU vector access, and non-indexed indirect vector accesses. The 64KB boundary regarding program execution is due to PB (program bank) not incrementing when PC (program counter) wraps from $FFFF to $0000. That and the other items are artifacts of the 65C816 being designed both to emulate a 65C02 at reset (the C02, of course, has no concept of extended addressing) and to run 65C02 code while operating in native mode.
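
To illustrate the program-bank behavior just described, here is a minimal sketch in C that models where the next instruction byte would be fetched from. The function name `next_fetch_address` is my own invention for illustration and is not part of any toolchain.

```c
#include <stdint.h>

/* Hypothetical model: PB is fixed during sequential execution and
   only the 16-bit PC advances, so the fetch address wraps inside
   the current bank instead of rolling into the next one. */
uint32_t next_fetch_address(uint8_t pb, uint16_t pc)
{
    uint16_t next_pc = (uint16_t)(pc + 1u);   /* $FFFF wraps to $0000 */
    return ((uint32_t)pb << 16) | next_pc;    /* bank byte is unchanged */
}
```

With PB = $01 and PC = $FFFF, the modeled next fetch address is $010000, not $020000.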

At the programmer's discretion, data access may be limited to a 16-bit range—in which case, the value in DB (data bank register) applies and conventional 6502 addressing modes are used, or the entire 24-bit address space may be treated as linear with the use of "bank-agnostic" code.


    While address calculations can be made to cross 64K page boundaries, it is not an efficient way to utilize the 65816.

The use of the word "page" in the above is incorrect and this incorrect usage is prevalent throughout the guide.

In 6502-family architecture, a "page" is defined as 256 contiguous bytes, not 64KB. In 65C816 architecture, a "bank" is defined as 256 contiguous pages, with any given bank starting at $xx0000, where $xx is the bank number, $00-$FF.¹
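
The page/bank terminology can be made concrete with a short sketch in C that splits a 24-bit address into its bank, its page within that bank, and its offset within that page. The helper names are hypothetical and chosen only for this illustration.

```c
#include <stdint.h>

/* Hypothetical helpers: decompose a 24-bit 65C816 address $BBPPOO. */

/* Bank: bits 16-23, one of 256 banks of 64KB each. */
uint8_t bank_of(uint32_t addr)        { return (uint8_t)((addr >> 16) & 0xFFu); }

/* Page: bits 8-15, one of 256 pages of 256 bytes within the bank. */
uint8_t page_in_bank(uint32_t addr)   { return (uint8_t)((addr >> 8) & 0xFFu); }

/* Offset: bits 0-7, the byte position within the page. */
uint8_t offset_in_page(uint32_t addr) { return (uint8_t)(addr & 0xFFu); }

/* Every bank starts at $xx0000. */
uint32_t bank_start(uint8_t bank)     { return (uint32_t)bank << 16; }
```

For the address $01C0FF, this gives bank $01, page $C0 within that bank, and offset $FF.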

As for the statement about efficiency, I question that. The 65C816 assembly language makes it possible to treat the full address space as linear space for data purposes, using succinct code. This possibility is fully exploited if "long" pointers are handled as 32-bit entities instead of 24-bit so frequent use of REP and SEP in pointer arithmetic sequences can be avoided.²
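
As a rough illustration of the 32-bit pointer idea, the sketch below models in C a "long" pointer stored as two 16-bit words and advanced with two 16-bit additions, the second one absorbing the carry. The type `far_ptr32` and both function names are assumptions made for this example. The point is only that both halves are the same width, so the arithmetic never has to change operand size.

```c
#include <stdint.h>

/* Hypothetical 32-bit container for a 24-bit address. */
typedef struct {
    uint16_t lo;   /* address within the bank */
    uint16_t hi;   /* bank number in bits 0-7; bits 8-15 are padding */
} far_ptr32;

/* Advance the pointer by n bytes using two 16-bit additions. */
far_ptr32 far_ptr_add(far_ptr32 p, uint16_t n)
{
    uint32_t sum = (uint32_t)p.lo + n;        /* low-word add; carry lands in bit 16 */
    p.lo = (uint16_t)sum;
    p.hi = (uint16_t)(p.hi + (sum >> 16));    /* carry moves the pointer into the next bank */
    return p;
}

/* Flatten to a linear 24-bit address, ignoring the padding byte. */
uint32_t far_ptr_linear(far_ptr32 p)
{
    return ((uint32_t)(p.hi & 0xFFu) << 16) | p.lo;
}
```

Adding 1 to a pointer at $01FFFF yields $020000, so the bank boundary is crossed with no special handling.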


    A single function needs to be shorted(sic) than 64K, data objects should also be kept smaller than 64K whenever possible.

I'm a little confused by what is being said here.

If "function" refers to program code, yes, it has to be limited in size to fit into one bank. In practice, that is almost never a limitation, as few 6502-family programs ever approach the 64KB size limit. Should an 816 program be so large that it exceeds 64KB of code, part of it will have to be run in a different bank. In such a case, I could see where the main-line routines would be in bank B and the associated subroutines in bank B+1 for convenience in loading from mass storage, with JSL instructions being used to call the subroutines in B+1. There is a slight performance penalty associated with this programming model, but it does allow programs to span hundreds of kilobytes if so desired.

The only data objects that would be tied to a bank would be those that are to be accessed using 16-bit addressing. My approach is to place static data, e.g., lookup tables, and dynamic data structures, e.g., buffers, specific to the program being run in the same bank as the code and at the beginning of the program, execute PHK - PLB to set the data bank to the program bank. Hence a 16-bit access is implicitly in the code's bank, giving fastest access to an object. That model is also usable in "far" subroutines by preserving PB prior to the PHK - PLB sequence.


    There are 256 such 64K pages on the 65816 numbered 00 to FF (hexadecimal). The first page (00) is special in that the system stack and the direct page must be located in it. Certain vectors, such as the reset and interrupt vectors also reside in page 00.

Again, "page" is being incorrectly used in place of "bank." What is being referred to as "page (00)" should be "bank (00)", although that is not the correct way to refer to any bank (no indirection is involved). Page $00 always refers to the lowest page in physical RAM bank $00. That is a very important distinction.

    The B or data bank register is an 8 bits register that points to the active near page.

The 65C816 has a B-accumulator (usually referred to as .B or BR in machine language monitors) when the accumulator/memory m flag bit in the status register (SR) is a 1, setting the accumulator size to eight bits. The data bank register is referred to as DB or DBR (I prefer the former) to avoid confusion with the name of the B-accumulator. Also, DB points to a "bank," not a "page." DP (direct page register) is the only 65C816 register that points to a "page."

On page 14, there is this:


    In the small data model the near area must be the same page(sic) as the stack (00). This is because the default pointer size is 16 bits wide and can only point to a single 64K page. As we may want to point to object in the stack, they must be in the same page.

Is the "stack" being referred to in the above the MPU's hardware stack or a LIFO in bank $00 being treated as a stack, e.g., like the data stack commonly used in Forth? If the latter, perhaps that should be clarified for the benefit of the reader.

If the former, I'm a little confused by the statement "As we may want to point to object in the stack, they must be in the same page." Wouldn't such an object be accessible with <offset>,S (stack-relative) addressing, which implicitly refers to bank $00 and does not involve DB in any way?
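
For reference, here is how I understand stack-relative addressing to form its effective address, sketched in C. The function name is hypothetical, and the wrap within bank $00 reflects my reading of the 65C816 documentation rather than anything stated in the compiler's guide.

```c
#include <stdint.h>

/* Hypothetical model of <offset>,S addressing: the 8-bit offset is
   added to the 16-bit stack pointer and the result is always an
   address in bank $00. DB plays no part in the calculation. */
uint32_t stack_relative_address(uint16_t s, uint8_t offset)
{
    uint16_t in_bank = (uint16_t)(s + offset);  /* 16-bit sum, stays inside bank $00 */
    return (uint32_t)in_bank;                   /* bank byte is always $00 */
}
```

With S = $01F0 and an offset of 3, the modeled effective address is $0001F3.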

On page 15:


    A dynamically allocated variable is allocated from a heap using the malloc function...Note: Dynamically allocated variables are a potential problem in memory constrained systems if the program is left running for a long time due to heap fragmentation.

What defines a "memory-constrained" system? In other words, is there a total RAM threshold below which a system becomes memory-constrained from the compiler's perspective? Aside from the possibility of the heap becoming fragmented from numerous malloc() and free() calls (which is really a separate issue, I think), what other (possibly bad) things could happen in a memory-constrained environment? Also, how and when is the size of the heap determined? Is the heap limited to a single bank or can it be defined to span multiple banks to accommodate malloc() calls that request more than 64KB?
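
To show what I mean by fragmentation being a separate issue from total free memory, here is a toy model in C. It treats the heap as eight equal-sized slots, which is purely an assumption for illustration and says nothing about how the Calypsi heap is actually organized.

```c
#include <stdbool.h>
#include <stddef.h>

#define HEAP_SLOTS 8   /* toy heap size, an assumed value */

/* Total number of free slots, regardless of where they are. */
size_t free_slots(const bool used[HEAP_SLOTS])
{
    size_t n = 0;
    for (size_t i = 0; i < HEAP_SLOTS; i++)
        if (!used[i])
            n++;
    return n;
}

/* Longest run of adjacent free slots, i.e. the largest single
   request this toy heap could still satisfy. */
size_t largest_free_run(const bool used[HEAP_SLOTS])
{
    size_t best = 0, run = 0;
    for (size_t i = 0; i < HEAP_SLOTS; i++) {
        run = used[i] ? 0 : run + 1;
        if (run > best)
            best = run;
    }
    return best;
}
```

If every other slot has been freed, half the heap is free, yet a request for two adjacent slots still cannot be met.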

Also on page 15:


    The tiny address space is 256 bytes of memory located somewhere in the first 64K of memory. It has an address range 0x00-0xff...Being only 256 bytes and also shared with pseudo registers, you are somewhat limited on how much you can store in the tiny area.

I interpret the above to mean a data structure could be placed on direct page, which is unusual in most code; direct page is generally considered to be too valuable to be used in that fashion, except in narrowly-defined cases. Please clarify. Also, if I write `register int x = 0;` in my C program, where is x being stored?

On page 16:


    The huge address space covers the entire 16MB address range. The maximum size of an object is 16MB (minus one byte).

I've not seen any 65C816 system with more than 4MB of RAM. How does the compiler reconcile the theoretical 16MB address space to the actual address space? For example, if I were to compile a program using "huge" addressing to run on my POC V1.3 unit (128KB, of which 64KB are in bank $01), how would the program "know" that the highest-accessible physical address is $01FFFF? Also, does the compiler have a way to handle memory "holes" caused by idiosyncratic address decoding?

On page 74:


    Assembly language files usually have the file extension .s or .asm, but it varies wildly as assembly language itself is not standardized in any way.

Although source code file naming conventions do vary, the 6502 assembly language itself is standardized—MOS Technology wrote the original standard, which compliant assemblers will implement. The 65C816 assembly language is also standardized—WDC promulgates that standard in the data sheet. The language standards intentionally ignore assembler pseudo-ops, which is nothing unique to the 6502 universe.

Regrettably, there are all sorts of 6502 assembly language bastardizations in hobbyist-developed assemblers, some born of ignorance of the MOS Technology/WDC standards that have existed some 46 years, and others the result of the "I don't like it, so I'm going to come up with my own 'standard'" line of thinking. So it probably should be clarified that there is a standard, even though it may not seem so to the casual observer.

On page 80:


    rodata - An initialized data section in read only memory (ROM).

Can or does the compiler treat the run-time data that was generated during compilation, e.g., a lookup table associated with, say, a switch() statement, as read-only data, as in:

Code:
switch (i) {
  case 1:  printf("i = 1.\n");
           break;
  case 2:  printf("i = 2.\n");
           break;
  /* ...etc... */
  default: printf("i is not 1 or 2.\n");
           break;
}

At least in assembly language, there would have to be some sort of address lookup table to vector the MPU according to the result of the switch() statement. Would your compiler place that lookup table in the rodata segment, even though no ROM may be involved?
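
To make the question concrete, here is a hand-written C equivalent of the kind of table I have in mind. It is only a sketch: the handler names and `dispatch` are hypothetical, and I am not claiming this is what the compiler emits for a switch(). The table is declared `static const`, which is the sort of object a toolchain would be free to place in a read-only section.

```c
#include <stddef.h>

typedef const char *(*case_handler)(void);

static const char *handle_one(void)     { return "i = 1."; }
static const char *handle_two(void)     { return "i = 2."; }
static const char *handle_default(void) { return "i is not 1 or 2."; }

/* Address lookup table, indexed by (i - 1). It never changes at run time. */
static const case_handler handlers[] = { handle_one, handle_two };

/* Vector to the handler for i, falling back to the default case. */
const char *dispatch(int i)
{
    size_t count = sizeof handlers / sizeof handlers[0];
    if (i >= 1 && (size_t)i <= count)
        return handlers[i - 1]();
    return handle_default();
}
```

Calling dispatch(2) returns the "i = 2." string, and any value outside 1..2 falls through to the default handler.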

——————————
¹Technically speaking, a 65C816 "bank" is really a "segment," as it isn't accessed through an address space "window," as banking would be done in an eight-bit system to give access to more than 64KB.

²I recommend that anyone writing firmware and/or an operating system kernel for a 65C816 adopt the use of 32-bit pointers for parameter-passing. The underlying code will be smaller and faster, at the expense of slightly more direct page usage.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Last edited by BigDumbDinosaur on Tue Aug 31, 2021 7:40 pm, edited 1 time in total.

PostPosted: Tue Aug 31, 2021 8:15 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Perhaps one might say that there is a de jure standard, widely ignored, and there is no de facto standard, for assembly language syntax and indeed filenaming. The pragmatic developer will recognise this situation and act accordingly.


PostPosted: Tue Aug 31, 2021 7:38 pm 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8504
Location: Midwestern USA
BigEd wrote:
Perhaps one might say that there is a de jure standard, widely ignored, and there is no de facto standard, for assembly language syntax and indeed filenaming. The pragmatic developer will recognise this situation and act accordingly.

There at one time was an IEEE draft standard for assembly language, which MOS Technology largely implemented in the 6502 assembly language. That is why we see the exclusive use of three-character mnemonics, unlike the ancestral 6800 language.

Due to the disparate work of countless 6502 enthusiasts, I don't think a true standard will ever exist, beyond use of the mnemonics published by MOS Technology and WDC. At this point in time, WDC's syntax in the applicable data sheets is the reference that should drive the development of any assembler.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Tue Aug 31, 2021 8:50 pm 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8504
Location: Midwestern USA
BigDumbDinosaur wrote:
hth313 wrote:
    The huge address space covers the entire 16MB address range. The maximum size of an object is 16MB (minus one byte).
I've not seen any 65C816 system with more than 4MB of RAM. How does the compiler reconcile the theoretical 16MB address space to the actual address space? For example, if I were to compile a program using "huge" addressing to run on my POC V1.3 unit (128KB, of which 64KB are in bank $01), how would the program "know" that the highest-accessible physical address is $01FFFF? Also, does the compiler have a way to handle memory "holes" caused by idiosyncratic address decoding?

The thought has occurred to me that the compiler has to be aware of the environment on which the executable will be run, since no program is truly insular, even if no I/O is involved. As calls to malloc() tap into the heap, which I understand to be a finite size determined during compilation, it would seemingly set a hard limit on how much memory could be offered by malloc(), unless...

In the UNIX/Linux environment, sbrk() would be called by malloc() when the heap was (nearly) exhausted and more memory was needed to satisfy future requests (I'm cheerfully assuming that no free() calls are made in the running program, which means it will likely increase its memory consumption the longer it runs). sbrk(), in turn, would try to "find" memory, either physical or virtual, with the latter possible in a machine whose MPU supports the concept of virtual memory.¹

Given that, the Calypsi compiler's rendition of malloc() would have to have some way to know how to enlarge the heap when nearly exhausted. Doing so would either require the malloc() code have knowledge of the machine's physical memory map, or malloc() would have to know how to request more memory from the operating environment on which the executable is running.
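
The hard limit I am describing can be sketched in C with a toy sbrk()-style helper that hands out memory from a region whose size is fixed at build time. Everything here (`toy_sbrk`, `TOY_HEAP_SIZE`, the 1024-byte size) is hypothetical and is not a description of the Calypsi run-time. Unlike the real sbrk(), this sketch returns NULL on failure.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_HEAP_SIZE 1024u   /* fixed when the program is built (assumed value) */

static uint8_t toy_heap[TOY_HEAP_SIZE];
static size_t  toy_break = 0;   /* number of bytes handed out so far */

/* Hypothetical sbrk-like helper: extend the break by n bytes and
   return the old break, or NULL when the fixed region cannot
   supply n more bytes. There is nowhere else to ask for memory. */
void *toy_sbrk(size_t n)
{
    if (n > TOY_HEAP_SIZE - toy_break)
        return NULL;
    void *old = &toy_heap[toy_break];
    toy_break += n;
    return old;
}
```

Once the region is exhausted, every further request fails unless the helper is taught about more physical memory or can ask the operating environment for it.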

Please clarify.

——————————
¹It is theoretically possible to build a virtual-memory system using the 65C816. The ABORTB interrupt input provides the basic means by which program execution may be temporarily stopped to allow the kernel to deal with a page fault. ABORTB's behavior is sub-optimal, but that is not an insurmountable obstacle in a system that uses a CPLD or FPGA to implement glue logic.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Tue Aug 31, 2021 11:41 pm 
Offline
User avatar

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 730
Location: Tokyo, Japan
BigDumbDinosaur wrote:
That is why we see the exclusive use of three-character mnemonics, unlike the ancestral 6800 language.

No. The 6800 assembly language as defined by Motorola uses exclusively three-character mnemonics. (See attached chart from the Motorola M6800 Programming Reference Manual.) Perhaps you have chosen to use some non-standard version of 6800 assembly.

(For what it's worth, I think that there's absolutely nothing wrong with being non-standard in any way in an assembly language you're defining, so long as you have a good reason for it. To make your language worse and more difficult to program in merely to follow a "standard" is blind stupidity. Do not take this to mean that I think every change out there is justified.)


Attachments:
6500-prg-instab-4-2.png [ 62.92 KiB ]

_________________
Curt J. Sampson - github.com/0cjs
PostPosted: Wed Sep 01, 2021 4:17 am 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8504
Location: Midwestern USA
cjs wrote:
BigDumbDinosaur wrote:
That is why we see the exclusive use of three-character mnemonics, unlike the ancestral 6800 language.

No. The 6800 assembly language as defined by Motorola uses exclusively three-character mnemonics.

The 6800 assembly language does not exclusively use three-character mnemonics because accumulator-based instructions have a fourth mnemonic character identifying which accumulator is the target. For example, LDA #<operand> implicitly loads the 6502's (only) accumulator. The equivalent operation in 6800 assembly language would be LDAA #<operand> or LDA A #<operand> in some assemblers (when I learned 6800 assembly language it was the former style). Loading the 6800's B-accumulator would be LDAB #<operand> or LDA B #<operand>. In both examples, 'A' and 'B' are considered to be part of the complete mnemonic, since they specify which accumulator is being loaded. This same pattern prevails for all of the 6800's accumulator-based instructions, e.g., STAA (store A-accumulator) or STAB (store B-accumulator).

See page 3-22 of Lance Leventhal's 6800 Assembly Language Programming for a reference.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Wed Sep 01, 2021 6:48 am 
Offline
User avatar

Joined: Sat Dec 01, 2018 1:53 pm
Posts: 730
Location: Tokyo, Japan
BigDumbDinosaur wrote:
The 6800 assembly language does not exclusively use three-character mnemonics because accumulator-based instructions have a fourth mnemonic character identifying which accumulator is the target.

No. That character is not part of the mnemonic, it is part of the operand. E.g., `LDA B,#13`. Note where the space separating the mnemonic and the operand field is.

Quote:
In both examples, 'A' and 'B' are considered to be part of the complete mnemonic...

By you, sure. But not by Motorola, which you'll note refers to them as "(Dual Operand)" in the excerpt I posted.

If you're going to go to battle for letters in the operand field being part of the mnemonic when they change the opcode or destination of other parts of the operand, keep in mind that in 6502 assembler, the `,X` in `LDA ($13,X)` does the same.

Quote:
...in 6800 assembly language would LDAA #<operand>...

In whatever non-standard assembler you were using, sure. You won't see anything like that in Motorola's M6800 Programming Reference Manual.

Quote:
See page 3-22 of Lance Leventhal's 6800 Assembly Language Programming for a reference.

Sure, he differs from Motorola here. But I'm not buying that Motorola themselves created and published "non-standard mnemonics at the start" and that it was only later that third parties developed the "standard" mnemonics for the CPU. If you go down that route, who's to say that the mnemonics you so like for the 6502 are "standard"?

When you posted your reply here you already had a reference from me for Motorola's original programming manual for the 6800, which is freely available on archive.org. I suggest you go have a read. And maybe think about exactly why you're not explaining why that source is inconsistent with your third-party source and how that affects your argument, since you knew (or should have known) about that inconsistency.

_________________
Curt J. Sampson - github.com/0cjs


PostPosted: Wed Sep 01, 2021 7:07 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
Perhaps a little O.T.: I have no 6800 experience, but all the 6800 code I've seen put the A or B on the end of the mnemonic, with no space, making it four letters, as in BillG's post at viewtopic.php?p=78469#p78469 and all the subsequent posts in that topic ("6502 vs 6800")

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Wed Sep 01, 2021 7:51 am 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
I thought I felt my ears burning...

I have been programming the 6800 since 1979. Though early Motorola documents showed the "split" instruction format, every assembler I have used allowed the alternative "merged" form as well.

Code listings in literature were about half and half. I never liked the split format. In those days of slow and small storage devices, that added space was wasteful.

It is insightful that the split format was deprecated in documentation for the 6809 and more notably, the 6801/6803 which sported some extensions of the original instruction set but was otherwise source and binary compatible.

https://www.lucidtechnologies.info/6803_instr.pdf


PostPosted: Wed Sep 01, 2021 7:53 am 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8504
Location: Midwestern USA
GARTHWILSON wrote:
Perhaps a little O.T.: I have no 6800 experience, but all the 6800 code I've seen put the A or B on the end of the mnemonic, with no space, making it four letters, as in BillG's post at viewtopic.php?p=78469#p78469 and all the subsequent posts in that topic ("6502 vs 6800")

Lance Leventhal shows both forms in his book, e.g., LDAA and LDA A. When I learned MC6800 assembly language in 1975, the cross-assembler I was using (running on an IBM S360 mainframe) only recognized the xxAA or xxAB formats. The documentation for the assembler stated that the fourth character that identified which accumulator was being acted on was part of the mnemonic, not the operand, and that it could not be placed in the operand field. Also, see the attachment below, which uses the LDA A style, but clearly does not place the accumulator symbol anywhere near the operand field.

Attachment:
File comment: 6800 Assembly Language Excerpt.
mc6800_asm_example.png [ 140.93 KiB ]

BTW, although I did learn MC6800 assembly language well enough to write some short programs, I never approached the degree of fluency I achieved with the 6502, 65C816 and to a lesser extent, the MC68000.

Regarding Lance Leventhal's 6800 programming tome, he was at the time considered one of the foremost authorities on microprocessors and was fluent in multiple assembly languages. He wrote a number of XXXX Assembly Language Programming books that were considered authoritative. I'm sure most here who have been programming the 6502 for many years have a copy of his 6502 Assembly Language Programming. Mine is pretty dog-eared. :D

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Wed Sep 01, 2021 7:58 am 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8504
Location: Midwestern USA
BillG wrote:
I thought I felt my ears burning...

Mine usually ring, but when I answer them nobody's on the line. :D

Quote:
It is insightful that the split format was deprecated in documentation for the 6809 and more notably, the 6801/6803 which sported some extensions of the original instruction set but was otherwise source and binary compatible.

https://www.lucidtechnologies.info/6803_instr.pdf

One can clearly see in the instruction table on page 2 that the accumulator instructions have four-character mnemonics.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!

