Storing strings in memory
I can see two ways to store an ASCII string in memory.
1. The string to be displayed, followed by one termination byte
e.g. 54 65 73 74 00
2. A byte at the beginning of the string location giving the length of the string
e.g. 04 74 73 65 54
In the first case you need to test each byte for the end-of-string terminator '00'. That test is not needed in the second case, although it might be considered a disadvantage that the string needs to be reversed in memory. For long strings or heavy string processing I would expect the second to be quicker, although the downside may be that the string is limited in length.
Which method is recommended?
Secondly, on my project the VIA seems to run quite warm but not hot; I can lay my finger on it without too much discomfort. Does this sound OK?
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
Re: Storing strings in memory
I've used both ways, and can go either way, although I might slightly favor counted strings like Forth uses, rather than null-terminated. "Counted" means the first byte tells the number of data bytes in the string. If you have any reason to put a null byte in the string before the end, you can do it. Also, if you want to concatenate strings, or truncate a string, you don't have to go looking for the end, so it can be more efficient in some cases.
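To illustrate the point about concatenating and truncating, here is a minimal sketch in C rather than 6502 assembly or Forth. It assumes an 8-bit count in byte 0 and a destination buffer of at least 256 bytes; the names cstr_append and cstr_truncate are made up for this example and do not come from any library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Counted string: byte 0 holds the length, bytes 1..len hold the data.
   Assumption: dst points at a buffer with room for 1 + 255 bytes. */

/* Append src to dst. Returns 0 on success, or -1 if the result would
   not fit in an 8-bit count. No scan for a terminator is needed. */
static int cstr_append(uint8_t *dst, const uint8_t *src)
{
    unsigned total = (unsigned)dst[0] + src[0];
    if (total > 255)
        return -1;
    memcpy(dst + 1 + dst[0], src + 1, src[0]);  /* data goes right after the old end */
    dst[0] = (uint8_t)total;
    return 0;
}

/* Truncate to at most n bytes: only the count byte changes. */
static void cstr_truncate(uint8_t *s, uint8_t n)
{
    if (s[0] > n)
        s[0] = n;
}
```

Appending {4, 'b', 0, 'a', 'r'} to {3, 'f', 'o', 'o'} gives a count of 7 with the zero byte preserved in the middle, which a null-terminated routine would treat as the end.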
If the VIA is CMOS, i.e., 65C22, there should be no discernible heating. I don't remember if the NMOS ones produce discernible heating, but NMOS has other disadvantages anyway besides just taking more power, so I'd recommend using CMOS if possible.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: Storing strings in memory
A third technique would take advantage of the 7-bit nature of plain-vanilla ASCII, and use the high-bit as an end-of-string marker:
PQRS --> 50 51 52 D3
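A minimal sketch of the high-bit convention, written in C purely as a model; hibit_decode is a hypothetical helper name. It assumes 7-bit ASCII data and at least one character, since this scheme cannot represent an empty string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode a high-bit-terminated string into an ordinary C string.
   The final character has bit 7 set; the bit is masked off on output.
   Returns the number of characters decoded.
   Assumption: out has room for every character plus a terminator. */
static size_t hibit_decode(const uint8_t *s, char *out)
{
    size_t n = 0;
    uint8_t c;
    do {
        c = s[n];
        out[n] = (char)(c & 0x7F);  /* strip the end marker */
        n++;
    } while ((c & 0x80) == 0);      /* stop after the byte that had bit 7 set */
    out[n] = '\0';
    return n;
}
```

Decoding 50 51 52 D3 yields "PQRS" with a length of 4. On a 6502 the equivalent test comes from the N flag that LDA sets, so no separate compare is needed there either.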
The problem I'm having with my experimental 32-bit hobby processor design is that it can't address individual 8-bit bytes, only 32-bit words. UTF-32 is a perfect fit for it, and is a nice specification, but it isn't super popular yet, and tends to carry a lot of bit-baggage for many simple usage cases. Eight-bit bytes have been around for a long time, and they are an integral part of some popular encodings, so my processor is going to have to work a bit harder than most to accommodate them. It's either that, or clutter up its native instruction set with a bunch of byte manipulators, and I am still trying very hard to avoid that.
An example is a translation of a DTC Forth based on Dr. Brad's work. Lots and lots of Forth words can be implemented in a single one-word 65m32 machine instruction (plus a one-word NEXT) ... branch + 1+ 1- 2* 2/ >R @ AND DROP DUP EXIT INVERT NEGATE OR R> SWAP UNLOOP XOR [ ] NIP RDROP 2RDROP ... I'm sure that I'm forgetting a few others. The Forth dictionary overhead is much larger than the actual code doing the work, especially if I don't pack the names. Packing and unpacking names is inefficient, so I'm thinking that my initial implementation will just use UTF-32 and "waste" about 75% of the available bits, in the interest of simplicity and expediency, even though it rubs the size-optimizing side of my personality the wrong way.
YMMV
Mike B.
- BigDumbDinosaur
- Posts: 9425
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
Re: Storing strings in memory
APL wrote:
1. The string to be displayed-one termination byte-
eg:- 5465737400
2. A byte at the beginning of the string location describing the length of the string
eg:- 0474736554
If the string is to be prepended with a length then you have to use that length as a down-counter while processing the string, which usually means using both .X and .Y, the former to act as the counter and the latter to act as the index. With the zero terminator, only one index register need be used, as the index.
The one distinct advantage of prepending a string with its length is that the null byte has no special significance and hence may be embedded in a string without consequence. On the other hand, the programmer has to make a decision as to whether to use an eight bit or 16 bit length. BASIC implementations on the 65xx family generally use an eight bit length. Business BASIC implementations, such as BBX and Thoroughbred, use a 16 bit length and thus can handle much longer strings. Pick yer poison!
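A small C sketch of both points, offered only as an illustration: an embedded null survives in a length-prefixed string, and the prefix width sets the maximum length. The helper len16 and the low-byte-first layout are assumptions made for this example, not a description of how any particular BASIC stores its strings.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 8-bit prefix: the length is simply the first byte (maximum 255). */
static unsigned len8(const uint8_t *s)
{
    return s[0];
}

/* 16-bit prefix (maximum 65,535).
   Assumption: stored low byte first, as is customary on the 65xx family. */
static unsigned len16(const uint8_t *s)
{
    return (unsigned)s[0] | ((unsigned)s[1] << 8);
}
```

For the bytes 05 61 62 00 63 64, len8 reports 5, while a null-terminated scan of the same data would stop after 2 characters. A 16-bit prefix of 2C 01 reports 300, which an 8-bit prefix cannot express.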
x86? We ain't got no x86. We don't NEED no stinking x86!
- commodorejohn
- Posts: 299
- Joined: 21 Jan 2016
- Location: Placerville, CA
Re: Storing strings in memory
BigDumbDinosaur wrote:
The one distinct advantage of prepending a string with its length is that the null byte has no special significance and hence may be embedded in a string without consequence.
That said, I still prefer null-terminated for most purposes. It's always nice to get a critical operation for free, which any CPU which sets the flags on a register load will give you with null-terminated strings.
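The "free" test can be modelled in C, where the value of the assignment doubles as the loop condition, much as a flag-setting load lets a 6502 branch on BEQ straight after LDA. This is only a sketch; zcopy is a hypothetical name, and the caller is assumed to supply a large enough destination.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a null-terminated string, terminator included.
   The byte just stored is also the loop test, so no separate
   counter or compare is needed.
   Returns the number of characters copied, not counting the terminator. */
static size_t zcopy(char *dst, const char *src)
{
    size_t n = 0;
    while ((dst[n] = src[n]) != '\0')
        n++;
    return n;
}
```

Copying "Test" returns 4 and leaves the terminator in place at index 4.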
- BigDumbDinosaur
Re: Storing strings in memory
commodorejohn wrote:
BigDumbDinosaur wrote:
The one distinct advantage of prepending a string with its length is that the null byte has no special significance and hence may be embedded in a string without consequence.
My 65C816 string processing library works with null-terminated strings. I internally set the maximum string length (minus terminator) to 32,767 and abort processing if the (source) string length exceeds that—it's an easy check on the '816. There is also a check to determine if the catenation of two strings will exceed the 32KB length limit. It's not perfect, but it provides some protection.
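The two checks described here might look something like the following in C. This is a sketch under stated assumptions, not the library's actual 65C816 code: MAX_LEN, bounded_len, and can_concat are names invented for the illustration, and only the 32,767 limit is taken from the post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEN 32767u  /* limit from the post, terminator not included */

/* Length of s, or -1 if no terminator is found within MAX_LEN characters.
   The scan gives up instead of running off through memory. */
static long bounded_len(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        if (++n > MAX_LEN)
            return -1;
    }
    return (long)n;
}

/* 1 if a and b can be joined without exceeding MAX_LEN, otherwise 0. */
static int can_concat(const char *a, const char *b)
{
    long la = bounded_len(a);
    long lb = bounded_len(b);
    return la >= 0 && lb >= 0 &&
           (unsigned long)la + (unsigned long)lb <= MAX_LEN;
}
```

A runaway string is rejected after at most MAX_LEN + 1 bytes have been examined, which is the kind of partial protection the post describes.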
Quote:
That said, I still prefer null-terminated for most purposes. It's always nice to get a critical operation for free, which any CPU which sets the flags on a register load will give you with null-terminated strings.
Re: Storing strings in memory
BigDumbDinosaur wrote:
If the string is to be prepended with a length then you have to use that length as a down-counter while processing the string, which usually means using both .X and .Y, the former to act as the counter and the latter to act as the index. With the zero terminator, only one index register need be used, as the index.
For example, given a string pointed to by ($b0, $b1), the following will emit the string to a UART (via a 'putchar' function):
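puts:   ldy #0
        lda ($b0),y     ; fetch the length byte
        beq exit        ; empty string: nothing to send
        tay             ; .Y now serves as both counter and index
loop:   lda ($b0),y     ; fetch characters from the last byte down to the first
        jsr putchar     ; putchar is assumed to preserve .Y
        dey
        bne loop
exit:   rts
where ($b0, $b1) -> .byte 24, "!gnirts a si siht ,olleH"
Use of this technique will fill your swear jar quickly.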
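The same idea can be modelled in C to show why one index is enough. This is a sketch, with emit_reversed as a made-up name; it writes to a buffer instead of a UART so the result can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* s[0] is the length; s[1..len] hold the text in reverse order.
   One variable counts down from len to 1, acting as both the loop
   counter and the offset, and the text comes out in forward order.
   Assumption: out has room for len characters plus a terminator. */
static size_t emit_reversed(const uint8_t *s, char *out)
{
    size_t n = 0;
    for (uint8_t y = s[0]; y != 0; y--)
        out[n++] = (char)s[y];
    out[n] = '\0';
    return n;
}
```

Fed the bytes 04 74 73 65 54 from the opening post, it produces "Test".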
- GARTHWILSON
Re: Storing strings in memory
Quote:
A lot of language behavior has been based on how the hardware on which the language was originally developed behaved.
In my last big project, I did use null-terminated strings though, because I didn't have to do such string gymnastics. The only string editing was such that the string lengths were unchanged, and beyond that, all I had to do was display them.
Re: Storing strings in memory
On a minicomputer I worked with, the Fortran compiler used a character type (strings) which was a two-word descriptor. One word for the address of the string, the other word for the length.
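A descriptor of this kind can be sketched in C as a two-field struct. The names strdesc and desc_slice are hypothetical, and nothing here is taken from that Fortran compiler; the sketch only shows one consequence of the layout, which is that a substring can be described without copying any characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two-word descriptor: where the characters live and how many there are. */
struct strdesc {
    const char *addr;
    size_t      len;
};

/* Substring without copying: a new descriptor into the same storage.
   start and count are clamped so the result never leaves the source. */
static struct strdesc desc_slice(struct strdesc s, size_t start, size_t count)
{
    if (start > s.len)
        start = s.len;
    if (count > s.len - start)
        count = s.len - start;
    struct strdesc r = { s.addr + start, count };
    return r;
}
```

Slicing "Hello, world" from offset 7 gives a descriptor of length 5 that points at "world" inside the original storage.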
Warm chip (Re: Storing strings in memory)
APL wrote:
Secondly on my project, the VIA seems to run quite warm, not hot, I can lay my finger on it without too much discomfort. Does this sound OK?
GARTHWILSON wrote:
.......
If the VIA is CMOS, i.e., 65C22, there should be no discernible heating. I don't remember if the NMOS ones produce discernible heating, but NMOS has other disadvantages anyway besides just taking more power, so I'd recommend using CMOS if possible.
I experienced the same as APL: when I used CMOS Rockwell chips in my machine, the VIA (I tried 3 different ones) was warm; the infrared thermometer gave approximately 45 °C / 113 °F. There was no correlation with what the chip was doing (uninitialized, I/Os working or not, timers running or not). Both other chips (CPU + ACIA) stayed at room temperature. More logically, the NMOS ones all ran warm.
Needless to say, I checked and re-checked my board and connections: no trouble. Then I switched to WDC chips: they're all at room temperature (running 24 hours a day).
Marc
- White Flame
- Posts: 704
- Joined: 24 Jul 2012
Re: Storing strings in memory
Tor wrote:
On a minicomputer I worked with, the Fortran compiler used a character type (strings) which was a two-word descriptor. One word for the address of the string, the other word for the length.
Each variable has a 5-byte slot to store its value, and if the variable is a string, only 3 of those bytes are used for this length+pointer structure.
Re: Storing strings in memory
BigDumbDinosaur wrote:
A lot of language behavior has been based on how the hardware on which the language was originally developed behaved. In the case of C, which came to life on the DEC PDP-11, Ritchie was most likely taking advantage of the fact that the MOV instruction would set the Z flag in the condition code register if a null byte or word was copied into the target register, just as TXA would do the same thing in the 6502 if .X was loaded with $00.
It does seem that BCPL either used counted strings, or both conventions, depending on which source you consult. Here's Dennis M Ritchie, in Development of the C Language:
Quote:
None of BCPL, B, or C supports character data strongly in the language; each treats strings much like vectors of integers and supplements general rules by a few conventions. In both BCPL and B a string literal denotes the address of a static area initialized with the characters of the string, packed into cells. In BCPL, the first packed byte contains the number of characters in the string; in B, there is no count and strings are terminated by a special character, which B spelled *e. This change was made partially to avoid the limitation on the length of a string caused by holding the count in an 8- or 9-bit slot, and partly because maintaining the count seemed, in our experience, less convenient than using a terminator.
http://www.tuhs.org/Archive/PDP-11/Dist ... onZero.txt
(*) By which I mean assembly language, of course! I'm just teasing.
- BigDumbDinosaur
Re: Storing strings in memory
BigEd wrote:
An interesting document came to light recently(+), from 1971, which is described as version zero of the documentation for Unix. It describes Unix as running on a PDP-11, with reference to the previous and first version which ran on PDP-7 and -9 (both 18 bit machines.) The version described was, I think, written in assembler(*), and includes a B compiler. (B came after BCPL and before C.) The OS call interface already makes use of NUL-terminated strings. I would guess that the same would be true of the previous version, but I suppose it's possible that that isn't so. I don't know whether or not the -7 and -9 come with a handy zero flag.
Incidentally, B was written by Ken Thompson in 1970 to run on UNIX on PDP-7 hardware. Thompson took everything out of BCPL that he thought could be eliminated without losing too much functionality, doing this mainly to accommodate the PDP-7's tiny address space. B's performance was lackluster and, because B was typeless (as was BCPL), handling complex data structures was a painful exercise. The development of C addressed these concerns.
- commodorejohn
Re: Storing strings in memory
BigDumbDinosaur wrote:
I seem to vaguely recall that the PDP-11 assembly language was an enhanced version of the PDP-7 assembly language, which means the use of the null terminator would have been a "natural" idiom on the PDP-7 as well. I knew just enough PDP-11 assembly language to be dangerous. 