A beginner's question about strings in C, asking the experts for help... - C Language Forum

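Post #21

Score: 0

Quoting pcbaichi's post of 2011-3-3 00:09:15:

The OP has a point, but the OP clearly hasn't learned this the hard way, otherwise I wouldn't be chiming in. What 孔明 said really is correct.

It depends on how you look at it.

Someone who has seen a lot of garbage naturally ends up hating it and tears into anything bad on sight.

If you have written this kind of thing yourself and know how hard it is, I think you would be a bit more understanding.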

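Post #22

Score: 13

In Java, to save space, string constants with identical contents occupy only one region of memory.
What 孔明 described does explain how some compilers handle this question, but it cannot settle a single "standard" that every compiler follows. That is simply a fact.
So my view is this: on this question, different compilers have different "standards". Which is more practical, then: chasing down each of those standards, or steering clear of the latent bugs their differences can cause? The answer should be obvious. That said, I do admire the OP's determination to get to the bottom of it.
There are similar issues elsewhere in C, for example with ++ and --, where different compilers can behave very differently. The better response for a programmer is to avoid such code carefully.
This is a question of portability.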
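
To make the "avoid it carefully" advice concrete for ++ and --, here is a minimal sketch. The function name fill_indices is made up for this illustration. It keeps each side effect in its own statement instead of relying on how a particular compiler treats an expression such as dst[i] = i++.

```c
#include <stddef.h>

/* Hypothetical helper: fills dst[0..n-1] with 0, 1, 2, ...
 * The tempting one-liner `dst[i] = i++;` both reads and modifies i
 * without an intervening sequence point, so its behavior is undefined.
 * Splitting it into two statements gives the same result everywhere. */
void fill_indices(int *dst, size_t n)
{
    size_t i = 0;
    while (i < n) {
        dst[i] = (int)i;  /* use i ... */
        i++;              /* ... then increment it in a separate statement */
    }
}
```

Written this way, the code does not depend on any compiler's choice.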

Being a teacher who is worth the tuition my students pay has always been my goal! I will keep working harder!

Post #23

Score: 0

I have edited my reply into the original post above. Thanks again, everyone...

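Post #24

Score: 0

This depends on the compiler. A declared string constant is stored the same way as a static local variable or a global variable, in the static data area. Its location is fixed when the program is built and does not change while it runs.
In gcc, identical literals sharing one copy is presumably a way to save memory. 孔明 has probably already explained this clearly.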
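
A small sketch of what the static storage means in practice. The function name greeting is invented for the example. Because the literal lives for the whole run of the program, a pointer to it is still valid after the function returns, which would not be true of a pointer to an ordinary local array.

```c
#include <string.h>

/* Hypothetical helper: returns a pointer to a string literal.
 * The literal's array has static storage duration, so the pointer
 * remains valid after the function returns. The pointer is declared
 * const because the array must not be modified. */
const char *greeting(void)
{
    return "hello";
}
```

Every call returns the address of the same array, since it is the same literal each time.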

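Post #25

Score: 0

Done studying the standard. Heh, what it says on this topic is less complicated than I had imagined, and a good deal easier than other concepts I have studied before.

The standard's term for a string constant is "string literal". In other words it is a basic lexical element, on the same footing as integer constants and the like. Judging by how much text the standard spends on it, though, it does not look all that equal.
Its behavior: it has static storage duration; the space is set aside at compile time, and its size is exactly the characters of the literal plus one slot for the \0; its type is array of char (note: not const char). Although the elements are not const, attempting to modify them is undefined behavior. (Where the standard does not specifically describe the semantics of a string literal, I think treating it as an unnamed array is the closest fit. Otherwise some of the standard's rules would contradict one another.)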
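
The "characters plus one \0" size can be checked with sizeof. This is only a sketch, and the two function names are invented for it.

```c
#include <stddef.h>
#include <string.h>

/* sizeof applied to a string literal gives the size of the whole array,
 * terminating '\0' included. */
size_t literal_array_size(void)
{
    return sizeof "abc";    /* 3 characters + '\0' = 4 */
}

/* strlen counts only the characters before the '\0'. */
size_t literal_string_length(void)
{
    return strlen("abc");   /* 3 */
}
```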
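Before the literal becomes an array, adjacent string literals are concatenated (translation phase 6, right after preprocessing). The token rules that apply during preprocessing, macro expansion for example, do not take effect inside a literal.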
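
Both points can be seen in a few lines. The macro VERSION and the function names below are made up purely for this illustration.

```c
#include <string.h>

#define VERSION 7   /* hypothetical macro, only for this illustration */

/* Adjacent literals are joined into a single literal
 * before the array is created. */
const char *joined(void)
{
    return "abc" "def";
}

/* Inside the quotes, VERSION is ordinary text, not a macro invocation,
 * so it is not replaced by 7. */
const char *unexpanded(void)
{
    return "VERSION";
}
```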
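The standard spends a lot of text on the rules for deciding whether a literal is of char type or of wchar_t type. That has little to do with your question, so I will skip it.
A string literal can be used to initialize an array of character type, and the {} are optional (that is, you can write char a[] = "abc", or char a[] = { "abc" } ). If the declared array size is large enough to hold the trailing \0, or if no size is given, the \0 counts as one of the initial values too.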
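
The initialization rules, written out as declarations. This is a sketch in C only; the array names are arbitrary, and the last case is rejected by C++ compilers.

```c
#include <stddef.h>

char a[]  = "abc";      /* no size given: 4 elements, '\0' included  */
char b[]  = { "abc" };  /* the braces are optional: same as a        */
char c[4] = "abc";      /* room for the '\0': it is stored as c[3]   */
char d[3] = "abc";      /* no room: valid C, but d holds no '\0' and
                           must not be passed to string functions    */
```

Unlike the literal itself, these arrays are ordinary objects, so writing to a[0] is fine.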

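Paragraph 6 of 6.4.5 touches on the point the OP is concerned about; its context is paragraph 5 (both are given below). In my rendering it says: "Whether two such arrays are distinct is unspecified, as long as their elements have the appropriate values." (This is my own translation and carries no authority; please see the original text below.)
"Appropriate" is a rather vague word. My reading is that the standard allows two identical strings to be treated as one and the same, and also allows them not to be. That is, TC's approach is acceptable.
The reason, as I see it, is that a compiler implementation has to cope with multibyte encodings, not just ASCII. It is common, for instance, to see people write comments, and even literals, in Chinese. Many comparison functions are affected by the locale and can give quite different answers. For example, the English "resume" and the French "résumé" may, under certain locale settings, compare as equal in a locale-aware comparison (I mean something in the spirit of strcmp; strcmp itself compares raw bytes and cannot be used on non-char types anyway, strcoll is the locale-sensitive one). So here the standard is being kind to compiler implementers: when in doubt, keep the arrays separate. Either choice conforms to the standard.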
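
In code, the difference between "same array" and "same text" looks roughly like this. The two function names are invented for the sketch.

```c
#include <stdbool.h>
#include <string.h>

/* Compares addresses. For two separate "abc" literals the result may be
 * true with one compiler and false with another (C99 6.4.5 paragraph 6),
 * so a program should not depend on it. */
bool same_object(const char *p, const char *q)
{
    return p == q;
}

/* Compares contents. This is the portable way to ask whether two
 * strings hold the same text. */
bool same_text(const char *p, const char *q)
{
    return strcmp(p, q) == 0;
}
```

With const char *p = "abc"; and const char *q = "abc";, same_text(p, q) is always true, while same_object(p, q) may go either way.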

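"Unspecified" means that the standard offers more than one possibility for the item in question and does not force a particular one. The implementation picks one of them, but it does not have to document its choice (a choice that must be documented is what the standard calls "implementation-defined"). "Undefined", which appears in other clauses, means the standard imposes no requirements on the behavior at all: an implementation may handle it any way it likes (or not handle it at all).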
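
A classic case of unspecified behavior, apart from string literals, is the order in which function arguments are evaluated. The sketch below uses invented names throughout and records which of two calls ran first. Both answers are allowed, and the final sum is the same either way.

```c
static int calls = 0;         /* how many of left()/right() have run */
static int first_called = 0;  /* 1 if left() ran first, 2 if right() */

static int left(void)  { if (calls++ == 0) first_called = 1; return 10; }
static int right(void) { if (calls++ == 0) first_called = 2; return 20; }
static int add(int a, int b) { return a + b; }

/* The sum is always 30, but whether left() or right() is called first
 * is unspecified: the compiler may pick either order. */
int sum_and_record(void)
{
    calls = 0;
    first_called = 0;
    return add(left(), right());
}

/* Reports which function ran first in the last sum_and_record() call. */
int who_ran_first(void)
{
    return first_called;
}
```

Undefined behavior, such as writing into a string literal, is different: there is no list of permitted outcomes, so there is nothing safe to test for.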

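That is basically it. I have also picked out a few related clauses from the standard, so everyone can read them for themselves.
The quotations below are from ISO/IEC 9899:1999 (E):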

6.4.5 String literals
Description

2 A character string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, as in "xyz". ...

Semantics

5 In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence; ...

6 It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

6.5.9 Equality operators
...
Constraints

2 One of the following shall hold:
— both operands have arithmetic type;
— both operands are pointers to qualified or unqualified versions of compatible types;
— one operand is a pointer to an object or incomplete type and the other is a pointer to a qualified or unqualified version of void; or
— one operand is a pointer and the other is a null pointer constant.

6.5.16.1 Simple assignment
Constraints
1 One of the following shall hold:
— the left operand has qualified or unqualified arithmetic type and the right has arithmetic type;
— the left operand has a qualified or unqualified version of a structure or union type compatible with the type of the right;
— both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right;
— one operand is a pointer to an object or incomplete type and the other is a pointer to a qualified or unqualified version of void, and the type pointed to by the left has all the qualifiers of the type pointed to by the right; or
— the left operand is a pointer and the right is a null pointer constant.
— the left operand has type _Bool and the right is a pointer.

6.7.8 Initialization
...
Semantics

14 An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.

32 EXAMPLE 8 ..., the declaration
char *p = "abc";
defines p with type "pointer to char" and initializes it to point to an object with type "array of char" with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.
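
Following EXAMPLE 8, one common habit (a suggestion, not something the standard requires) is to point at literals through const char * and to copy the text into your own array before changing it. The function name below is invented for the sketch.

```c
#include <string.h>

/* Hypothetical helper: copies the literal "abc" into caller-provided
 * storage and modifies the copy. Writing through a pointer to the
 * literal itself would be undefined behavior. */
void make_upper_first(char *buf, size_t bufsize)
{
    const char *p = "abc";        /* const makes the compiler reject p[0] = 'A' */

    if (bufsize < strlen(p) + 1)  /* need room for the text and its '\0' */
        return;                   /* too small: leave buf untouched */

    strcpy(buf, p);               /* buf is an ordinary, writable array */
    buf[0] = 'A';                 /* modifying the copy is well defined */
}
```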

[ Last edited by pangding on 2011-3-3 21:45 ]

Post #26

Score: 0

Quoting pangding's post of 2011-3-3 18:30:25:


Thanks again..

Still learning.