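I've finished reading the standard. Heh, its treatment of this topic is less complicated than I expected, and a good deal easier than other concepts I've studied.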
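What we usually call a string constant is, in the standard's terminology, a "string literal". That makes it a basic lexical element, on the same footing as integer constants and the like. Judging by the amount of text the standard spends on it, though, the two don't look all that equal.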
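Its behavior: it has static storage duration; the space is set aside at translation time, and its size is exactly the characters of the literal plus one position for the terminating \0; its type is array of char (note: not const char). Although the elements are not const-qualified, attempting to modify them is undefined behavior. (Since the standard doesn't give string literals any more special semantics than this, I think treating one as an unnamed array is the closest model. Otherwise some of the standard's rules would contradict one another.)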
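To make the "unnamed array with static storage duration" model concrete, here is a minimal sketch. The function names are made up for illustration; only sizeof and the literals themselves come from the language.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size of the unnamed array behind "abc": three characters plus '\0'.
   sizeof sees an array of 4 char here, not a pointer. */
size_t literal_size(void)
{
    return sizeof "abc";
}

/* Even the empty literal occupies one char for the terminator. */
size_t empty_literal_size(void)
{
    return sizeof "";
}

/* The array has static storage duration, so its address is still valid
   after the function returns (unlike the address of a local array). */
const char *greeting(void)
{
    const char *s = "hello";
    assert(strlen(s) == 5);   /* the '\0' is stored but not counted */
    return s;
}
```

Calling these from main should give 4, 1, and a pointer that can still be printed safely after greeting() has returned.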
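Adjacent string literals are concatenated during translation (translation phase 6, after macro expansion). None of the token rules that apply during preprocessing, macro replacement for example, take effect inside a string literal.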
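A small sketch of both points, concatenation and no macro replacement inside the quotes. The macro NAME and the function names are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME "world"   /* hypothetical macro, for illustration only */

/* Adjacent literals, including one produced by a macro, are joined
   into a single literal: this is the same as "hello, world!". */
const char *joined(void)
{
    return "hello, " NAME "!";
}

/* Inside the quotes NAME is just four characters; it is not expanded. */
const char *not_expanded(void)
{
    return "NAME";
}

/* Only one '\0' is appended, after the pieces have been joined. */
size_t joined_size(void)
{
    assert(sizeof("ab" "cd") == sizeof "abcd");
    return sizeof("ab" "cd");
}
```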
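The standard spends a lot of words on how to tell whether a literal is a character string literal (char) or a wide string literal (wchar_t). That has little to do with your question, so I'll skip it.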
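A string literal can be used to initialize an array of character type, and the braces are optional (that is, you can write char a[] = "abc" or char a[] = { "abc" }). If the declared size has room for the trailing \0, or no size is given, the \0 counts as one of the initializers.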
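The initialization rules can be checked with a few lines. This is just a sketch with invented function names; note that the exact-fit case is valid C but is rejected by C++, and some compilers warn about it.

```c
#include <assert.h>
#include <stddef.h>

/* No size given: the array gets 4 elements, '\0' included. */
size_t size_unspecified(void)
{
    char a[] = "abc";
    return sizeof a;
}

/* The braces are optional and change nothing. */
size_t size_braced(void)
{
    char b[] = { "abc" };
    return sizeof b;
}

/* Exactly 3 elements: there is no room, so no '\0' is stored.
   The result is a plain char array, not a C string. */
int exact_fit_has_no_terminator(void)
{
    char c[3] = "abc";
    return sizeof c == 3 && c[0] == 'a' && c[1] == 'b' && c[2] == 'c';
}

/* Larger than needed: the '\0' is stored and the rest is zero-filled. */
int larger_array_is_zero_padded(void)
{
    char d[6] = "abc";
    assert(d[2] == 'c');
    return d[3] == '\0' && d[4] == '\0' && d[5] == '\0';
}
```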
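Paragraph 6 of 6.4.5 addresses the point the original poster is concerned about; its context is paragraph 5 (both are quoted below). It says, roughly, "whether these arrays are distinct is unspecified, provided their elements have the appropriate values." (That is my own paraphrase and carries no authority; please see the original text below.)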
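"Appropriate" is a pretty vague word. My reading: the standard allows two identical string literals to be treated as one and the same array, and equally allows them not to be. In other words, the way TC implements it is acceptable.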
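The reason, as I see it, is that a compiler has to cope with multibyte encodings, not just ASCII. It is quite common, for example, for people to write comments or even literals in Chinese. Many comparison functions depend on the locale, and their results can differ a lot. For instance, the English "resume" and the French "résumé" might be treated as equal by a locale-aware comparison under certain environment settings. (I mean something in the spirit of strcmp; strcmp itself only compares the bytes of char strings, and it signals equality by returning 0, not "true".) So the standard gives implementers some leeway here: when in doubt, keep them separate. Either choice conforms.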
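In practice this means a program must not depend on the result of comparing the addresses of two identical literals. A minimal sketch (function names invented for the example):

```c
#include <assert.h>
#include <string.h>

/* May return 1 or 0. Whether two identical literals share one array is
   unspecified, so a conforming compiler may do either, and the answer
   can change with the compiler or its options. */
int literals_share_storage(void)
{
    const char *p = "abc";
    const char *q = "abc";
    return p == q;
}

/* Always 1: compare the contents, not the addresses. */
int literals_have_equal_contents(void)
{
    const char *p = "abc";
    const char *q = "abc";
    return strcmp(p, q) == 0;
}

/* Always 0: arrays initialized from a literal are separate named
   objects, each holding its own copy of the characters. */
int arrays_share_storage(void)
{
    char a[] = "abc";
    char b[] = "abc";
    const char *pa = a;
    const char *pb = b;
    assert(strcmp(pa, pb) == 0);
    return pa == pb;
}
```

If one compiler gives 1 for the first function and another gives 0, both are behaving within the standard.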
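"Unspecified" means the standard offers more than one possibility and does not require any particular one. The implementation picks one of them, but it is not obliged to document the choice (behavior that must be documented is called "implementation-defined"). "Undefined", which appears elsewhere in these clauses, means the standard imposes no requirements on the behavior at all; the implementation may handle it in any way it likes (or not handle it at all).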
That's basically it. I've also picked out some relevant clauses from the standard; have a look for yourselves.
The following quotations are from: ISO/IEC 9899:1999 (E)
6.4.5 String literals
Description
2 A character string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, as in "xyz". ...
Semantics
5 In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence; ...
6 It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
6.5.9 Equality operators
...
Constraints
2 One of the following shall hold:
— both operands have arithmetic type;
— both operands are pointers to qualified or unqualified versions of compatible types;
— one operand is a pointer to an object or incomplete type and the other is a pointer to a qualified or unqualified version of void; or
— one operand is a pointer and the other is a null pointer constant.
6.5.16.1 Simple assignment
Constraints
1 One of the following shall hold:
— the left operand has qualified or unqualified arithmetic type and the right has arithmetic type;
— the left operand has a qualified or unqualified version of a structure or union type compatible with the type of the right;
— both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right;
— one operand is a pointer to an object or incomplete type and the other is a pointer to a qualified or unqualified version of void, and the type pointed to by the left has all the qualifiers of the type pointed to by the right; or
— the left operand is a pointer and the right is a null pointer constant.
— the left operand has type _Bool and the right is a pointer.
6.7.8 Initialization
...
Semantics
14 An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
32 EXAMPLE 8 ..., the declaration
char *p = "abc";
defines p with type "pointer to char" and initializes it to point to an object with type "array of char" with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.
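A common way to stay clear of the undefined behavior described in this example is sketched below. Declaring the pointer as const char * is a habit, not something the standard requires, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* With const char *, an accidental write such as p[0] = 'x' is rejected
   by the compiler instead of becoming undefined behavior at run time. */
size_t length_via_pointer(void)
{
    const char *p = "abc";   /* points at an unnamed array of 4 char */
    return strlen(p);        /* the terminating '\0' is not counted */
}

/* To change the text, initialize an array from the literal and modify
   that copy; the literal's own array is never written to. */
int modify_copy(void)
{
    char buf[] = "abc";
    buf[0] = 'x';
    assert(buf[3] == '\0');
    return strcmp(buf, "xbc") == 0;
}
```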
[Last edited by pangding on 2011-3-3 21:45]