For program objects that span multiple bytes, we must establish two
conventions: what will be the address of the object, and how will we
order the bytes in memory. In virtually all machines, a multibyte object
is stored as a contiguous sequence of bytes, with the address of the
object given by the smallest address of the bytes used. For example,
suppose a variable x of type int has address 0x100, that is, the value
of the address expression &x is 0x100. Then the four bytes of x would
be stored in memory locations 0x100, 0x101, 0x102, and 0x103.
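
The addressing convention above can be checked with a small C sketch. The helper names `byte_address` and `print_byte_addresses` are made up for illustration and are not from the text; the actual addresses printed will differ from 0x100 on a real machine.

```c
#include <stddef.h>
#include <stdio.h>

/* Address of byte i of the object that starts at obj.
   Byte 0 has the smallest address, which is also the address of the object. */
const unsigned char *byte_address(const void *obj, size_t i)
{
    return (const unsigned char *)obj + i;
}

/* Print the address of every byte of an object of len bytes. */
void print_byte_addresses(const void *obj, size_t len)
{
    for (size_t i = 0; i < len; i++)
        printf("byte %zu: %p\n", i, (const void *)byte_address(obj, i));
}
```

Calling `print_byte_addresses(&x, sizeof x)` for an `int x` should list `sizeof x` consecutive addresses, the first of which equals `&x`.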
For ordering the bytes representing an object, there are two common
conventions. Consider a w-bit integer having a bit representation
[x_{w-1}, x_{w-2}, ..., x_1, x_0], where x_{w-1} is the most significant bit and x_0 is
the least. Assuming w is a multiple of eight, these bits can be grouped
as bytes, with the most significant byte having bits [x_{w-1}, x_{w-2}, ..., x_{w-8}],
the least significant byte having bits [x_7, x_6, ..., x_0], and the other bytes
having bits from the middle. Some machines choose to store the object
in memory ordered from least significant byte to most, while other machines
store them from most to least. The former convention—where the least
significant byte comes first—is referred to as little endian. This convention
is followed by most machines from the former Digital Equipment Corporation
(now part of Compaq Corporation), as well as by Intel. The latter convention
—where the most significant byte comes first—is referred to as big endian.
This convention is followed by most machines from IBM, Motorola, and
Sun Microsystems. Note that we said “most.” The conventions do not
split precisely along corporate boundaries. For example, personal computers
manufactured by IBM use Intel-compatible processors and hence are little
endian. Many microprocessor chips, including Alpha and the PowerPC by
Motorola, can be run in either mode, with the byte ordering convention
determined when the chip is powered up.
Continuing our earlier example, suppose the variable x of type int and
at address 0x100 has a hexadecimal value of 0x01234567. The ordering
of the bytes within the address range 0x100 through 0x103 depends on
the type of machine:
Big endian
    0x100  0x101  0x102  0x103
     01     23     45     67
Little endian
    0x100  0x101  0x102  0x103
     67     45     23     01
Note that in the word 0x01234567 the high-order byte has hexadecimal
value 0x01, while the low-order byte has value 0x67.
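
To see which of the two layouts a given machine uses, one can inspect the bytes of a value in increasing address order. The following is a minimal sketch; the names `bytes_in_memory`, `is_little_endian`, and `show_bytes` are illustrative choices, and `is_little_endian` assumes the machine uses one of the two orderings described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy the four bytes of value into out[0..3] in increasing address order. */
void bytes_in_memory(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Return 1 if the least significant byte is stored first, 0 otherwise. */
int is_little_endian(void)
{
    unsigned char b[4];
    bytes_in_memory(0x01234567u, b);
    return b[0] == 0x67;
}

/* Print the bytes of any object as two-digit hex, lowest address first. */
void show_bytes(const void *start, size_t len)
{
    const unsigned char *p = start;
    for (size_t i = 0; i < len; i++)
        printf(" %.2x", p[i]);
    printf("\n");
}
```

With `uint32_t x = 0x01234567;`, the call `show_bytes(&x, sizeof x)` lists the bytes as 67 45 23 01 on a little-endian machine and as 01 23 45 67 on a big-endian one, matching the two tables.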
People get surprisingly emotional about which byte ordering is the proper
one. In fact, the terms “little endian” and “big endian” come from the
book Gulliver’s Travels by Jonathan Swift, where two warring factions could
not agree by which end a soft-boiled egg should be opened—the little end
or the big. Just like the egg issue, there is no technological reason to choose
one byte ordering convention over the other, and hence the arguments
degenerate into bickering about sociopolitical issues. As long as one of
the conventions is selected and adhered to consistently, the choice is arbitrary.
For most application programmers, the byte orderings used by their machines
are totally invisible. Programs compiled for either class of machine give identical
results. At times, however, byte ordering becomes an issue. One such case is when
binary data is communicated over a network between different machines.
A common problem is for data produced by a little-endian machine to be sent
to a big-endian machine, or vice versa, leading to the bytes within the words
being in reverse order for the receiving program. To avoid such problems,
code written for networking applications must follow established conventions
for byte ordering to make sure the sending machine converts its internal
representation to the network standard, while the receiving machine converts
the network standard to its internal representation.
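
The conversion step can be sketched in C as follows, assuming the network standard is big endian, as it is for the Internet protocols. The helpers `put_be32` and `get_be32` are hypothetical names introduced here; they use shifts rather than memory copies, so they behave the same on either class of machine.

```c
#include <stdint.h>

/* Sender side: write v into buf in big-endian (network) order.
   Shifts operate on the value, not on memory, so the result does not
   depend on the byte ordering of the sending machine. */
void put_be32(unsigned char buf[4], uint32_t v)
{
    buf[0] = (unsigned char)((v >> 24) & 0xFF); /* most significant byte first */
    buf[1] = (unsigned char)((v >> 16) & 0xFF);
    buf[2] = (unsigned char)((v >> 8) & 0xFF);
    buf[3] = (unsigned char)(v & 0xFF);         /* least significant byte last */
}

/* Receiver side: rebuild the value from four bytes in network order. */
uint32_t get_be32(const unsigned char buf[4])
{
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
            (uint32_t)buf[3];
}
```

On POSIX systems the functions `htonl` and `ntohl` from `<arpa/inet.h>` serve the same purpose for 32-bit values; the shift-based version is shown only because it needs nothing beyond the C standard library.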
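I can follow it, but I can't manage to translate it.
[This post was edited by the author on 2006-3-4 16:27:11]
I had a go at translating it; please bear with any mistakes...
Text: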
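For objects in a program that occupy multiple bytes (translator's note: I keep the terms "byte" and "multibyte" as they are below, since that is clearer), we must settle two conventions: 1. What is the address of the object? 2. In what order are the bytes that make up the object arranged in memory?
In practically all computers, a multibyte object (one that occupies several bytes of memory) is stored as a contiguous sequence of bytes, and the address of the object is the address of the byte with the smallest address (note: memory gives every byte its own address). For example, take int x; (note: on most computers today an int occupies 4 bytes). If the address of x is 0x100, that is, &x == 0x100, then the bytes of x sit at addresses 0x100, 0x101, 0x102, and 0x103 in turn.
For ordering the bytes that represent an object there are two common conventions (there is in fact also a middle-endian one). Take a w-bit integer [x_{w-1}, x_{w-2}, ..., x_1, x_0], where x_{w-1} is the most significant bit and x_0 is the least significant bit. Assuming w is a multiple of 8, the integer can be split into bytes: the most significant byte consists of [x_{w-1}, x_{w-2}, ..., x_{w-8}], the least significant byte consists of [x_7, x_6, ..., x_0], and the other bytes lie in between. Some computers store the object from least significant byte to most significant byte, while others use the opposite order.
The former is called little endian and is used by most machines from Digital Equipment Corporation (now part of Compaq) and by Intel; the latter is called big endian and is used by most machines from IBM, Motorola, and Sun Microsystems.
Note that it says "most": the conventions do not split along company lines. For example, personal computers made by IBM use Intel-compatible microprocessors and are therefore little endian. Many microprocessor chips, such as Alpha and Motorola's PowerPC, can run in either mode, and the mode is fixed when the chip is powered up (that is, when power is first applied).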
-----------------------------------------------
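The rest is a bit long-winded, so I will just summarize:
Take an int at address 0x100 with the value 0x01234567 (hexadecimal). Its most significant byte is 0x01 and its least significant byte is 0x67 (obvious from the text above).
Then clearly:
Big endian (most significant byte to least significant byte)
    0x100  0x101  0x102  0x103
     01     23     45     67
Little endian (least significant byte to most significant byte)
    0x100  0x101  0x102  0x103
     67     45     23     01
Next comes the origin of the names "little endian" and "big endian", and the conclusion is that neither is better than the other: as long as one is chosen as the standard and followed consistently, it is fine.
These two conventions make no difference to ordinary programmers (mainly those writing applications). Sometimes, though, for example in network programming, trouble arises when computer A communicates with computer B: if A is little endian and B is big endian, the bytes within each word arrive in reverse order. How is this solved? A converts its internal representation to a network standard before sending, and B converts from the network standard to its own representation on receipt, and then everything works.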