Is Int64 aligned to 4 bytes or 8 bytes on a 32-bit architecture?

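As the standard on which .NET is built, the CLI Spec (ECMA-335) describes the alignment rules for built-in types as quoted below. Our reading of the standard is this: 8-byte data types (int64, unsigned int64, and float64) are aligned to either 4 or 8 bytes, depending on the target machine architecture. More specifically, they should be aligned to 4 bytes on x86 and to 8 bytes on x64.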

Built-in data types shall be properly aligned, which is defined as follows:

  • 1-byte, 2-byte, and 4-byte data is properly aligned when it is stored at a 1-byte, 2-byte, or 4-byte boundary, respectively.

  • 8-byte data is properly aligned when it is stored on the same boundary required by the underlying hardware for atomic access to a native int.

Thus, int16 and unsigned int16 start on even address; int32, unsigned int32, and float32 start on an address divisible by 4; and int64, unsigned int64, and float64 start on an address divisible by 4 or 8, depending upon the target architecture. The native size types (native int, native unsigned int, and &) are always naturally aligned (4 bytes or 8 bytes, depending on the architecture). When generated externally, these should also be aligned to their natural size, although portable code can use 8-byte alignment to guarantee architecture independence. It is strongly recommended that float64 be aligned on an 8-byte boundary, even when the size of native int is 32 bits.

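We verify this claim with a simple console program. To simulate a 32-bit platform on a 64-bit machine, we edit the .csproj file as follows, setting the PlatformTarget property to x86 (the default is Any CPU).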

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net7.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
    <AllowUnsafeBlocks>True</AllowUnsafeBlocks>
    <PlatformTarget>x86</PlatformTarget>
  </PropertyGroup>
</Project>

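In the demo program we define a record struct named Foobar. It has two fields, Foo and Bar, of type byte and ulong (unsigned int64) respectively. We set them to byte.MaxValue (FF) and ulong.MaxValue (FF-FF-FF-FF-FF-FF-FF-FF) and print the struct's in-memory bytes. To confirm that the environment matches what the CLI Spec describes, we also print Environment.Is64BitProcess (to check whether the process is 64-bit), the size of ulong in bytes (to confirm that it is "8-byte data"), and IntPtr.Size (to confirm that the alignment boundary of native int is 4 bytes).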

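using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

unsafe
{
    // Copy the raw in-memory bytes of the struct into a managed array and print them.
    var bytes = new byte[sizeof(Foobar)];
    var foobar = new Foobar(byte.MaxValue, ulong.MaxValue);
    Marshal.Copy(new nint(Unsafe.AsPointer(ref foobar)), bytes, 0, bytes.Length);
    Console.WriteLine(BitConverter.ToString(bytes));

    // Confirm the process bitness, the size of ulong, and the size of native int.
    Console.WriteLine($"Environment.Is64BitProcess = {Environment.Is64BitProcess}");
    Console.WriteLine($"sizeof(ulong) = {sizeof(ulong)}");
    Console.WriteLine($"IntPtr.Size = {IntPtr.Size}");
}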

public record struct Foobar(byte Foo, ulong Bar);

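The output below shows that the environment matches the 32-bit architecture described in the CLI Spec, yet the ulong field Bar is aligned to 8 bytes rather than 4. (With 4-byte alignment the bytes should be FF-00-00-00-FF-FF-FF-FF-FF-FF-FF-FF; even if Foobar itself had to be aligned to 8 bytes, the result should be FF-00-00-00-FF-FF-FF-FF-FF-FF-FF-FF-00-00-00-00.)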

[Image: console output of the program above]
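As a cross-check that does not rely on reading a byte dump, the offset of the ulong field can be measured directly. The following is a minimal sketch, not the article's program: DefaultPair and Packed4Pair are hypothetical stand-ins for Foobar that use public fields (a positional record struct hides its backing fields), and the second one requests 4-byte packing through StructLayoutAttribute.Pack. Because both structs contain only blittable fields, the packed variant is expected to report offset 4 and size 12, which is the 12-byte layout anticipated above; the default variant shows what the runtime actually chooses on the current target.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Default sequential layout: the runtime picks the packing.
var d = new DefaultPair { Foo = byte.MaxValue, Bar = ulong.MaxValue };
int defaultOffset = (int)Unsafe.ByteOffset(
    ref Unsafe.As<DefaultPair, byte>(ref d),   // address of the struct
    ref Unsafe.As<ulong, byte>(ref d.Bar));    // address of the Bar field

// Explicit Pack = 4: fields are aligned to min(4, field size).
var p = new Packed4Pair { Foo = byte.MaxValue, Bar = ulong.MaxValue };
int packedOffset = (int)Unsafe.ByteOffset(
    ref Unsafe.As<Packed4Pair, byte>(ref p),
    ref Unsafe.As<ulong, byte>(ref p.Bar));

Console.WriteLine($"default: offset of Bar = {defaultOffset}, size = {Unsafe.SizeOf<DefaultPair>()}");
Console.WriteLine($"Pack=4 : offset of Bar = {packedOffset}, size = {Unsafe.SizeOf<Packed4Pair>()}");

// Hypothetical stand-in for Foobar with the default layout.
struct DefaultPair
{
    public byte Foo;
    public ulong Bar;
}

// Hypothetical stand-in for Foobar with 4-byte packing requested.
[StructLayout(LayoutKind.Sequential, Pack = 4)]
struct Packed4Pair
{
    public byte Foo;
    public ulong Bar;
}
```

Running this once with PlatformTarget set to x86 and once with x64 makes it easy to compare the default line against the packed line on each target.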

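We have not yet found an authoritative answer to this question. Are we misreading the CLI Spec, or is there something wrong with our test program? We would be grateful if anyone familiar with the topic could enlighten us. So far, a Google search has turned up the following related discussions: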

Memory alignment on a 32-bit Intel processor

The usual rule of thumb (straight from Intels and AMD's optimization manuals) is that every data type should be aligned by its own size. An int32 should be aligned on a 32-bit boundary, an int64 on a 64-bit boundary, and so on. A char will fit just fine anywhere.

Another rule of thumb is, of course "the compiler has been told about alignment requirements". You don't need to worry about it because the compiler knows to add the right padding and offsets to allow efficient access to data.

Why is the default alignment for `int64_t` 8 byte on 32 bit x86 architecture?

Interesting point: If you only ever load it as two halves into 32bit GP registers, then 4B alignment means those operations will happen with their natural alignment.

However, it's probably best if both halves of the variable are in the same cache line, since almost all accesses will read / write both halves. Aligning to the natural alignment of the whole thing takes care of that, even ignoring the other reasons below.

32bit x86 can load 64bit integers in a single 64bit-load using MMX or SSE2 movq. Handling 64bit add/sub/shift/ and bitwise booleans using vector instructions is more efficient (single instruction), as long as you don't need immediate constants or mul or div. The vector instructions with 64b elements are still available in 32b mode.

Atomic 64bit compare-and-exchange is also available in 32bit mode (lock CMPXCHG8B m64 works just like 64bit mode's lock CMPXCHG16B m128, using two implicit registers (edx:eax)). IDK what kind of penalty it has for crossing a cache-line boundary.

Modern x86 CPUs have essentially no penalty for misaligned loads/stores unless they cross cache-line boundaries, which is why I'm only saying that, and not saying that misaligned 64b would be bad in general. See the links in the x86 wiki, esp. Agner Fog's guides.

Why is the "alignment" the same on 32-bit and 64-bit systems?

MSVC targeting 32-bit x86 gives __int64 a minimum alignment of 4, but its default struct-packing rules align types within structs to min(8, sizeof(T)) relative to the start of the struct. (For non-aggregate types only). That's not a direct quote, that's my paraphrase of the MSVC docs link from @P.W's answer, based on what MSVC seems to actually do. (I suspect the "whichever is less" in the text is supposed to be outside the parens, but maybe they're making a different point about the interaction on the pragma and the command-line option?)
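The struct-packing rule paraphrased above (each field aligned to min(pack, sizeof(T)) relative to the start of the struct) can be worked through by hand for a byte followed by an 8-byte integer. The sketch below is only an illustration of that arithmetic: LayoutSequential is a hypothetical helper, and it assumes primitive fields whose natural alignment equals their size.

```csharp
using System;

// Hypothetical helper: computes field offsets and the total size for a
// sequential layout in which each field is aligned to min(pack, fieldSize).
// Assumes primitive fields whose natural alignment equals their size.
static (int[] Offsets, int Size) LayoutSequential(int[] fieldSizes, int pack)
{
    var offsets = new int[fieldSizes.Length];
    int offset = 0, maxAlign = 1;
    for (int i = 0; i < fieldSizes.Length; i++)
    {
        int align = Math.Min(pack, fieldSizes[i]);
        maxAlign = Math.Max(maxAlign, align);
        offset = (offset + align - 1) / align * align; // round up to the field's alignment
        offsets[i] = offset;
        offset += fieldSizes[i];
    }
    // Pad the struct so that its size is a multiple of its largest field alignment.
    int size = (offset + maxAlign - 1) / maxAlign * maxAlign;
    return (offsets, size);
}

// A byte followed by an 8-byte integer, as in Foobar.
var (o8, s8) = LayoutSequential(new[] { 1, 8 }, pack: 8);
var (o4, s4) = LayoutSequential(new[] { 1, 8 }, pack: 4);
Console.WriteLine($"pack 8: Bar at {o8[1]}, size {s8}"); // pack 8: Bar at 8, size 16
Console.WriteLine($"pack 4: Bar at {o4[1]}, size {s4}"); // pack 4: Bar at 4, size 12
```

With a packing of 8 the 8-byte field lands at offset 8 regardless of pointer size, which is consistent with the 16-byte dump observed for Foobar in the x86 process; a packing of 4 would give the 12-byte layout.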

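We ran the following supplementary experiments, which show that the alignment of a standalone 8-byte integer (a local long in the code below) does agree with the CLI Spec. Could it be that an 8-byte data type follows different alignment rules on its own than when it is a field of a composite type (struct/class)?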

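x64: the following assertion always holds.

using System.Diagnostics;
using System.Runtime.CompilerServices;

var random = new Random();
unsafe
{
    long v = random.NextInt64();
    // The address of the local is a multiple of 8.
    Debug.Assert(new IntPtr(Unsafe.AsPointer(ref v)).ToInt64() % 8 == 0);
}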

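x86: the following assertion also always holds.

using System.Diagnostics;
using System.Runtime.CompilerServices;

var random = new Random();
unsafe
{
    long v = random.NextInt64();
    // The address of the local is a multiple of 4.
    Debug.Assert(new IntPtr(Unsafe.AsPointer(ref v)).ToInt32() % 4 == 0);
}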

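x86: the following assertion is not guaranteed to hold.

using System.Diagnostics;
using System.Runtime.CompilerServices;

var random = new Random();
unsafe
{
    long v = random.NextInt64();
    // The address of the local is not necessarily a multiple of 8.
    Debug.Assert(new IntPtr(Unsafe.AsPointer(ref v)).ToInt32() % 8 == 0);
}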
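A single assertion only samples one stack address. To see how the address of a local long is distributed, the experiment can be repeated at several call depths and the remainders modulo 8 tallied. This is a rough sketch under stated assumptions: Probe is a hypothetical helper, it needs AllowUnsafeBlocks like the programs above, and the exact distribution depends on the JIT and the frame sizes, so no particular histogram is promised.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper: records (address of a local long) % 8 at each call depth.
[MethodImpl(MethodImplOptions.NoInlining)]
static unsafe void Probe(int depth, int[] histogram)
{
    long v = Random.Shared.NextInt64();
    // Taking the address forces v to live in a stack slot.
    histogram[(int)((ulong)Unsafe.AsPointer(ref v) % 8UL)]++;
    if (depth > 0) Probe(depth - 1, histogram);
}

var histogram = new int[8];
Probe(16, histogram); // 17 samples: depths 16 down to 0

Console.WriteLine($"Is64BitProcess = {Environment.Is64BitProcess}");
for (int r = 0; r < histogram.Length; r++)
{
    if (histogram[r] > 0)
        Console.WriteLine($"address % 8 == {r}: {histogram[r]} sample(s)");
}
```

In an x64 process every sample should fall into the remainder-0 bucket. In an x86 process only remainders 0 and 4 should appear, and any sample in the remainder-4 bucket corresponds to the case in which the third assertion fails.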
