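In OpenGL, memory is allocated automatically when a resource is created.
Vulkan, by contrast, is a lower-level API that requires explicit memory management. Explicit management pays off in resource reuse and in platform-specific optimization.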
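1. Vulkan Memory Categories
Vulkan memory falls into two categories: host memory and device memory.
- Device memory: video memory, which the GPU can access directly and quickly.
- Host memory: system memory, i.e. memory the CPU can access directly.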
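1.1 Host Memory
Host memory is storage that the Vulkan implementation needs and that is not visible to the device. It holds the state and the implementation of Vulkan objects.
Vulkan lets the application perform host memory allocation on behalf of the implementation. If this feature is not used, the implementation falls back to its own allocator. Because most memory allocations are off the critical path, this is not meant as a performance feature. It is useful instead for purposes such as debugging on embedded systems or logging memory allocations.
The application supplies its allocator through a pointer to a VkAllocationCallbacks structure: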
    // Provided by VK_VERSION_1_0
    typedef struct VkAllocationCallbacks {
        void*                                   pUserData;
        PFN_vkAllocationFunction                pfnAllocation;
        PFN_vkReallocationFunction              pfnReallocation;
        PFN_vkFreeFunction                      pfnFree;
        PFN_vkInternalAllocationNotification    pfnInternalAllocation;
        PFN_vkInternalFreeNotification          pfnInternalFree;
    } VkAllocationCallbacks;
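As a rough illustration of the allocation-logging use case, the sketch below tracks live bytes in three callbacks whose parameter lists follow PFN_vkAllocationFunction, PFN_vkReallocationFunction and PFN_vkFreeFunction. It is only a sketch under stated assumptions: AllocationScope is a stand-in for VkSystemAllocationScope so that the code builds without the Vulkan headers, AllocationStats and the tracked* names are hypothetical, and there is no locking even though an implementation may call the callbacks from several threads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <unordered_map>

// Stand-in for VkSystemAllocationScope so the sketch builds without Vulkan headers.
using AllocationScope = int;

// Hypothetical bookkeeping object handed to the callbacks through pUserData.
struct AllocationStats {
    std::size_t liveBytes = 0;        // bytes currently allocated
    std::size_t allocationCount = 0;  // successful allocations so far
    std::unordered_map<void*, std::size_t> sizes;  // requested size per live block
};

// Same parameter list as PFN_vkAllocationFunction.
void* trackedAlloc(void* pUserData, std::size_t size, std::size_t alignment, AllocationScope) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);  // must be a power of two
    auto* stats = static_cast<AllocationStats*>(pUserData);
    // std::aligned_alloc expects the size to be a multiple of the alignment.
    const std::size_t rounded = (size + alignment - 1) / alignment * alignment;
    void* block = std::aligned_alloc(alignment, rounded);
    if (block != nullptr) {
        stats->sizes[block] = size;
        stats->liveBytes += size;
        ++stats->allocationCount;
    }
    return block;
}

// Same parameter list as PFN_vkFreeFunction.
void trackedFree(void* pUserData, void* pMemory) {
    if (pMemory == nullptr) return;  // freeing NULL is a no-op
    auto* stats = static_cast<AllocationStats*>(pUserData);
    const auto it = stats->sizes.find(pMemory);
    if (it != stats->sizes.end()) {
        stats->liveBytes -= it->second;
        stats->sizes.erase(it);
    }
    std::free(pMemory);
}

// Same parameter list as PFN_vkReallocationFunction.
void* trackedRealloc(void* pUserData, void* pOriginal, std::size_t size,
                     std::size_t alignment, AllocationScope scope) {
    if (pOriginal == nullptr) return trackedAlloc(pUserData, size, alignment, scope);
    if (size == 0) {  // a zero size means "free the block"
        trackedFree(pUserData, pOriginal);
        return nullptr;
    }
    auto* stats = static_cast<AllocationStats*>(pUserData);
    const std::size_t oldSize = stats->sizes.at(pOriginal);
    void* block = trackedAlloc(pUserData, size, alignment, scope);
    if (block == nullptr) return nullptr;  // on failure the original block stays valid
    std::memcpy(block, pOriginal, oldSize < size ? oldSize : size);
    trackedFree(pUserData, pOriginal);
    return block;
}
```

In a real application the three functions would be declared with the Vulkan calling-convention macros and VkSystemAllocationScope, assigned to pfnAllocation, pfnReallocation and pfnFree, and the address of the stats object would go into pUserData.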
1.2 Device Memory
Device memory is memory visible to the device (GPU), i.e. video memory. Examples are the contents of images and buffer objects, which the device can use natively.
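Device Memory Properties
The memory properties of a physical device describe the available memory heaps and memory types. Different scenarios call for different memory properties to make memory access as efficient as possible. The characteristics of each property are covered later.
Memory properties are queried with the following function: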
    // Provided by VK_VERSION_1_0
    void vkGetPhysicalDeviceMemoryProperties(
        VkPhysicalDevice                            physicalDevice,
        VkPhysicalDeviceMemoryProperties*           pMemoryProperties);
- physicalDevice is the handle of the device to query.
- pMemoryProperties points to the structure in which the properties are returned.
The VkPhysicalDeviceMemoryProperties structure is defined as:
    // Provided by VK_VERSION_1_0
    typedef struct VkPhysicalDeviceMemoryProperties {
        uint32_t        memoryTypeCount;
        VkMemoryType    memoryTypes[VK_MAX_MEMORY_TYPES];
        uint32_t        memoryHeapCount;
        VkMemoryHeap    memoryHeaps[VK_MAX_MEMORY_HEAPS];
    } VkPhysicalDeviceMemoryProperties;
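- memoryTypeCount: the number of valid elements in the memoryTypes array.
- memoryTypes: an array describing the memory types that can be used to access memory allocated from the heaps.
- memoryHeapCount: the number of valid elements in the memoryHeaps array.
- memoryHeaps: an array describing the memory heaps from which memory can be allocated.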
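The VkPhysicalDeviceMemoryProperties structure describes a number of memory heaps as well as a number of memory types that can be used to access memory allocated in those heaps. Each heap describes a memory resource of a particular size, and each memory type describes a set of memory properties (for example, host cached vs. uncached) that can be used with a given memory heap. Allocations using a particular memory type consume resources from the heap indicated by that memory type's heap index. More than one memory type may share each heap. Together, heaps and memory types provide a mechanism to advertise the accurate size of the physical memory resources while allowing the memory to be used with a variety of different properties.
At least one heap must include VK_MEMORY_HEAP_DEVICE_LOCAL_BIT in VkMemoryHeap::flags. If several heaps have similar performance characteristics, they may all include VK_MEMORY_HEAP_DEVICE_LOCAL_BIT. In a unified memory architecture (UMA) system there is often only a single memory heap, which is considered equally "local" to the host and to the device, and such an implementation must advertise that heap as device-local.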
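The relationship between heaps and types can be made concrete with a small sketch that walks the two arrays. It assumes simplified stand-in structs (MemoryType, MemoryHeap, MemoryProperties) that mirror the fields of the Vulkan structures so that it builds without the Vulkan headers; the helper names deviceLocalBytes and typesUsingHeap are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using std::uint32_t;
using std::uint64_t;

// Stand-ins mirroring VkMemoryType, VkMemoryHeap and VkPhysicalDeviceMemoryProperties.
constexpr uint32_t kHeapDeviceLocalBit = 0x1;  // value of VK_MEMORY_HEAP_DEVICE_LOCAL_BIT
constexpr uint32_t kMaxMemoryTypes = 32;       // value of VK_MAX_MEMORY_TYPES
constexpr uint32_t kMaxMemoryHeaps = 16;       // value of VK_MAX_MEMORY_HEAPS

struct MemoryType { uint32_t propertyFlags; uint32_t heapIndex; };
struct MemoryHeap { uint64_t size; uint32_t flags; };
struct MemoryProperties {
    uint32_t memoryTypeCount;
    MemoryType memoryTypes[kMaxMemoryTypes];
    uint32_t memoryHeapCount;
    MemoryHeap memoryHeaps[kMaxMemoryHeaps];
};

// Sum the sizes of all heaps flagged as device-local.
uint64_t deviceLocalBytes(const MemoryProperties& props) {
    assert(props.memoryHeapCount <= kMaxMemoryHeaps);
    uint64_t total = 0;
    for (uint32_t i = 0; i < props.memoryHeapCount; ++i) {
        if ((props.memoryHeaps[i].flags & kHeapDeviceLocalBit) != 0) {
            total += props.memoryHeaps[i].size;
        }
    }
    return total;
}

// Collect the indices of the memory types that allocate from the given heap.
std::vector<uint32_t> typesUsingHeap(const MemoryProperties& props, uint32_t heapIndex) {
    assert(props.memoryTypeCount <= kMaxMemoryTypes);
    std::vector<uint32_t> indices;
    for (uint32_t i = 0; i < props.memoryTypeCount; ++i) {
        if (props.memoryTypes[i].heapIndex == heapIndex) indices.push_back(i);
    }
    return indices;
}
```

With the real API the same loops would run over the structure filled in by vkGetPhysicalDeviceMemoryProperties. Several types mapping to one heap is the normal case, which is why the heap index lives on the type and not the other way round.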
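2. Memory Heaps and Types
Heaps: physical memory. A heap describes the physical properties of the memory and constrains which type flags can be set.
Types: describe the properties of the memory.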
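2.1 Common Memory Heaps
The device's memory heaps and their memory types can be queried with vkGetPhysicalDeviceMemoryProperties. Taking an AMD discrete GPU as an example, the heaps reported are as follows: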
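    // Heap 0: VK_MEMORY_HEAP_DEVICE_LOCAL_BIT
    //   Video memory on the GPU that cannot be mapped to the CPU.
    MemoryType 0: VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
      - Fastest for GPU reads, writes, and atomic operations.
      - Cannot be mapped to the CPU with vkMapMemory.

    // Heap 1: VK_MEMORY_HEAP_DEVICE_LOCAL_BIT
    //   Video memory on the GPU that the CPU can also access
    //   (the GPU usually reserves a small part of video memory for direct CPU access).
    MemoryType 1: VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
                  VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
                  VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
      - Can be accessed through vkMapMemory.
      - Fastest for GPU reads, writes, and atomic operations.
      - CPU access is uncached: writes are write-combined, reads are uncached.

    // Heap 2: (no heap flags)
    //   Host system memory that the GPU can access.
    //   - Can be mapped with vkMapMemory.
    //   - GPU reads of buffers and textures are usually cached in the GPU L2.
    //   - A GPU L2 miss triggers a read of system memory over PCIe, so miss latency is high.
    MemoryType 2: VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
                  VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
      - CPU writes are write-combined, reads are uncached.
    MemoryType 3: VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
                  VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
                  VK_MEMORY_PROPERTY_HOST_CACHED_BIT
      - CPU reads and writes go through the CPU cache.
      - GPU reads snoop the CPU cache. Snooping is a hardware-managed way of keeping caches coherent.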
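2.2 Common Memory Types
- VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT: memory allocated with this flag is the most efficient for device access. Note that the flag can only be set when the heap has VK_MEMORY_HEAP_DEVICE_LOCAL_BIT. Device-local memory means video memory.
A memory type that carries only this flag is allocated from video memory and is fast for the GPU to access, but the CPU cannot access it directly, not even through mapping.
GPU reads and writes are both fastest here.
It suits data that the CPU writes once and the GPU reads or writes often.
Because this kind of device-local memory is not accessible to the CPU, the CPU first writes the data to host-visible memory, and a copy command then transfers it to the target device memory. The buffer backed by that host-visible memory is usually called a staging buffer. For an introduction to staging buffers and how to use them, see: Staging Buffer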
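- VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT: memory allocated with this flag can be mapped for CPU access by calling vkMapMemory.
Such memory is typically allocated from system memory, can be mapped, and can be accessed by both the CPU and the GPU (presumably the kernel driver implements this through mapping). GPU access is noticeably slower because it has to cross the bus.
It suits data that the CPU writes many times and the GPU reads once (with a single read there is no concern about stale data in the GPU cache while the CPU is still writing), or large amounts of data that the GPU reads.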
- VK_MEMORY_PROPERTY_HOST_COHERENT_BIT: with this flag, CPU writes do not need vkFlushMappedMemoryRanges to be flushed to the GPU, and GPU writes do not need vkInvalidateMappedMemoryRanges to become visible to the CPU.
Explanation:
Simplified model:
    +----------------------------+   +---------------------------+
    | GPU | GPUCache/LocalMemory |   | HostMemory/CPUCache | CPU |
    |     |                      |   |                     |     |
    +----------------------------+   +---------------------------+
                  |                                |
                  |            PCIE bus            |
    +------------------------------------------------------------+
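Wherever there is a cache, there is a cache coherence problem. Only coherence of the CPU cache is considered here; how the GPU cache is handled to stay coherent is not covered.
CPU writes are held in the CPU cache. To hand them to the GPU, vkFlushMappedMemoryRanges must be called explicitly so that the data is flushed from the CPU cache to memory. Only then can the GPU see it.
GPU writes are held in the GPU cache. Once they have been flushed to memory, the CPU can see them only after the CPU cache is invalidated, that is, after the corresponding cached data is marked invalid so that the CPU fetches the latest data from memory. Invalidation also prevents stale data in the CPU cache from being evicted to memory.
The vkFlush and vkInvalidate calls mentioned here both operate on the CPU cache. Flushing and invalidating the GPU cache is controlled on the GPU side: typically the flush is inserted into the command stream automatically, and an invalidate is performed implicitly after the flush.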
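- VK_MEMORY_PROPERTY_HOST_CACHED_BIT: memory allocated with this flag is cached on the CPU. CPU access to uncached memory is slower than to cached memory; however, uncached memory is always host coherent. This is easy to see: without a CPU cache, reads and writes go straight to memory, so there is no need to call vkFlushMappedMemoryRanges or vkInvalidateMappedMemoryRanges.
- VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT and VK_MEMORY_PROPERTY_HOST_CACHED_BIT: combines both properties, i.e. the memory is CPU-cached and accessible to both the CPU and the GPU. This suits data that the GPU writes and the CPU reads.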
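The flush and invalidate rule above can be sketched as two small helpers. The first decides from the property flags whether explicit vkFlushMappedMemoryRanges / vkInvalidateMappedMemoryRanges calls are needed. The second widens a byte range to the granularity those calls expect; that granularity is the device limit nonCoherentAtomSize, which is not discussed in this article and is passed in here as a plain number. The helper names and the MappedRange struct are hypothetical, and the flag constants are stand-ins that use the numeric values from the Vulkan headers so the sketch builds without them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kHostVisibleBit  = 0x2;  // value of VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr std::uint32_t kHostCoherentBit = 0x4;  // value of VK_MEMORY_PROPERTY_HOST_COHERENT_BIT

// Mapped memory needs explicit flush/invalidate only when it is
// host-visible but not host-coherent.
bool needsExplicitFlush(std::uint32_t propertyFlags) {
    return (propertyFlags & kHostVisibleBit) != 0 &&
           (propertyFlags & kHostCoherentBit) == 0;
}

// Hypothetical stand-in for the offset/size pair of VkMappedMemoryRange.
struct MappedRange {
    std::uint64_t offset;
    std::uint64_t size;
};

// Widen [offset, offset + size) so that the start is a multiple of atomSize
// and the end is either a multiple of atomSize or the end of the allocation.
MappedRange alignToAtom(std::uint64_t offset, std::uint64_t size,
                        std::uint64_t atomSize, std::uint64_t allocationSize) {
    assert(atomSize != 0);
    assert(offset + size <= allocationSize);
    const std::uint64_t begin = offset / atomSize * atomSize;
    const std::uint64_t roundedEnd = (offset + size + atomSize - 1) / atomSize * atomSize;
    const std::uint64_t end = std::min(roundedEnd, allocationSize);
    return MappedRange{begin, end - begin};
}
```

For example, with an atom size of 64, a write to bytes 100..149 of a 1024-byte allocation would be flushed as offset 64, size 128.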
2.3 Vulkan Memory Property Flags
propertyFlags can take the following values:
- 0
- VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
- VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_CACHED_BIT
- VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_CACHED_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
- VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
- VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT | VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
- VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT | VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_CACHED_BIT
- VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT | VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_CACHED_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
- VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT | VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT
- VK_MEMORY_PROPERTY_PROTECTED_BIT
- VK_MEMORY_PROPERTY_PROTECTED_BIT | VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
- VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT | VK_MEMORY_PROPERTY_DEVICE_COHERENT_BIT_AMD
- VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_CACHED_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT | VK_MEMORY_PROPERTY_DEVICE_COHERENT_BIT_AMD
- VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT | VK_MEMORY_PROPERTY_DEVICE_COHERENT_BIT_AMD
- VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT | VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT | VK_MEMORY_PROPERTY_DEVICE_COHERENT_BIT_AMD
- VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT | VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_CACHED_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT | VK_MEMORY_PROPERTY_DEVICE_COHERENT_BIT_AMD
- VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT | VK_MEMORY_PROPERTY_DEVICE_COHERENT_BIT_AMD | VK_MEMORY_PROPERTY_DEVICE_UNCACHED_BIT_AMD
- VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_CACHED_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT | VK_MEMORY_PROPERTY_DEVICE_COHERENT_BIT_AMD | VK_MEMORY_PROPERTY_DEVICE_UNCACHED_BIT_AMD
- VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT | VK_MEMORY_PROPERTY_DEVICE_COHERENT_BIT_AMD | VK_MEMORY_PROPERTY_DEVICE_UNCACHED_BIT_AMD
- VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT | VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT | VK_MEMORY_PROPERTY_DEVICE_COHERENT_BIT_AMD | VK_MEMORY_PROPERTY_DEVICE_UNCACHED_BIT_AMD
- VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT | VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_CACHED_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT | VK_MEMORY_PROPERTY_DEVICE_COHERENT_BIT_AMD | VK_MEMORY_PROPERTY_DEVICE_UNCACHED_BIT_AMD
- VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT | VK_MEMORY_PROPERTY_RDMA_CAPABLE_BIT_NV
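When inspecting what a driver reports, it can help to print these combinations in readable form. The following is a minimal sketch of such a decoder. The bit values are the ones used by the Vulkan headers, but the table, the shortened names and the function describePropertyFlags are hypothetical helpers written so that the sketch builds without the Vulkan headers.

```cpp
#include <cstdint>
#include <string>

// One entry per property bit, with the VK_MEMORY_PROPERTY_ prefix and _BIT dropped.
struct FlagName {
    std::uint32_t bit;
    const char* name;
};

constexpr FlagName kPropertyFlagNames[] = {
    {0x001, "DEVICE_LOCAL"},
    {0x002, "HOST_VISIBLE"},
    {0x004, "HOST_COHERENT"},
    {0x008, "HOST_CACHED"},
    {0x010, "LAZILY_ALLOCATED"},
    {0x020, "PROTECTED"},
    {0x040, "DEVICE_COHERENT_AMD"},
    {0x080, "DEVICE_UNCACHED_AMD"},
    {0x100, "RDMA_CAPABLE_NV"},
};

// Turn a propertyFlags value into text such as "DEVICE_LOCAL | HOST_VISIBLE".
std::string describePropertyFlags(std::uint32_t flags) {
    if (flags == 0) return "0";
    std::string text;
    for (const FlagName& entry : kPropertyFlagNames) {
        if ((flags & entry.bit) == 0) continue;
        if (!text.empty()) text += " | ";
        text += entry.name;
        flags &= ~entry.bit;  // clear the bit so leftovers can be detected
    }
    if (flags != 0) {  // bits this table does not know about
        if (!text.empty()) text += " | ";
        text += "UNKNOWN";
    }
    return text;
}
```

Calling it on the propertyFlags of each reported memory type gives a listing similar to the AMD example in section 2.1.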
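3. Allocating and Requesting Device Memory
3.1 Querying Memory Support
In practice, the first step is to check whether the memory type we need is supported by the current hardware and software environment. The application can look for the required memory type among the supported ones as follows: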
    // Find a memory in `memoryTypeBitsRequirement` that includes all of `requiredProperties`
    int32_t findProperties(const VkPhysicalDeviceMemoryProperties* pMemoryProperties,
                           uint32_t memoryTypeBitsRequirement,
                           VkMemoryPropertyFlags requiredProperties) {
        const uint32_t memoryCount = pMemoryProperties->memoryTypeCount;
        for (uint32_t memoryIndex = 0; memoryIndex < memoryCount; ++memoryIndex) {
            // Unsigned literal: shifting a signed 1 by 31 would overflow.
            const uint32_t memoryTypeBits = (1u << memoryIndex);
            const bool isRequiredMemoryType = (memoryTypeBitsRequirement & memoryTypeBits) != 0;

            const VkMemoryPropertyFlags properties =
                pMemoryProperties->memoryTypes[memoryIndex].propertyFlags;
            const bool hasRequiredProperties =
                (properties & requiredProperties) == requiredProperties;

            if (isRequiredMemoryType && hasRequiredProperties)
                return static_cast<int32_t>(memoryIndex);
        }

        // failed to find memory type
        return -1;
    }

    // Try to find an optimal memory type, or if it does not exist try fallback memory type
    // `device` is the VkDevice
    // `image` is the VkImage that requires memory to be bound
    // `memoryProperties` properties as returned by vkGetPhysicalDeviceMemoryProperties
    // `requiredProperties` are the property flags that must be present
    // `optimalProperties` are the property flags that are preferred by the application
    VkMemoryRequirements memoryRequirements;
    vkGetImageMemoryRequirements(device, image, &memoryRequirements);
    int32_t memoryType =
        findProperties(&memoryProperties, memoryRequirements.memoryTypeBits, optimalProperties);
    if (memoryType == -1) // not found; try fallback properties
        memoryType =
            findProperties(&memoryProperties, memoryRequirements.memoryTypeBits, requiredProperties);
3.2 Memory Allocation
3.2.1 Direct Device Memory Allocation
To allocate memory objects, call:
    // Provided by VK_VERSION_1_0
    VkResult vkAllocateMemory(
        VkDevice                                    device,
        const VkMemoryAllocateInfo*                 pAllocateInfo,
        const VkAllocationCallbacks*                pAllocator,
        VkDeviceMemory*                             pMemory);
- device is the logical device that owns the memory.
- pAllocateInfo is a pointer to a VkMemoryAllocateInfo structure describing parameters of the allocation. A successfully returned allocation must use the requested parameters; no substitution is permitted by the implementation.
- pAllocator controls host memory allocation as described in the Memory Allocation chapter.
- pMemory is a pointer to a VkDeviceMemory handle in which information about the allocated memory is returned.
The VkMemoryAllocateInfo structure is defined as:
    // Provided by VK_VERSION_1_0
    typedef struct VkMemoryAllocateInfo {
        VkStructureType    sType;
        const void*        pNext;
        VkDeviceSize       allocationSize;
        uint32_t           memoryTypeIndex;
    } VkMemoryAllocateInfo;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- allocationSize is the size of the allocation in bytes.
- memoryTypeIndex is an index identifying a memory type from the memoryTypes array of the VkPhysicalDeviceMemoryProperties structure.
3.2.2 Android Hardware Buffer External Memory
To import memory from an Android hardware buffer that was created outside the Vulkan instance, add a VkImportAndroidHardwareBufferInfoANDROID structure to the pNext chain of the VkMemoryAllocateInfo structure. It is defined as:
    // Provided by VK_ANDROID_external_memory_android_hardware_buffer
    typedef struct VkImportAndroidHardwareBufferInfoANDROID {
        VkStructureType            sType;
        const void*                pNext;
        struct AHardwareBuffer*    buffer;
    } VkImportAndroidHardwareBufferInfoANDROID;
- sType is a VkStructureType value identifying this structure.
- pNext is NULL or a pointer to a structure extending this structure.
- buffer is the Android hardware buffer to import.
To export an Android hardware buffer that references the payload of a Vulkan device memory object, call:
    // Provided by VK_ANDROID_external_memory_android_hardware_buffer
    VkResult vkGetMemoryAndroidHardwareBufferANDROID(
        VkDevice                                            device,
        const VkMemoryGetAndroidHardwareBufferInfoANDROID*  pInfo,
        struct AHardwareBuffer**                            pBuffer);
- device is the logical device that created the device memory being exported.
- pInfo is a pointer to a VkMemoryGetAndroidHardwareBufferInfoANDROID structure containing parameters of the export operation.
- pBuffer will return an Android hardware buffer referencing the payload of the device memory object.
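3.3 Vulkan Memory Allocation Workflow
Step 1: Query the memory properties supported by the physical device.
Step 2: Query the memory requirements of the resource being created, according to its type.
Step 3: Match the results of Step 1 and Step 2 to check whether the physical device supports the resource's memory requirements.
Step 4: Fill in the memory allocation info from the memory requirements.
Step 5: Call vkAllocateMemory to allocate the memory.
Step 6: Call vkBindImageMemory to bind the allocated memory to the image resource.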
4. VkMemoryPropertyFlagBits Description
https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkMemoryPropertyFlagBits.html
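5. Vulkan Buffers and Images
With memory allocation solved, one big problem remains. GPU rendering needs various resources, but those resources usually live in CPU memory, which is separate from GPU memory and cannot be accessed by the GPU directly. We therefore need a way to place resources in GPU memory so that the GPU can access them in a well-defined manner. That is the problem addressed next.
Vulkan provides two resource types: buffers (Buffer) and images (Image). After creating a VkBuffer or VkImage with the corresponding Vulkan API, follow the workflow in section 3.3 above to allocate memory and bind it to the resource.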
References
- Vulkan Memory Management: https://www.youtube.com/watch?v=gM93bbKQ0P8
- https://vulkan-tutorial.com/Vertex_buffers/Vertex_buffer_creation
- Vulkan from Scratch (3): Resources and Memory Management: https://zhuanlan.zhihu.com/p/537142901?utm_id=0
- An Analysis of Vulkan Memory Properties: https://zhuanlan.zhihu.com/p/527481097