Summary of the ARM Graphics Optimization Guide for Unity Developers

Optimization lists

Application layer:

1. Use OnBecameVisible() and OnBecameInvisible() to switch CPU-heavy objects on and off, or just the expensive parts of their logic.

2. Use Vector3.sqrMagnitude instead of Vector3.Distance() or Vector3.magnitude.

Vector3.sqrMagnitude sums the squared components without calculating the root, but this is useful for comparisons. The other calls use a computationally expensive square root.
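The idea behind item 2 can be sketched outside Unity. This is a minimal illustration in plain Python, not Unity code: the function names and the sample points are made up for the example, and it assumes the radius is non-negative, which is what makes comparing squared values equivalent to comparing distances.

```python
import math

def within_range_sqrt(a, b, radius):
    """Range check that takes a square root (the slower form)."""
    return math.dist(a, b) < radius

def within_range_squared(a, b, radius):
    """Same check on squared lengths; no square root is taken.

    Valid because both sides are non-negative, so squaring keeps the ordering.
    """
    dx, dy, dz = a[0] - b[0], a[1] - b[1], a[2] - b[2]
    return dx * dx + dy * dy + dz * dz < radius * radius

# Hypothetical positions: the two points are exactly 5 units apart.
player = (1.0, 2.0, 3.0)
enemy = (4.0, 6.0, 3.0)

print(within_range_squared(player, enemy, 5.5))  # True
print(within_range_squared(player, enemy, 4.5))  # False
```

In Unity the same pattern would compare a sqrMagnitude value against the squared radius instead of comparing Vector3.Distance() against the radius.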

3. If the size is known in advance and nothing needs to be added dynamically, use a built-in array rather than ArrayList or List.

ArrayList and List classes have more flexibility because they grow in size the more elements you insert, but they are slower than the built-in arrays.

The other, well-known items are omitted here; see:

https://developer.arm.com/docs/100140/latest/optimization-lists/application-processor-optimizations

 

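GPU layer:

1. Use static batching and dynamic batching to reduce draw calls (DC), which cuts CPU time.

Unity batching rules: https://docs.unity3d.com/Manual/DrawCallBatching.html

2. On Mali GPUs, 4x MSAA comes at a very small performance cost.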

4x Multi-Sampling Anti-Aliasing (MSAA) with minimal performance drop

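3. Use LOD (level of detail).

4. Avoid expensive arithmetic instructions (saves ALU work).

5. Use the ASTC texture compression format.

6. Enable mipmaps (less bandwidth, higher cache hit rate, fewer artifacts such as moiré patterns).

UI textures do not need mipmaps: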

You do not usually require mipmapping for textures used in a 2D UI. UI textures are typically rendered on screen without scaling so they only use the first level in the mipmap chain.
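To put a rough number on what skipping mipmaps for UI textures saves, here is a small sketch that counts the texels in a full mip chain. It assumes a power-of-two texture and counts raw texels only (no compression or padding); the helper names are invented for this example.

```python
def mip_chain(width, height):
    """Return the (width, height) of every mip level, from full size down to 1x1."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width = max(1, width // 2)
        height = max(1, height // 2)
        levels.append((width, height))
    return levels

def texel_count(levels):
    """Total number of texels stored across all given levels."""
    return sum(w * h for w, h in levels)

chain = mip_chain(1024, 1024)
base = 1024 * 1024
overhead = texel_count(chain) / base

print(len(chain))          # 11 levels: 1024, 512, ..., 1
print(round(overhead, 3))  # 1.333, about one third more than level 0 alone
```

So a UI texture that is always drawn at its native size carries roughly a third of extra data if mipmaps are left enabled, and none of those extra levels are ever sampled.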

7. Use occlusion culling.

8. Use Early-Z.

Early-Z is supported on the Mali-T600 series and later.

9. Use a Z pre-pass; it can be combined with Early-Z.

 

Asset optimization:

1. Disable Read/Write for static textures

2. Combine meshes to reduce draw calls (with tools such as Mesh Baker)

To reduce the number of draw calls required for rendering you can combine several meshes into one with the Mesh.CombineMeshes() method. If the meshes all share the same material, set the mergeSubMeshes argument to true so it generates a single submesh out of each mesh in the combine group.

Benefits:

  • Create more effective occluders.

  • Turn tile-based assets into a single large seamless solid asset.

3. Do not import animation data on FBX mesh models that do not animate

You can set the Animation Type to None in the Rig tab of the import settings. With this set, Unity does not generate an unused Animator component when you place the mesh in the hierarchy.

4. Avoid Read/Write meshes

Enabling Read/Write keeps a second copy of the mesh in memory, which increases memory use.

5. Use texture atlases (sharing one material across objects reduces draw calls)

 

Shader optimization (this part mainly covers optimizing with the Mali Offline Shader Compiler):

1. Usage:

The compiler reports the number of cycles a shader takes.

https://developer.arm.com/docs/100140/latest/optimization-lists/optimizing-with-the-mali-offline-shader-compiler/measuring-unity-shaders

2. Optimize the Arithmetic pipeline

a. Avoid instructions that take too many cycles

  • Avoid using complex arithmetic such as:

    • The inverse matrix function.

    • Modulo operators.

    • Division.

    • Determinant.

    • Sine.

    • Cosine.

b. For integer variables, use bit shifts (and masks) in place of multiplication, division, and modulo by powers of two.

c. Use transpose instead of inverse for orthogonal matrices.

d. Avoid explicit transposes, for example: Transpose(A)*Vector == Vector * A.

e. Pass matrices as uniforms from the CPU side instead of computing them in the shader. This uses the Load/Store pipeline.

f. Use a texture to store a set of precomputed values that represent a function such as sine or cosine. This moves the load to the Texture pipeline. (It also adds texture samples, so weigh the trade-off case by case.)
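Items b, c, and d rest on simple identities, which can be checked with a few lines of plain Python. This is only a sketch of the math, not shader code: the matrix helpers are written for the example, the rotation matrix is an arbitrary orthogonal matrix chosen for the test, and the integer identities are shown for non-negative values and power-of-two constants only.

```python
import math

# b. Shifts and masks for non-negative integers and power-of-two constants.
x = 77
times_8 = x << 3   # same as x * 8
div_8 = x >> 3     # same as x // 8
mod_8 = x & 7      # same as x % 8 (the mask is 8 - 1)

def transpose(m):
    """Swap rows and columns."""
    return [list(row) for row in zip(*m)]

def mat_mul(a, b):
    """Matrix times matrix."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def mat_vec(m, v):
    """Matrix times column vector."""
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]

def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[r] * m[r][c] for r in range(len(v))) for c in range(len(m[0]))]

# An example orthogonal matrix: rotation by 30 degrees about the Z axis.
angle = math.radians(30.0)
c, s = math.cos(angle), math.sin(angle)
rotation = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# c. For an orthogonal matrix the transpose is the inverse: R * R^T is the identity
#    (up to floating-point rounding).
product = mat_mul(rotation, transpose(rotation))

# d. Transpose(A) * v gives the same vector as v * A, so the transpose can be skipped.
v = [1.0, 2.0, 3.0]
left = mat_vec(transpose(rotation), v)
right = vec_mat(v, rotation)
```

In a shader the same reasoning means swapping the operand order of the multiply instead of calling a transpose, and never calling an inverse on a pure rotation.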

3. Optimize the Load/Store pipeline

The Load/Store pipeline is used for reading uniforms, writing varyings, and accessing buffers in the shaders such as Uniform Buffer Objects or Shader Storage Buffer Objects.

If the shader is Load/Store pipeline bound:

a. Use a texture instead of a buffer object to read data in the shader (I am not sure why; presumably it shifts the reads onto the Texture pipeline)

b. Compute data using arithmetic operations

c. Compress or reduce uniforms and varyings

4. Optimize the Texture pipeline (mainly by reducing bandwidth and raising the cache hit rate)

a. Enable mipmaps

b. Compress textures

c. Avoid trilinear and anisotropic filtering

5.Use world space normal maps for static objects

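Static objects can use world-space normal maps, which saves transforming the normal from tangent space to world space. The goal is to reduce ALU cost in the pixel shader (PS).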

 

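Avoiding register spilling

Causes of register spilling:

Too many uniform/varying variables and temporaries

Variable precision that is higher than needed

Problems caused by register spilling:

When registers spill, the Mali GPU has to read uniforms from memory, which hurts Load/Store performance, increases bandwidth, and lowers the cache hit rate.

The Mali Offline Shader Compiler can detect this problem.

Solution:

1. Reduce the precision of varying and uniform variables. (A uniform is passed from the application to the vertex and fragment shaders; a varying passes data from the vertex shader to the fragment shader; an attribute can only be used in the vertex shader.)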

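Benefits of reducing precision to half:

a. Less bandwidth

b. With lower precision, the shader compiler can optimize the code for more parallelism (does it also compile faster???)

c. Fewer uniform registers are used, which lowers the risk of register spilling

d. The number of Load/Store instructions also drops

e. The generated code is also smaller than with float, which raises the cache hit rate and improves performance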
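The bandwidth point (a) is easy to see by packing the same four values as 32-bit floats and as 16-bit halfs. This sketch uses Python's struct module purely to show the sizes and the rounding; the RGBA value and the helper name are made up, and it says nothing about how a particular GPU actually stores its registers.

```python
import struct

color = (0.25, 0.5, 0.75, 1.0)   # hypothetical RGBA varying

as_float = struct.pack("<4f", *color)   # four 32-bit floats
as_half = struct.pack("<4e", *color)    # four 16-bit halfs

size_float = len(as_float)   # 16 bytes
size_half = len(as_half)     # 8 bytes

def to_half(x):
    """Round-trip a value through 16-bit half-precision storage."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

print(size_float, size_half)   # 16 8
print(to_half(0.25))           # 0.25, small values like colors survive exactly
print(to_half(1000.1))         # 1000.0, larger values lose their fraction
```

The second print is the reason half is usually a good fit for colors and normals, while values with a large range, such as world positions, are safer left at full precision.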

 

Using Vulkan

Benefits:

1. Less time spent on draw calls

2. Lower power consumption

 

Summary:

These recommendations are based on a demo from 2015 and are somewhat dated, but many of the techniques are still worth borrowing.

https://developer.arm.com/docs/100140/0303

------by wolf96 2019/6/4
