Summary of the ARM Graphics Optimization Guide for Unity Developers

Optimization lists

Application layer:

1. Use OnBecameVisible() and OnBecameInvisible() to switch CPU-heavy objects on and off, or just the expensive parts of their logic.

2. Use Vector3.sqrMagnitude instead of Vector3.Distance() or Vector3.magnitude.

Vector3.sqrMagnitude sums the squared components without calculating the square root, which is still enough for comparisons. The other calls use a computationally expensive square root.
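To illustrate the comparison trick outside of Unity, here is a minimal Python sketch (not Unity C#). The function names and sample points are made up for illustration. It compares squared lengths against a squared radius, which gives the same answer as the square-root version for non-negative radii.

```python
import math

def within_range_sqrt(a, b, radius):
    """Range check that pays for a square root (the Vector3.Distance style)."""
    dx, dy, dz = a[0] - b[0], a[1] - b[1], a[2] - b[2]
    return math.sqrt(dx * dx + dy * dy + dz * dz) <= radius

def within_range_squared(a, b, radius):
    """Same check on squared lengths (the sqrMagnitude style), no square root."""
    dx, dy, dz = a[0] - b[0], a[1] - b[1], a[2] - b[2]
    return dx * dx + dy * dy + dz * dz <= radius * radius

player = (0.0, 0.0, 0.0)   # hypothetical positions
enemy = (3.0, 4.0, 0.0)    # 5 units away from the player

near = within_range_squared(player, enemy, 6.0)
far = within_range_squared(player, enemy, 4.0)
```

The squared form is only valid for comparisons. When the actual distance value is needed, the square root cannot be avoided.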

3. If the array size is known and nothing needs to be added dynamically, use a built-in array instead of ArrayList or List.

ArrayList and List classes have more flexibility because they grow in size the more elements you insert, but they are slower than the built-in arrays.

The other, well-known tips are not repeated here:

https://developer.arm.com/docs/100140/latest/optimization-lists/application-processor-optimizations

 

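GPU layer:

1. Use static batching and dynamic batching to reduce draw calls (DC), which reduces CPU time.

Unity batching rules: https://docs.unity3d.com/Manual/DrawCallBatching.html

2. On Mali GPUs, 4x MSAA comes at a very low cost.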

4x Multi-Sampling Anti-Aliasing (MSAA) with minimal performance drop

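3. Use LOD (level of detail).

4. Avoid expensive arithmetic instructions (saves ALU work).

5. Use the ASTC texture compression format.

6. Enable mipmaps (reduces bandwidth, improves the cache hit rate, and reduces artifacts such as moiré patterns).

UI textures do not need mipmaps: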

You do not usually require mipmapping for textures used in a 2D UI. UI textures are typically rendered on screen without scaling so they only use the first level in the mipmap chain.
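As a rough worked example of what skipping mipmaps on UI textures saves, the Python sketch below counts the texels in a full mip chain. The 1024x1024 size is an assumed example, and the count ignores compression block sizes and any padding a real GPU or engine may add.

```python
def mip_chain_texels(width, height):
    """Return the texel count of every mip level, from full size down to 1x1."""
    levels = []
    while True:
        levels.append(width * height)
        if width == 1 and height == 1:
            break
        # Each level halves both dimensions, never going below 1.
        width = max(1, width // 2)
        height = max(1, height // 2)
    return levels

levels = mip_chain_texels(1024, 1024)   # assumed texture size
base = levels[0]                        # texels in level 0 only
total = sum(levels)                     # texels in the whole chain
overhead = (total - base) / base        # extra storage relative to level 0
```

For a square power-of-two texture the extra levels add roughly one third on top of level 0. A UI texture that is only ever drawn at level 0 pays for that storage without using it.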

7. Use occlusion culling.

8. Use Early-Z.

Early-Z is supported from the Mali-T600 series onward.

9. Use a Z pre-pass. It can be combined with Early-Z.

 

Asset optimizations:

1. Disable Read/Write for static textures

2. Combine meshes to reduce draw calls (with tools such as Mesh Baker)

To reduce the number of draw calls required for rendering you can combine several meshes into one with the Mesh.CombineMeshes() method. If the meshes all share the same material, set the mergeSubMeshes argument to true so it generates a single submesh out of each mesh in the combine group.

Benefits:

  • Create more effective occluders.

  • Turn tile-based assets into a single large seamless solid asset.

3. Do not import animation data on FBX mesh models that do not animate

You can set the Animation Type to None in the Rig tab of the import settings. With this set, Unity does not generate an unused Animator component when you place the mesh into the hierarchy.

4. Avoid Read/Write meshes

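A Read/Write mesh keeps a second copy of the mesh in memory, which increases memory use.

5. Use texture atlases (objects can then share one material, which reduces draw calls)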

 

Shader optimizations: (this part mainly covers optimizing with the Mali Offline Shader Compiler)

1. Usage:

It reports the number of cycles a shader uses.

https://developer.arm.com/docs/100140/latest/optimization-lists/optimizing-with-the-mali-offline-shader-compiler/measuring-unity-shaders

2. Optimizing the Arithmetic pipeline

a. Avoid instructions that take many cycles

  • Avoid using complex arithmetic such as:

    • The inverse matrix function.

    • Modulo operators.

    • Division.

    • Determinant.

    • Sine.

    • Cosine.

b. For integer variables, use bit shifts instead of division, multiplication, and modulo (this only works when the divisor or factor is a power of two).

c. Use transpose instead of inverse for orthogonal matrices.

d. Avoid computing a transpose explicitly, for example: Transpose(A) * Vector == Vector * A.

e. Pass matrices as uniforms from the CPU side instead of computing them in the shader. This uses the Load/Store pipeline.

f. Use a texture to store a set of precomputed values that represent a function such as sine or cosine. This moves the load to the Texture pipeline. (This adds texture samples, so weigh the trade-off case by case.)
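Items b, c, and d rest on simple identities that are easy to check numerically. The Python sketch below is only an illustration of the math, not shader code. All helper names are made up, the integers are assumed non-negative, and a 2D rotation matrix stands in for an orthogonal matrix.

```python
import math

def shift_div(x, k):
    """Divide a non-negative integer by 2**k using a right shift."""
    return x >> k

def shift_mul(x, k):
    """Multiply an integer by 2**k using a left shift."""
    return x << k

def mask_mod(x, k):
    """Compute x mod 2**k for a non-negative integer using a bit mask."""
    return x & ((1 << k) - 1)

def transpose(m):
    return [list(row) for row in zip(*m)]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_vec(m, v):
    """Matrix times column vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

angle = 0.7  # arbitrary example angle in radians
rotation = [[math.cos(angle), -math.sin(angle)],
            [math.sin(angle),  math.cos(angle)]]

# c. For an orthogonal matrix, the transpose acts as the inverse.
identity_check = mat_mul(transpose(rotation), rotation)

# d. Transpose(A) * v gives the same result as v * A, with no transpose needed.
v = [2.0, -1.0]
lhs = mat_vec(transpose(rotation), v)
rhs = vec_mat(v, rotation)
```

In a shader, the same idea for item d means swapping the operand order of the multiply rather than calling a transpose function.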

3. Optimizing the Load/Store pipeline

The Load/Store pipeline is used for reading uniforms, writing varyings, and accessing buffers in the shaders such as Uniform Buffer Objects or Shader Storage Buffer Objects.

If the shader is Load/Store pipeline bound:

a. Use a texture instead of a buffer object to read data in the shader(????)

b. Compute data using arithmetic operations

c. Compress or reduce uniforms and varyings

4. Optimizing the Texture pipeline (mainly by reducing bandwidth and improving the cache hit rate)

a. Enable mipmaps

b. Compress textures

c. Avoid trilinear and anisotropic filtering

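5. Use world space normal maps for static objects

Static objects can use world-space normal maps, which saves transforming the normal from tangent space to world space. The goal is to reduce ALU cost in the pixel shader.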

 

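Avoiding register spilling

Causes of register spilling:

Too many uniform/varying variables and temporary variables

Variable precision that is too high

Problems caused by register spilling:

Register spilling forces the Mali GPU to read uniforms from memory, which hurts Load/Store performance, increases bandwidth, and lowers the cache hit rate.

The Mali Offline Shader Compiler can reveal the problem.

Solutions:

1. Reduce the precision of varying and uniform variables (a uniform is passed from the application to the vertex and fragment shaders, a varying passes data from the vertex shader to the fragment shader, and an attribute can only be used in the vertex shader)

Benefits of reducing precision to half:

a. Less bandwidth

b. With lower precision, the shader compiler can optimize the code for more parallelism (and compilation gets faster???)

c. Fewer uniform registers are used, which lowers the risk of register spilling

d. The number of Load/Store instructions also drops

e. The generated code is also smaller than with float, which improves the cache hit rate and performance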
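The bandwidth argument for half precision can be seen from the storage sizes alone. The Python sketch below packs the same four values as 32-bit floats and as 16-bit halves using the standard struct module. The values are arbitrary examples, and this only shows the size and rounding trade-off, not how a Mali GPU stores its registers.

```python
import struct

values = [0.25, 0.5, 1.0, 0.1]   # arbitrary example data, e.g. one vec4

as_float = struct.pack('<4f', *values)   # four 32-bit floats
as_half = struct.pack('<4e', *values)    # four 16-bit halves

float_bytes = len(as_float)
half_bytes = len(as_half)

# Reading the halves back shows the precision that was given up.
roundtrip = struct.unpack('<4e', as_half)
error_on_last = abs(roundtrip[3] - values[3])
```

The half version takes half the bytes, while a value such as 0.1 comes back slightly rounded. That rounding is why colors and normalized directions are the usual candidates for half, and positions or large UV ranges often are not.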

 

Using Vulkan

Benefits:

1. Less time spent on draw calls

2. Lower power consumption

 

Summary:

The recommendations are based on a 2015 demo and are somewhat dated, but many of the approaches are still worth borrowing.

https://developer.arm.com/docs/100140/0303

------by wolf96 2019/6/4

 

 

 
