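Based on the slides from the 2022 WebGL & WebGPU Meetup; references are listed at the end of the article.

1 Use the label attribute wherever you can

Every WebGPU object has a label attribute. You can set it by passing label in the descriptor at creation time, or by assigning the object's label property afterwards. The label works much like an id: it makes objects easier to debug and inspect. Setting it costs almost nothing, and it pays off enormously when debugging.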

const projectionMatrixBuffer = gpuDevice.createBuffer({
  label: 'Projection Matrix Buffer',
  size: 12 * Float32Array.BYTES_PER_ELEMENT, // deliberately 12; a 4x4 matrix actually needs 16
  usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
})
const projectionMatrixArray = new Float32Array(16)

gpuDevice.queue.writeBuffer(projectionMatrixBuffer, 0, projectionMatrixArray)

The code above deliberately gets the size of the matrix's GPUBuffer wrong. When validation fails, the error message includes the label:

// Console output
Write range (bufferOffset: 0, size: 64) does not fit in [Buffer "Projection Matrix Buffer"] size (48).
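
Labels can also be assigned after creation, as noted above. The following is only a minimal sketch: labelAll() is a hypothetical helper (not part of WebGPU) that takes a plain object mapping names to WebGPU objects and uses each key as the label of the matching object.

```javascript
// Hypothetical helper, not part of the WebGPU API.
// Uses each key of `objects` as the label of the corresponding object,
// optionally prefixed, and leaves objects that already have a label untouched.
function labelAll(objects, prefix = '') {
  for (const [name, object] of Object.entries(objects)) {
    if (!object.label) {
      object.label = prefix ? `${prefix}/${name}` : name;
    }
  }
  return objects;
}

// Usage sketch, assuming cameraBuffer and lightBuffer are existing GPUBuffers:
//   labelAll({ cameraBuffer, lightBuffer }, 'Scene');
```

The shorthand object literal means the variable names double as labels, so nothing has to be typed twice.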

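2 Use debug groups

Command encoders (GPUCommandEncoder) let you push and pop debug groups. A debug group is just a string that indicates which part of your code the commands belong to. When validation fails, the error message shows the debug group stack: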

// --- First debug group: marks the current frame ---
commandEncoder.pushDebugGroup(`Frame ${frameIndex}`);
  // --- First nested group: marks the light update ---
  commandEncoder.pushDebugGroup('Clustered Light Compute Pass');
    // For example, update the light sources here
    updateClusteredLights(commandEncoder);
  commandEncoder.popDebugGroup();
  // --- End of the first nested group ---
  // --- Second nested group: marks the main render pass ---
  commandEncoder.pushDebugGroup('Main Render Pass');
    // Issue the draw calls
    renderScene(commandEncoder);
  commandEncoder.popDebugGroup();
  // --- End of the second nested group ---
commandEncoder.popDebugGroup();
// --- End of the first debug group ---

With this in place, an error message looks like this:

// Console output
Binding sizes are too small for bind group [BindGroup] at index 0

Debug group stack:
> "Main Render Pass"
> "Frame 234"
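
Every pushDebugGroup() needs a matching popDebugGroup(), which is easy to get wrong once the recording code grows. One possible way to keep them balanced is a small wrapper; withDebugGroup() below is a hypothetical helper, not a WebGPU API, and is only a sketch.

```javascript
// Hypothetical helper, not part of the WebGPU API.
// Runs `record(encoder)` between pushDebugGroup() and popDebugGroup() so the
// two calls stay balanced even if `record` throws.
function withDebugGroup(encoder, label, record) {
  encoder.pushDebugGroup(label);
  try {
    return record(encoder);
  } finally {
    encoder.popDebugGroup();
  }
}

// Usage sketch, reusing the names from the example above:
//   withDebugGroup(commandEncoder, `Frame ${frameIndex}`, (encoder) => {
//     withDebugGroup(encoder, 'Main Render Pass', renderScene);
//   });
```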

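3 Load texture images from Blobs

ImageBitmaps created from Blobs give you the best JPG/PNG texture decoding performance.

/**
 * Asynchronously creates a texture from an image URL and copies the image data into it.
 * @param {GPUDevice} gpuDevice The device object
 * @param {string} url Path of the texture image
 * @returns {Promise<GPUTexture>} The texture containing the image
 */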
async function createTextureFromImageUrl(gpuDevice, url) {
  const blob = await fetch(url).then((r) => r.blob())
  const source = await createImageBitmap(blob)
  
  const textureDescriptor = {
    label: `Image Texture ${url}`,
    size: {
      width: source.width,
      height: source.height,
    },
    format: 'rgba8unorm',
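    // copyExternalImageToTexture() requires RENDER_ATTACHMENT in addition to COPY_DST
    usage: GPUTextureUsage.TEXTURE_BINDING | GPUTextureUsage.COPY_DST | GPUTextureUsage.RENDER_ATTACHMENT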
  }
  const texture = gpuDevice.createTexture(textureDescriptor)
  gpuDevice.queue.copyExternalImageToTexture(
    { source },
    { texture },
    textureDescriptor.size,
  )
  
  return texture
}

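Prefer compressed texture formats

Use them whenever you can.

WebGPU supports at least three families of compressed textures, each exposed as an optional device feature: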

texture-compression-bc
texture-compression-etc2
texture-compression-astc

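Which of them are available depends on the hardware. According to the official discussion (GitHub Issue 2083), every platform must support either BC (also known as DXT or S3TC), or ETC2 and ASTC, so you can always rely on some form of texture compression.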
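
Since availability varies by device, the features have to be checked and requested explicitly. The following is a minimal sketch; supportedCompressedTextureFeatures() is a hypothetical helper name, and it accepts any set-like object with a has() method, such as GPUAdapter.features.

```javascript
// The three feature names listed above.
const COMPRESSED_TEXTURE_FEATURES = [
  'texture-compression-bc',
  'texture-compression-etc2',
  'texture-compression-astc',
];

// Hypothetical helper: returns the compressed texture features that are
// present in `features` (any set-like object, e.g. GPUAdapter.features).
function supportedCompressedTextureFeatures(features) {
  return COMPRESSED_TEXTURE_FEATURES.filter((name) => features.has(name));
}

// Usage sketch (inside an async function, in a WebGPU-capable browser):
//   const adapter = await navigator.gpu.requestAdapter();
//   const requiredFeatures = supportedCompressedTextureFeatures(adapter.features);
//   const gpuDevice = await adapter.requestDevice({ requiredFeatures });
```

Compressed formats can only be used on a device that was created with the corresponding feature in requiredFeatures.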

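Supercompressed texture formats such as Basis Universal are strongly recommended. They can be transcoded to whichever format the device supports, so you do not have to prepare the same texture in two formats.

The original author wrote a library for loading compressed textures in WebGL and WebGPU; see toji/web-texture-tool on GitHub.

WebGL's support for compressed textures was not great, whereas WebGPU supports them natively, so use them wherever you can!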

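4 Use the glTF processing library gltf-transform

It is an open-source library that you can find on GitHub, and it ships with a command-line tool.

For example, you can use it to compress the textures in a glb file: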

> gltf-transform etc1s paddle.glb paddle2.glb
paddle.glb (11.92 MB) → paddle2.glb (1.73 MB)

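The result looks visually lossless, yet the model, which was exported from Blender, becomes much smaller. The original model's textures were five 2048 x 2048 PNG images.

Besides compressing textures, the library can resize textures, resample animations, apply Google Draco compression to geometry, and much more. After all of these optimizations, the glb is less than 5% of its original size.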

> gltf-transform resize paddle.glb paddle2.glb --width 1024 --height 1024
> gltf-transform etc1s paddle2.glb paddle2.glb
> gltf-transform resample paddle2.glb paddle2.glb
> gltf-transform dedup paddle2.glb paddle2.glb
> gltf-transform draco paddle2.glb paddle2.glb

  paddle.glb (11.92 MB) → paddle2.glb (596.46 KB)

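5 Uploading buffer data

WebGPU offers many ways to get data into a buffer, and writeBuffer() is not necessarily the wrong one. When you call WebGPU from wasm, writeBuffer() should be your first choice, because it avoids an extra buffer copy.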

const projectionMatrixBuffer = gpuDevice.createBuffer({
  label: 'Projection Matrix Buffer',
  size: 16 * Float32Array.BYTES_PER_ELEMENT,
  usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
});

// When the projection matrix changes (for example, when the window is resized)
function updateProjectionMatrixBuffer(projectionMatrix) {
  const projectionMatrixArray = projectionMatrix.getAsFloat32Array();
  gpuDevice.queue.writeBuffer(projectionMatrixBuffer, 0, projectionMatrixArray);
}

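The original author points out that setting mappedAtCreation when creating a buffer is not mandatory. In some cases it is fine to create the buffer unmapped, for example when loading buffer data from a glTF file.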
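
For comparison with writeBuffer(), here is a sketch of what filling a buffer through mappedAtCreation looks like. createBufferWithData() is a hypothetical helper, and `data` is assumed to be a typed array; this illustrates the API calls involved rather than a recommendation of one path over the other.

```javascript
// Hypothetical helper: creates a buffer that is mapped at creation, copies
// `data` (a typed array) into it, then unmaps it so the GPU can use it.
function createBufferWithData(gpuDevice, data, usage, label) {
  // A buffer created with mappedAtCreation needs a size that is a multiple of 4.
  const size = Math.ceil(data.byteLength / 4) * 4;
  const buffer = gpuDevice.createBuffer({ label, size, usage, mappedAtCreation: true });
  // View both sides as bytes so any typed array type can be copied.
  const source = new Uint8Array(data.buffer, data.byteOffset, data.byteLength);
  new Uint8Array(buffer.getMappedRange()).set(source);
  buffer.unmap();
  return buffer;
}

// Usage sketch, assuming `positions` is a Float32Array of vertex data:
//   const vertexBuffer = createBufferWithData(
//     gpuDevice, positions, GPUBufferUsage.VERTEX, 'Positions');
```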

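6 Prefer creating pipelines asynchronously

If you do not need a render or compute pipeline right away, use createRenderPipelineAsync and createComputePipelineAsync instead of their synchronous counterparts.

Creating a pipeline synchronously may trigger compilation of the pipeline's resources under the hood, which can stall GPU-related work.

With asynchronous creation, the Promise does not resolve until the pipeline is ready. The GPU can finish whatever it is currently doing first, and only then deal with the pipeline you asked for.

Compare the two versions: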

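// Create the compute pipeline synchronously
const computePipeline = gpuDevice.createComputePipeline({/* ... */})

computePass.setPipeline(computePipeline)
computePass.dispatchWorkgroups(32, 32) // The dispatch is issued now; the shader may still be compiling, which causes a stall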

And here is the asynchronous version:

// Create the compute pipeline asynchronously
const asyncComputePipeline = await gpuDevice.createComputePipelineAsync({/* ... */})

computePass.setPipeline(asyncComputePipeline)
computePass.dispatchWorkgroups(32, 32) // The shader is already compiled at this point, so there is no stall
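
When several pipelines are needed, awaiting them one by one serializes the waiting. A possible pattern, sketched below, is to start all of them and wait once. createRenderPipelines() is a hypothetical helper, and keying the result by each descriptor's label is just a convention chosen for this example.

```javascript
// Hypothetical helper: starts creating every render pipeline at once and
// resolves with a Map from each descriptor's label to its pipeline.
async function createRenderPipelines(gpuDevice, descriptors) {
  const pipelines = await Promise.all(
    descriptors.map((descriptor) => gpuDevice.createRenderPipelineAsync(descriptor)),
  );
  return new Map(descriptors.map((descriptor, i) => [descriptor.label, pipelines[i]]));
}

// Usage sketch, assuming opaqueDescriptor and skyboxDescriptor are complete
// GPURenderPipelineDescriptor objects labeled 'Opaque' and 'Skybox':
//   const pipelines = await createRenderPipelines(gpuDevice, [opaqueDescriptor, skyboxDescriptor]);
//   renderPass.setPipeline(pipelines.get('Opaque'));
```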

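7 Be careful with implicit pipeline layouts

Implicit pipeline layouts, especially for standalone compute pipelines, may feel convenient when writing JS, but they bring two potential problems:

They break sharing of bind groups between pipelines
Strange things can happen when you update the shader

If your case is particularly simple, an implicit pipeline layout is fine, but create the pipeline layout explicitly whenever you can.

The code below shows how an implicit pipeline layout is used: create the pipeline first, then call its getBindGroupLayout() API to obtain the bind group layout that was inferred from the shader code.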

const computePipeline = await gpuDevice.createComputePipelineAsync({
  // No explicit layout object; 'auto' asks WebGPU to infer it from the shader
  layout: 'auto',
  compute: {
    module: computeModule,
    entryPoint: 'computeMain'
  }
})

const computeBindGroup = gpuDevice.createBindGroup({
  // Get the implicitly created bind group layout
  layout: computePipeline.getBindGroupLayout(0),
  entries: [{
    binding: 0,
    resource: { buffer: storageBuffer },
  }]
})
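
For contrast, the explicit counterpart of the example above could look like the sketch below. The function name createComputePipelineExplicit() and the labels are made up for illustration, and the storage buffer binding type is an assumption about what the shader declares at binding 0.

```javascript
// GPUShaderStage.COMPUTE has the value 0x4; the fallback only exists so this
// sketch can also be evaluated outside a WebGPU-capable environment.
const COMPUTE_STAGE = globalThis.GPUShaderStage?.COMPUTE ?? 0x4;

// Explicit version: the bind group layout and pipeline layout are created up
// front and handed to the pipeline, so other pipelines can reuse them.
async function createComputePipelineExplicit(gpuDevice, computeModule, storageBuffer) {
  const bindGroupLayout = gpuDevice.createBindGroupLayout({
    label: 'Compute BindGroupLayout',
    entries: [{
      binding: 0,
      visibility: COMPUTE_STAGE,
      buffer: { type: 'storage' }, // assumption: a read-write storage buffer in the shader
    }],
  });
  const pipeline = await gpuDevice.createComputePipelineAsync({
    label: 'Compute Pipeline',
    layout: gpuDevice.createPipelineLayout({ bindGroupLayouts: [bindGroupLayout] }),
    compute: { module: computeModule, entryPoint: 'computeMain' },
  });
  const bindGroup = gpuDevice.createBindGroup({
    label: 'Compute BindGroup',
    layout: bindGroupLayout, // the same layout object the pipeline was built with
    entries: [{ binding: 0, resource: { buffer: storageBuffer } }],
  });
  return { pipeline, bindGroupLayout, bindGroup };
}
```

Because the bind group no longer depends on a specific pipeline object, it stays usable with any other pipeline built from the same bind group layout.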

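8 Share bind groups and bind group layouts

If some values do not change during rendering or compute but are used frequently, you can create a simple bind group layout for them and use it with any pipeline that binds the same group index.

First, create the bind group layout and the bind group:

// Create a bind group layout for the camera UBO, and the bind group itself
const cameraBindGroupLayout = gpuDevice.createBindGroupLayout({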
  label: `Camera uniforms BindGroupLayout`,
  entries: [{
    binding: 0,
    visibility: GPUShaderStage.VERTEX | GPUShaderStage.FRAGMENT,
    buffer: {},
  }]
})

const cameraBindGroup = gpuDevice.createBindGroup({
  label: `Camera uniforms BindGroup`,
  layout: cameraBindGroupLayout,
  entries: [{
    binding: 0,
    resource: { buffer: cameraUniformsBuffer, },
  }],
})

Next, create two render pipelines. Note that both pipelines use two bind groups: they differ in the material bind group layout, but share the camera bind group layout:

const renderPipelineA = gpuDevice.createRenderPipeline({
  label: `Render Pipeline A`,
  // createPipelineLayout() takes a descriptor object, not a bare array
  layout: gpuDevice.createPipelineLayout({
    bindGroupLayouts: [cameraBindGroupLayout, materialBindGroupLayoutA],
  }),
  /* Etc... */
});

const renderPipelineB = gpuDevice.createRenderPipeline({
  label: `Render Pipeline B`,
  layout: gpuDevice.createPipelineLayout({
    bindGroupLayouts: [cameraBindGroupLayout, materialBindGroupLayoutB],
  }),
  /* Etc... */
});

Finally, in each frame of the render loop you only need to set the camera bind group once, which reduces the amount of CPU-to-GPU communication:

const renderPass = commandEncoder.beginRenderPass({/* ... */});

// Set the camera bind group only once
renderPass.setBindGroup(0, cameraBindGroup);

for (const pipeline of activePipelines) {
  renderPass.setPipeline(pipeline.gpuRenderPipeline)
  for (const material of pipeline.materials) {
    // The material bind groups, on the other hand, are set per material
    renderPass.setBindGroup(1, material.gpuBindGroup)
    
    // Set the vertex buffer and issue the draw call for each mesh
    for (const mesh of material.meshes) {
      renderPass.setVertexBuffer(0, mesh.gpuVertexBuffer)
      renderPass.draw(mesh.drawCount)
    }
  }
}

renderPass.end()

Original source information

Author: Brandon Jones, Twitter @Tojiro
Original slides: https://docs.google.com/presentation/d/1Q-RCJrZhw9nlZ5py7QxUVgKSyq61awHr2TyIjXxBmI0/edit#slide=id.p
Further reading: https://toji.github.io/webgpu-best-practices/
A great raw WebGPU tutorial (in English): https://alain.xyz/blog/raw-webgpu
Details of the texture comparison: https://toji.github.io/webgpu-best-practices/img-textures.html
Details on buffer uploads: https://toji.github.io/webgpu-best-practices/buffer-uploads.html

