instruction latency 1/throughput
06_2A
VPERM2F128 ymm1, ymm2, ymm3, imm 1 1
VPERMILPD/PS ymm1, ymm2, ymm3 1 1
VSHUFPD/PS ymm1, ymm2, ymm3, imm 1 1
PSHUFB xmm1,xmm2 1 1 1 3 0.5 0.5 1 2
This one varies; best case is latency 1, 1/throughput 0.5 (i.e. a throughput of 2 per cycle)
SHUFPD xmm, xmm, imm8 1 1 1 1 1 1 1 1
Best case is latency 1, 1/throughput 1
Somewhat slower than the integer shuffle
shufps
SHUFPS xmm, xmm, imm8 1 1 1 2 1 1 1 1
On some machines it is somewhat slower than SHUFPD
INSERTPS xmm1, xmm2, imm 1 1 1 1 1 1
EXTRACTPS xmm1, xmm2, imm 3 2 5 1 1 1
VEXTRACTF128 ymm1, ymm2, imm 1 1 avx
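
How the two columns can be read, as a rough sketch: latency bounds a chain of dependent instructions, while 1/throughput bounds a run of independent ones. The function names below are hypothetical, and the model is a simplification that ignores port contention with other instructions, decode limits, and memory effects.

```python
def chain_cycles(n, latency):
    """Rough cycle count for n instructions that each consume the previous result."""
    return n * latency


def independent_cycles(n, recip_throughput):
    """Rough cycle count for n mutually independent instructions."""
    return n * recip_throughput


def per_cycle(recip_throughput):
    """Instructions issued per cycle, given the 1/throughput column."""
    return 1 / recip_throughput


# Best-case PSHUFB figures from the table: latency 1, 1/throughput 0.5
print(chain_cycles(8, 1))          # dependent chain of 8 shuffles
print(independent_cycles(8, 0.5))  # 8 independent shuffles
print(per_cycle(0.5))              # shuffles per cycle
```

With the best-case PSHUFB numbers, the independent run costs half as many cycles as the dependent chain, which is what "a throughput of 2" in the note above refers to.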