Original: permute instructions
AVX 1. VPERMILPD - Permute Double-Precision Floating-Point Values VPERMILPD xmm1, xmm2, xmm3/m128 Permute double-prec
Original: CPU signature
In fact, it is the concatenation of eax[19,16] eax[15,8] eax[7,4] eax[19
Original: amplxe-cl -finalize: rarely used.
-I, -finalize Re-finalize the result. Re-start postproces
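Original: Viewing hardware information on Linux
System # uname -a # show kernel/OS/CPU information # head -n 1 /etc/issue # show the OS version # cat /proc/cpuinfo # show CPU information #
Original: Advanced computing reading list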
From the Bottom Up This list is in the best reading order I could find. It’s not necessarily easiest to hardest, but b
Original: Event Configuration from the Command Line
June 7, 2011 9:00 PM PDT With the release of the VTune™ Amplifier XE 2011 Update 3, you
Original: Integer shuffle
In-place shuffle PSHUFB (with 64-bit operands) PSHUFB mm1, mm2/m64 for i = 0 to 7 { if (SRC[(i * 8)+7] = 1) then DES
Original: How to use VTune?
Usage: amplxe-cl <-action> [-action-option] [-global-option] [[--] target
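Original: Cache-blocked matrix-matrix multiplication
1. Write a program that multiplies two 1024*1024 matrices (with elements of type double). Use blocked multiplication to improve efficiency, and analyze the relationship between the blocking scheme and the cache capacity. Solution: the program is as follows: // Because the parameters have been changed, please correct the answer. #includ
Original: Jobs letter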
I have always said if there ever came a day when I could no longer meet my
Original: amplxe-cl -help command
Intel(R) VTune(TM) Amplifier XE 2011 Update 7 (build 206420) Command Line
Original: VTune: amplxe-cl command-line usage
References: http://software.intel.com/sites/products/documentation/hpc/amplifierxe/en-us/2011Update/lin/ug_docs/index.h
Original: Floating-point shuffle
1. SHUFPD—Shuffle Packed Double-Precision Floating-Point Values SHUFPD xmm1, xmm2/m128, imm8 IF IMM0[0] = 0 THEN DEST[6
Original: amplxe-cl -help collect-with
Intel(R) VTune(TM) Amplifier XE 2011 Update 7 (build 206420) Command Line
Original: Shuffle latency and throughput comparison
instruction latency 1/throughput 06_2A VPERM2F128 ymm1, ymm2, ymm3, i