[Original] permute instructions

AVX 1. VPERMILPD — Permute Double-Precision Floating-Point Values VPERMILPD xmm1, xmm2,xmm3/m128 Permute double-prec
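
The excerpt above is cut off, so as a hedged illustration only, here is a scalar C model of what the 128-bit variable form `VPERMILPD xmm1, xmm2, xmm3/m128` computes: each destination double is picked from the low or high double of the first source, according to bit 1 of the matching 64-bit control element. The function name `vpermilpd128_model` and the array-based interface are assumptions made for this sketch, not part of the post or of any intrinsic API.

```c
#include <stdint.h>

/* Scalar model of VPERMILPD xmm1, xmm2, xmm3/m128 (128-bit variable form).
   src  : the two doubles of xmm2 (src[0] = bits 63:0, src[1] = bits 127:64)
   ctrl : the two 64-bit control elements of xmm3/m128; only bit 1 of each
          element is used as the selector.
   The name and interface are hypothetical, chosen for illustration. */
void vpermilpd128_model(double dst[2], const double src[2],
                        const uint64_t ctrl[2])
{
    /* Copy the inputs first so dst may alias src. */
    double lo = src[0];
    double hi = src[1];

    dst[0] = ((ctrl[0] >> 1) & 1u) ? hi : lo;  /* selector = ctrl bit 1  */
    dst[1] = ((ctrl[1] >> 1) & 1u) ? hi : lo;  /* selector = ctrl bit 65 */
}
```

With `src = {1.0, 2.0}` and `ctrl = {2, 0}`, the model swaps the two elements, giving `{2.0, 1.0}`.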

[Original] cpu signature

In fact, it is the concatenation of eax[19,16] eax[15,8] eax[7,4] eax[19
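
The excerpt is truncated, so the exact concatenation the post describes cannot be recovered here. As a hedged sketch, the following decodes the family, model, and stepping fields from the EAX value returned by CPUID leaf 1, using the DisplayFamily/DisplayModel rule from Intel's documentation. The struct and function names are hypothetical, and the raw EAX value is passed in as a parameter (instead of executing CPUID) so the code stays portable.

```c
#include <stdint.h>

/* Hypothetical container for the decoded signature fields. */
typedef struct {
    unsigned family;    /* DisplayFamily */
    unsigned model;     /* DisplayModel  */
    unsigned stepping;  /* Stepping ID   */
} cpu_signature;

/* Decode the EAX value of CPUID leaf 1.
   Field layout: stepping [3:0], model [7:4], family [11:8],
   extended model [19:16], extended family [27:20]. */
cpu_signature decode_signature(uint32_t eax)
{
    unsigned stepping   = eax & 0xFu;
    unsigned model      = (eax >> 4) & 0xFu;
    unsigned family     = (eax >> 8) & 0xFu;
    unsigned ext_model  = (eax >> 16) & 0xFu;
    unsigned ext_family = (eax >> 20) & 0xFFu;

    cpu_signature s;
    /* The extended family is added only when the base family is 0xF. */
    s.family = (family == 0xFu) ? family + ext_family : family;
    /* The extended model is prepended only for families 0x6 and 0xF. */
    s.model = (family == 0x6u || family == 0xFu)
                  ? ((ext_model << 4) | model)
                  : model;
    s.stepping = stepping;
    return s;
}
```

For example, an EAX value of `0x000206A7` decodes to family `0x06`, model `0x2A`, stepping `7`, which corresponds to the `06_2A` style of notation.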

[Original] amplxe-cl -finalize: rarely used.

-I, -finalize                 Re-finalize the result.  Re-start postproces

[Original] Viewing hardware information on Linux

System
# uname -a               # show kernel / operating system / CPU information
# head -n 1 /etc/issue   # show the operating system version
# cat /proc/cpuinfo      # show CPU information
#

[Original] Advanced computing reading list

From the Bottom Up This list is in the best reading order I could find. It’s not necessarily easiest to hardest, but b

[Original] Event Configuration from the Command Line

June 7, 2011 9:00 PM PDT   With the release of the VTune™ Amplifier XE 2011 Update 3, you

[Original] Integer shuffle

in place shuffle PSHUFB (with 64 bit operands) PSHUFB mm1, mm2/m64   for i = 0 to 7 { if (SRC[(i * 8)+7] = 1 ) then DES
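
The pseudocode above is truncated, so the following is only a hedged scalar C model of the 64-bit `PSHUFB mm1, mm2/m64` behavior: for each byte of the control operand, a set top bit zeroes the destination byte, otherwise the low three bits index into the original destination. The function name `pshufb64_model` and the byte-array interface are assumptions for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of PSHUFB mm1, mm2/m64.
   dest : the 8 bytes of mm1, shuffled in place
   src  : the 8 control bytes of mm2/m64
   The name and interface are hypothetical. */
void pshufb64_model(uint8_t dest[8], const uint8_t src[8])
{
    uint8_t old[8];
    /* Snapshot the destination: every output byte reads the ORIGINAL value. */
    memcpy(old, dest, sizeof old);

    for (int i = 0; i < 8; i++) {
        if (src[i] & 0x80u) {
            dest[i] = 0;                   /* bit 7 set: clear the byte */
        } else {
            dest[i] = old[src[i] & 0x07u]; /* bits 2:0 select the source byte */
        }
    }
}
```

With `dest = {10, 11, 12, 13, 14, 15, 16, 17}` and control bytes `{7, 6, 5, 4, 3, 2, 1, 0x80}`, the result is `{17, 16, 15, 14, 13, 12, 11, 0}`: a byte reversal with the last byte cleared.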

[Original] How to use VTune?

 Usage: amplxe-cl <-action> [-action-option] [-global-option] [[--] target

[Original] Cache-blocked matrix-matrix multiplication

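1.   Write a program that multiplies two 1024*1024 matrices (the matrix elements are of type double). Use blocked multiplication to improve efficiency, and analyze the relationship between the blocking scheme and the cache capacity. Solution: the program is as follows: // Because the parameters were changed, please correct the answer. #includ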
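
The post's own program is cut off in the excerpt, so the following is an independent, minimal sketch of blocked (tiled) matrix multiplication rather than the author's solution. The function name `matmul_blocked`, the row-major layout, and the block-size parameter `bs` are assumptions made for this example.

```c
#include <stddef.h>

/* Return the smaller of two sizes (helper for clamping tile edges). */
static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* C += A * B for n x n row-major matrices, processed in bs x bs tiles.
   The caller must zero C beforehand and pass bs > 0.
   Name and interface are hypothetical. */
void matmul_blocked(size_t n, size_t bs,
                    const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += bs) {
        for (size_t kk = 0; kk < n; kk += bs) {
            for (size_t jj = 0; jj < n; jj += bs) {
                /* Clamp tile ends so n need not be a multiple of bs. */
                size_t i_end = min_size(ii + bs, n);
                size_t k_end = min_size(kk + bs, n);
                size_t j_end = min_size(jj + bs, n);

                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        /* Innermost loop walks rows of B and C contiguously. */
                        for (size_t j = jj; j < j_end; j++) {
                            C[i * n + j] += a * B[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

As a rough rule of thumb (an assumption, not a measured result), the three active tiles should fit in the targeted cache level, i.e. about 3 * bs * bs * 8 bytes for double elements. For a 32 KiB cache this suggests a block size around 32, since 3 * 32 * 32 * 8 = 24576 bytes; the best value on a given machine is worth confirming by timing a few candidates.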

[Original] Jobs letter

I have always said if there ever came a day when I could no longer meet my

[Original] amplxe-cl -help command

Intel(R) VTune(TM) Amplifier XE 2011 Update 7 (build 206420) Command Line

[Original] VTune: using the amplxe-cl command line

References: http://software.intel.com/sites/products/documentation/hpc/amplifierxe/en-us/2011Update/lin/ug_docs/index.h

[Original] Floating-point shuffle

1. SHUFPD—Shuffle Packed Double-Precision Floating-Point Values SHUFPD xmm1, xmm2/m128, imm8 IF IMM0[0] = 0 THEN DEST[6
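
Because the pseudocode above is truncated, here is a hedged scalar C model of the two-operand `SHUFPD xmm1, xmm2/m128, imm8`: bit 0 of the immediate picks the low result from the destination's two doubles, and bit 1 picks the high result from the source's two doubles. The function name `shufpd_model` and the array-based interface are assumptions for illustration.

```c
/* Scalar model of SHUFPD xmm1, xmm2/m128, imm8.
   dest : the two doubles of xmm1, updated in place
   src  : the two doubles of xmm2/m128
   imm8 : only bits 0 and 1 are used
   The name and interface are hypothetical. */
void shufpd_model(double dest[2], const double src[2], unsigned imm8)
{
    /* Compute both results before writing, since dest is also an input. */
    double lo = (imm8 & 1u) ? dest[1] : dest[0]; /* imm8 bit 0 selects from dest */
    double hi = (imm8 & 2u) ? src[1] : src[0];   /* imm8 bit 1 selects from src  */

    dest[0] = lo;
    dest[1] = hi;
}
```

With `dest = {1.0, 2.0}`, `src = {3.0, 4.0}`, and `imm8 = 1`, the result is `{2.0, 3.0}`.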

[Original] amplxe-cl -help collect-with

Intel(R) VTune(TM) Amplifier XE 2011 Update 7 (build 206420) Command Line

[Original] Shuffle latency and throughput comparison

instruction                                                latency  1/throughput   06_2A VPERM2F128 ymm1, ymm2, ymm3, i