Some notes on glibc's memcpy function

Original post

2020-06-21 23:07

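While testing memory performance (AEP, i.e. Intel Optane persistent memory: 6 × 256 GB, interleaved, DAX), I found that a loop doing 8-byte writes reached only about 4 GB/s. When I happened to try memcpy instead, bandwidth rose to about 10 GB/s. So I looked into the memcpy function, and these are my notes.
glibc's generic C implementation of memcpy (string/memcpy.c) is: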

void *
memcpy (void *dstpp, const void *srcpp, size_t len)
{
  unsigned long int dstp = (long int) dstpp;
  unsigned long int srcp = (long int) srcpp;

  /* Copy from the beginning to the end.  */

  /* If there not too few bytes to copy, use word copy.  */
  if (len >= OP_T_THRES)
    {
      /* Copy just a few bytes to make DSTP aligned.  */
      len -= (-dstp) % OPSIZ;
      BYTE_COPY_FWD (dstp, srcp, (-dstp) % OPSIZ);

      /* Copy whole pages from SRCP to DSTP by virtual address manipulation,
	 as much as possible.  */

      PAGE_COPY_FWD_MAYBE (dstp, srcp, len, len);

      /* Copy from SRCP to DSTP taking advantage of the known alignment of
	 DSTP.  Number of bytes remaining is put in the third argument,
	 i.e. in LEN.  This number may vary from machine to machine.  */

      WORD_COPY_FWD (dstp, srcp, len, len);

      /* Fall out and copy the tail.  */
    }

  /* There are just a few bytes to copy.  Use byte memory operations.  */
  BYTE_COPY_FWD (dstp, srcp, len);

  return dstpp;
}
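The three phases of the generic memcpy can be seen in a small runnable sketch. First it copies bytes until dst is word-aligned, then it copies whole words, then it byte-copies the tail. simple_memcpy is a hypothetical name. OPSIZ is assumed to be sizeof(unsigned long) and OP_T_THRES is assumed to be 16. PAGE_COPY_FWD_MAYBE is left out. Unaligned source words are read with a fixed-size memcpy, whereas glibc's word-copy routine combines aligned source words with shifts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed stand-ins for glibc's OPSIZ and OP_T_THRES. */
#define OPSIZ      (sizeof(unsigned long))
#define OP_T_THRES 16

/* Hypothetical sketch of the generic memcpy structure. */
void *simple_memcpy(void *dstpp, const void *srcpp, size_t len)
{
    unsigned char *dst = dstpp;
    const unsigned char *src = srcpp;

    if (len >= OP_T_THRES) {
        /* (-dstp) % OPSIZ: bytes needed to reach the next word boundary. */
        size_t head = (size_t)(-(uintptr_t)dst) % OPSIZ;
        len -= head;
        while (head--)
            *dst++ = *src++;
        assert((uintptr_t)dst % OPSIZ == 0); /* dst is now word-aligned */

        /* Word loop. A fixed-size memcpy is the portable C spelling of a
           single word load/store; optimizing compilers emit one mov each.
           The store is aligned; the load may not be. */
        while (len >= OPSIZ) {
            unsigned long w;
            memcpy(&w, src, OPSIZ);
            memcpy(dst, &w, OPSIZ);
            dst += OPSIZ;
            src += OPSIZ;
            len -= OPSIZ;
        }
    }

    /* Tail bytes, or the whole copy when len < OP_T_THRES. */
    while (len--)
        *dst++ = *src++;

    return dstpp;
}
```

Aligning the destination first means every word store lands on a word boundary. Only the source side pays for any misalignment.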

For large copies, it first tries PAGE_COPY_FWD_MAYBE, which is defined as:

# define PAGE_COPY_FWD_MAYBE(dstp, srcp, nbytes_left, nbytes)		      \
  do									      \
    {									      \
      if ((nbytes) >= PAGE_COPY_THRESHOLD &&				      \
	  PAGE_OFFSET ((dstp) - (srcp)) == 0) 				      \
	{								      \
	  /* The amount to copy is past the threshold for copying	      \
	     pages virtually with kernel VM operations, and the		      \
	     source and destination addresses have the same alignment.  */    \
	  size_t nbytes_before = PAGE_OFFSET (-(dstp));			      \
	  if (nbytes_before != 0)					      \
	    {								      \
	      /* First copy the words before the first page boundary.  */     \
	      WORD_COPY_FWD (dstp, srcp, nbytes_left, nbytes_before);	      \
	      assert (nbytes_left == 0);				      \
	      nbytes -= nbytes_before;					      \
	    }								      \
	  PAGE_COPY_FWD (dstp, srcp, nbytes_left, nbytes);		      \
	}								      \
    } while (0)
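The macro tries the page path only under two conditions. The copy must be at least PAGE_COPY_THRESHOLD bytes, and src and dst must have the same offset within a page. Before the page copy, it word-copies PAGE_OFFSET(-dstp) bytes to bring dst up to a page boundary. The sketch below isolates that decision. page_copy_eligible and the SKETCH_* macros are hypothetical. It assumes a 4 KiB page and the threshold of 16384 from the Mach header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values: 4 KiB pages, threshold 16384. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PAGE_COPY_THRESHOLD 16384u
#define SKETCH_PAGE_OFFSET(n) ((uintptr_t)(n) & (SKETCH_PAGE_SIZE - 1))

/* Returns true if the page path would be attempted, and stores in *head
   the number of bytes word-copied before the first page boundary. */
bool page_copy_eligible(uintptr_t dstp, uintptr_t srcp, size_t nbytes,
                        size_t *head)
{
    *head = 0;
    /* Same two tests as the macro: enough bytes, and src/dst share the
       same offset within a page. */
    if (nbytes < SKETCH_PAGE_COPY_THRESHOLD ||
        SKETCH_PAGE_OFFSET(dstp - srcp) != 0)
        return false;
    /* PAGE_OFFSET(-dstp): distance from dstp up to the next page boundary. */
    *head = SKETCH_PAGE_OFFSET(-dstp);
    assert(SKETCH_PAGE_OFFSET(dstp + *head) == 0); /* dst now page-aligned */
    return true;
}
```

For example, dst = 0x10010 and src = 0x20010 share page offset 0x10. With 20000 bytes to copy, the page path is eligible and 4080 bytes are word-copied first.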

That leads to the PAGE_COPY_FWD macro:

#include <mach.h>

/* Threshold at which vm_copy is more efficient than well-optimized copying
   by words.  */
#define PAGE_COPY_THRESHOLD		(16384)

#define PAGE_SIZE		__vm_page_size
#define PAGE_COPY_FWD(dstp, srcp, nbytes_left, nbytes)			      \
  ((nbytes_left) = ((nbytes) -						      \
		    (__vm_copy (__mach_task_self (),			      \
				(vm_address_t) srcp, trunc_page (nbytes),     \
				(vm_address_t) dstp) == KERN_SUCCESS	      \
		     ? trunc_page (nbytes)				      \
		     : 0)))

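So it relies on __vm_copy; for an explanation of it, see here. Note, however, that this pagecopy.h (it includes <mach.h>) is the Mach/GNU Hurd version. On Linux there is no page-copy path, so PAGE_COPY_FWD_MAYBE expands to nothing. Moreover, on x86-64 the memcpy that actually runs is an assembly implementation selected at load time through IFUNC, not this generic C code.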

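movq src, dst (source address, destination address)
~~Although this is also an 8-byte copy, perhaps prefetching is what raises the performance~~. (That does not seem right either: prefetching fetches data at the source address, but my test wrote an immediate/constant. Is it the loop (branch prediction)? Following the grouping here, I tested a loop that writes 8 words per iteration and still got only 3-4 GB/s. Or does memcpy not really copy??? I'm stumped...)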
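The open question above has one plausible explanation that this post did not test. For large sizes, glibc's x86-64 memcpy switches to non-temporal (streaming) stores above a threshold (__x86_shared_non_temporal_threshold). Streaming stores write whole cache lines without first reading the destination line into the cache. An ordinary 8-byte store loop must do that read-for-ownership, so every destination line is also read from memory. The sketch below lets you compare the two store patterns on your own buffer. fill_plain, fill_nt and measure_gbps are hypothetical helpers. The streaming path assumes x86-64 (_mm_stream_si64, _mm_sfence); other targets fall back to ordinary stores. No results are claimed here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__)
#include <immintrin.h>
#endif

/* Plain 8-byte store loop, like the test that measured ~4 GB/s.
   volatile keeps the compiler from merging or vectorizing the stores. */
void fill_plain(uint64_t *buf, size_t nwords, uint64_t value)
{
    volatile uint64_t *p = buf;
    for (size_t i = 0; i < nwords; i++)
        p[i] = value;
}

/* Same pattern with non-temporal 8-byte stores (MOVNTI) on x86-64. */
void fill_nt(uint64_t *buf, size_t nwords, uint64_t value)
{
#if defined(__x86_64__)
    for (size_t i = 0; i < nwords; i++)
        _mm_stream_si64((long long *)&buf[i], (long long)value);
    _mm_sfence(); /* make the streaming stores globally visible in order */
#else
    for (size_t i = 0; i < nwords; i++) /* fallback: ordinary stores */
        buf[i] = value;
#endif
}

/* Bandwidth in GB/s (1e9 bytes/s) for one call of fill(). */
double measure_gbps(void (*fill)(uint64_t *, size_t, uint64_t),
                    uint64_t *buf, size_t nwords)
{
    struct timespec t0, t1;
    assert(nwords > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fill(buf, nwords, 0x0123456789abcdefULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return (double)(nwords * sizeof(uint64_t)) / secs / 1e9;
}
```

To compare, map a buffer much larger than the last-level cache (for instance on the DAX device) and call measure_gbps once with fill_plain and once with fill_nt. Whether the streaming version approaches memcpy's 10 GB/s on this AEP setup is untested.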
