Daily Study Log

20200523
How to use Evaluate in PyCharm's debug mode
May 9, 2020
Stack Overflow notes on graph edit distance:
How to normalize GED in networkx
How to compute the GED of multigraphs (graphs with multiple edges between a pair of nodes) in networkx
The rdkit documentation
How to draw with rdkit
May 4, 2020
1. Write the thesis proposal
2. Study the Hungarian algorithm

April 29, 2020
How to convert a column of a dataframe into a list?
link
How to handle missing values in a dataframe?
reference
April 21, 2020
These debiasing algorithms are very helpful for reducing bias, but are not perfect and do not eliminate all traces of bias. For example, one weakness of this implementation was that the bias direction $g$ was defined using only the pair of words woman and man. As discussed earlier, if $g$ were defined by computing $g_1 = e_{woman} - e_{man}$; $g_2 = e_{mother} - e_{father}$; $g_3 = e_{girl} - e_{boy}$; and so on and averaging over them, you would obtain a better estimate of the “gender” dimension in the 50-dimensional word embedding space. Feel free to play with such variants as well.

The derivation of the linear algebra to do this is a bit more complex. (See Bolukbasi et al., 2016 for details.) But the key equations are:

$$\mu = \frac{e_{w1} + e_{w2}}{2}\tag{4}$$

$$\mu_{B} = \frac{\mu * \text{bias\_axis}}{||\text{bias\_axis}||^2_2} * \text{bias\_axis}\tag{5}$$

$$\mu_{\perp} = \mu - \mu_{B}\tag{6}$$

$$e_{w1B} = \sqrt{|1 - ||\mu_{\perp}||^2_2|} * \frac{(e_{w1} - \mu_{\perp}) - \mu_B}{||(e_{w1} - \mu_{\perp}) - \mu_B||}\tag{7}$$

$$e_{w2B} = \sqrt{|1 - ||\mu_{\perp}||^2_2|} * \frac{(e_{w2} - \mu_{\perp}) - \mu_B}{||(e_{w2} - \mu_{\perp}) - \mu_B||}\tag{8}$$

$$e_1 = e_{w1B} + \mu_{\perp}\tag{9}$$
$$e_2 = e_{w2B} + \mu_{\perp}\tag{10}$$

Exercise: Implement the function below. Use the equations above to get the final equalized version of the pair of words. Good luck!

$$e^{bias\_component} = \frac{e * g}{||g||_2^2} * g\tag{2}$$
$$e^{debiased} = e - e^{bias\_component}\tag{3}$$
If you are an expert in linear algebra, you may recognize $e^{bias\_component}$ as the projection of $e$ onto the direction $g$. If you’re not an expert in linear algebra, don’t worry about this.
The figure below should help you visualize what neutralizing does. If you’re using a 50-dimensional word embedding, the 50-dimensional space can be split into two parts: the bias direction $g$, and the remaining 49 dimensions, which we’ll call $g_{\perp}$. In linear algebra, we say that the 49-dimensional $g_{\perp}$ is perpendicular (or “orthogonal”) to $g$, meaning it is at 90 degrees to $g$. The neutralization step takes a vector such as $e_{receptionist}$ and zeros out the component in the direction of $g$, giving us $e_{receptionist}^{debiased}$.

Even though $g_{\perp}$ is 49-dimensional, given the limitations of what we can draw on a screen, we illustrate it using a 1-dimensional axis below.
April 20, 2020

Now, you will consider the cosine similarity of different words with $g$. Consider what a positive value of similarity means vs a negative cosine similarity.

Let's first see how the GloVe word embeddings relate to gender. You will first compute a vector $g = e_{woman} - e_{man}$, where $e_{woman}$ represents the word vector corresponding to the word woman, and $e_{man}$ corresponds to the word vector corresponding to the word man. The resulting vector $g$ roughly encodes the concept of “gender”. (You might get a more accurate representation if you compute $g_1 = e_{mother} - e_{father}$, $g_2 = e_{girl} - e_{boy}$, etc. and average over them. But just using $e_{woman} - e_{man}$ will give good enough results for now.)

In the word analogy task, we complete the sentence “a is to b as c is to ____”. An example is “man is to woman as king is to queen”. In detail, we are trying to find a word d, such that the associated word vectors $e_a, e_b, e_c, e_d$ are related in the following manner: $e_b - e_a \approx e_d - e_c$. We will measure the similarity between $e_b - e_a$ and $e_d - e_c$ using cosine similarity.

To measure how similar two words are, we need a way to measure the degree of similarity between two embedding vectors for the two words. Given two vectors $u$ and $v$, cosine similarity is defined as follows:

$$\text{CosineSimilarity}(u, v) = \frac{u \cdot v}{||u||_2 \, ||v||_2} = \cos(\theta)\tag{1}$$

where $u \cdot v$ is the dot product (or inner product) of the two vectors, $||u||_2$ is the norm (or length) of the vector $u$, and $\theta$ is the angle between $u$ and $v$. This similarity depends on the angle between $u$ and $v$. If $u$ and $v$ are very similar, their cosine similarity will be close to 1; if they are dissimilar, the cosine similarity will take a smaller value.
The norm of $u$ is defined as $||u||_2 = \sqrt{\sum_{i=1}^{n} u_i^2}$.

Search performance analysis of hash tables:

What does the average search length of unsuccessful searches (ASLu) mean?

Let the hash function be h(key) = key mod TableSize, with linear probing for collision resolution, i.e. $d_i = i$.
Suppose TableSize = 11 and the keys 11, 30, 47 have been inserted. Then searching for 33 under linear probing requires 3 comparisons, so this unsuccessful search costs 3 comparisons. See the figure below:
(figure omitted)

A compile-error tip:
#define MAXSIZETABLE 10000;
Don't put a semicolon at the end of the #define!

This is a piece of code I wrote today. It took a lot of fiddling, but with senior lyh's guidance it finally compiled and ran. Remember one thing: fix errors one at a time!

//Functions here: compute a prime, initialize the hash table, search it, insert into it, and hash a string
//References: MOOC lecture slides; textbook p.258
#define MAXSIZETABLE 10000
//note: CLOCKS_PER_SEC is already defined in <time.h>, so it must not be redefined here
#include<stdio.h>
#include<time.h>
#include<stdlib.h>
#include<string.h>
#include<math.h>

typedef int ElementType;
typedef int Index; //hash address type
typedef struct LNode *PtrToLNode;
struct LNode{ //list node: a data element and a next pointer
	ElementType Data;
	PtrToLNode Next ;
};
typedef PtrToLNode Position; //Position is a pointer to struct LNode
typedef PtrToLNode List;
typedef struct TblNode *HashTable;
struct TblNode{ //hash table: the table size plus an array of list heads
	int TableSize;
	List Heads; //array of list head nodes
};

//Return the smallest prime >= N that does not exceed the maximum table size
//Input: an integer
//Output: a prime
int nextPrime(int N){

	int p=(N%2)?(N+2):(N+1);//start probing from the smallest odd number greater than N
	int i;

	while(p<MAXSIZETABLE)//stop once p exceeds MAXSIZETABLE
	{
		for(i=(int)sqrt(p);i>2;i--)//check whether p is prime
		{
			if (!(p%i)) break;//p has a divisor other than 1 and itself, so it is not prime
		}
		if(i==2) break;//the for loop finished normally, so p is prime
		else p+=2;

	}
	return p;
}

//Compute the hash value of Key, i.e. its address in the hash table
//Input: Key, TableSize
//Output: an integer
int hash(int Key,int TableSize)
{
	return Key%TableSize;
}

//Initialize the hash table
HashTable createTable(int TableSize){
	HashTable H;
	int i;	

	//allocate memory of size sizeof(struct TblNode)
	H = (HashTable)malloc(sizeof(struct TblNode));
	//make sure the table length is a prime
	H->TableSize = nextPrime(TableSize);
	//allocate the array of list head nodes
	H->Heads=(List)malloc(H->TableSize *sizeof(struct LNode));
	//initialize the head nodes
	for(i=0;i<H->TableSize;i++)
	{
		H->Heads[i].Data = 0;
		H->Heads[i].Next = NULL;
	}
	return H;
}

//Search for Key in hash table H
Position Find(ElementType Key,HashTable H)
{
	Position p;
	Index Pos;

	//compute the hash value of Key
	Pos=hash(Key,H->TableSize);
	
	p= H->Heads[Pos].Next;//start from the first node of the list

	//walk the list for that hash address; stop at the node holding Key, or fall off the end
	while(p && p->Data != Key)
	{
		p=p->Next;
	}
	return p;//p now points to the found node, or is NULL
}

//Insert Key into the hash table
void InsertKey(HashTable H,ElementType Key){
	Position NewCell;
	Index Pos;

	//first look for Key by calling Find(Key,H)
	Position P=Find(Key,H);

	//if not found, Key can be inserted
	if(!P) 
	{
		//allocate memory for the new node: sizeof(struct LNode)
		NewCell = (Position)malloc(sizeof(struct LNode));

		//compute Key's hash address; the node is inserted into that list
		Pos = hash(Key,H->TableSize);

		//store Key in the new node and link it in at the head of the list
		NewCell->Data = Key;
		NewCell->Next = H->Heads[Pos].Next;
		H->Heads[Pos].Next = NewCell;
	}

	//if found, Key cannot be inserted again
	else{
		printf("Key already exists; cannot insert");
	}
}

//Destroy the hash table
void DestroyTable(HashTable H){
	int i;
	Position p,Temp;

	//to delete a list: point p at a node, save p->Next in Temp, free(p), then set p = Temp
	for (i = 0;i<H->TableSize;i++)
	{
		p = H->Heads[i].Next;//p starts at the chain after head node i
		while(p)
		{
			Temp = p->Next;
			free(p);
			p=Temp;
		}

	}
	free(H->Heads);
	free(H);
}

//Hash a string
//Input: a string, e.g. abcdd 
//Output: the string's hash address
int  HashAdd(const char *key)
{
	//start timing
	clock_t start_t,end_t;
	double total_t;
	

	//hash the string: combine its characters into a single value
	int TableSize = 110; 
	unsigned int h = 0; //hash value, initialized to 0
	int add;
	start_t = clock();

	while(*key!= '\0')//for each character: multiply h by 32 and add the character
	{
		h = (h <<5) + *key++; //shift left 5 bits, i.e. multiply by 32

	}
	add = h % TableSize;
	end_t = clock();
	printf("%d\n",add);

	//stop timing; clock() returns clock ticks, so divide by CLOCKS_PER_SEC to get seconds
	total_t = (double)(end_t - start_t)/CLOCKS_PER_SEC;
	printf("CPU time used: %f\n",total_t);

	return add;
}


//main function
int main(void)
{
	printf("hello!\n");
	return 0;
}

April 8, 2020
Notes on edit-distance functions:
1. A forum thread
2. The official Python site
3. A networkx forum, including a discussion of graph isomorphism

April 10, 2020
How to fix the error: AttributeError: module ‘tensorflow_core.compat.v1’ has no attribute ‘contrib’
answer
April 13, 2020
Python packages for GED
