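This blog has been abandoned for a long time. Between joining the research group and now, I have finished my PhD. All through those years I used being busy as an excuse, and only today have I picked up the pen again.
The novel coronavirus pneumonia outbreak is serious right now, so I am staying home and getting work done. I recently ran into a problem and solved it with a small program, so I am writing it down here.
I have been submitting a paper lately, and you know how it goes. You insert your citations with EndNote, and once the draft is finalized you convert the document with live EndNote links to plain text (ordinary Word text that is no longer linked to EndNote and can no longer be edited through it) and send it to your advisors. Then one advisor adds a bit today and a bit more tomorrow, or simply deletes things, and there is more than one advisor, so sometimes there is just no time to carry every change back into your original EndNote version. In the end, when you have to reformat the paper for resubmission or change references in bulk, you have to re-insert the references one! by! one! (It makes you want to die...)
I searched online: Can a document that was converted to plain text be reverted to EndNote? Can it go back to EndNote format once it has become editable text? Can EndNote plain text be re-linked to EndNote? Every answer was No...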
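So, what now? I did not want to do grunt work. Could a program cut down the workload? I looked at the raw format that EndNote inserts (the format EndNote itself recognizes):
# Example: {Betts, 2018 #83}
# Explanation: the three parts are the first author's last name, the publication year, and the paper's Record Number in your local EndNote library (the paper's unique ID).
So this is simple: find a way to get the Record Number, generate this format, and insert it at the corresponding position in the Word document.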
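To make the format concrete, here is a minimal sketch of building and reading back such a marker. The function names `format_endnote_marker` and `parse_endnote_marker` are made up for this illustration and are not part of the repository. The sketch assumes the last name contains no comma or brace and that the year has four digits.

```python
import re

# Pattern for a single marker such as {Betts, 2018 #83}.
# Assumption: one citation per pair of braces, four-digit year.
MARKER_RE = re.compile(r'\{([^,{}]+), (\d{4}) #(\d+)\}')


def format_endnote_marker(author_last_name, year, record_number):
    """Build the '{LastName, Year #RecordNumber}' string."""
    return '{' + f'{author_last_name}, {year} #{record_number}' + '}'


def parse_endnote_marker(marker):
    """Return (last_name, year, record_number) as strings, or None if it does not match."""
    match = MARKER_RE.fullmatch(marker.strip())
    return match.groups() if match else None


if __name__ == '__main__':
    marker = format_endnote_marker('Betts', 2018, 83)
    print(marker)
    print(parse_endnote_marker(marker))
```

Keeping the Record Number as a string avoids surprises when it is later used as a dictionary key.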
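Result first: I wrote a simple program and published it on GitHub: https://github.com/YiyanYang0728/Insert_Endnote_format
Have a look at Readme.md, where the usage is described reasonably well. Running get_endnote_insert_fmt.py is all it takes.
Here is the code as well: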
# -*- coding: utf-8 -*-
"""
Get Endnote insert format.

Created on Sun Feb 9 10:36:06 2020
@author: LucasTsubasaYang
"""
import re


def get_Endnote_insert_format(endnote_file, paper_ref_file, outfile):
    """input files: endnote_file, paper_ref_file
    outputfile: outfile
    """
    ### Read Endnote library references
    endnote_ref = {}
    items_in_need = ['Author', 'Year', 'Title']
    rec_no = None
    with open(endnote_file, 'r', encoding='utf-8') as f:
        for line in f:
            if line.startswith('Record Number'):
                rec_no = line.strip().split(': ')[1]
                # print(rec_no)
                endnote_ref[rec_no] = {'Author': 'NULL', 'Year': 'NULL', 'Title': 'NULL'}
            if rec_no is None:
                # Lines before the first Record Number belong to no record yet
                continue
            for key in items_in_need:
                if line.startswith(key + ': '):
                    content = line.strip().split(': ', 1)[1]
                    endnote_ref[rec_no][key] = content
    # Here you can check the information of a given Record Number
    # recno = '80'
    # print(endnote_ref[recno]['Title'][:30], endnote_ref[recno]['Year'], endnote_ref[recno]['Author'].split(',')[0])

    ### Read your references list in your plain-text paper
    # Each line is expected to look like: "No.<TAB>Authors (Year) Title. Journal ..."
    paper_ref = {}
    with open(paper_ref_file, 'r', encoding='utf-8') as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            No = line.split('\t')[0].split('.')[0]
            ref = line.split('\t')[1]
            lst = [i.strip() for i in re.split(r'\(|\)', ref, maxsplit=2)]
            author, year, title = lst[:3]
            title = title.split('. ')[0]
            paper_ref[No] = [author, year, title]

    ### Write Endnote insert format into outfile
    with open(outfile, 'w', encoding='utf-8') as g:
        for i in sorted(map(int, paper_ref.keys())):
            No = str(i)
            print(No)
            if No not in paper_ref:
                # Can happen when the key was written as e.g. '01'
                print('This No. does not exist!')
                continue
            # The query values do not depend on rec_no, so compute them once
            query_title = paper_ref[No][2].lower()[:30]
            query_year = paper_ref[No][1]
            # First author's last name: "Karner MB" -> "Karner"
            query_author = ' '.join(re.split(r',|&', paper_ref[No][0])[0].split()[:-1])
            # Match to find Endnote Record Number
            for rec_no in endnote_ref:
                if endnote_ref[rec_no]['Title'][:30].lower() == query_title \
                        and endnote_ref[rec_no]['Year'] == query_year \
                        and endnote_ref[rec_no]['Author'].split(',')[0] == query_author:
                    # FORMAT example: {Betts, 2018 #83}
                    year = query_year
                    author_last_name = query_author
                    print(query_title, query_year, query_author)
                    insert_format = '{' + author_last_name + ', ' + year + ' #' + rec_no + '}'
                    print(insert_format)
                    g.write(No + '. ' + '|'.join([query_title, query_year, query_author]) + '\n')
                    g.write(No + '. ' + insert_format + '\n')
                    break
            else:
                # for-else: reached only when no record matched this reference
                print(No, paper_ref[No])
                print("Cannot find this reference's Record Number!")
                print(paper_ref[No][2].lower()[:30], paper_ref[No][1],
                      paper_ref[No][0].split(',')[0].split()[0])


if __name__ == '__main__':
    endnote_file = r'Example_input_exported_endnote_style.txt'
    paper_ref_file = r'Example_input_paper_reference_list.txt'
    outfile = r'Example_output_insert_format.txt'
    get_Endnote_insert_format(endnote_file, paper_ref_file, outfile)
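Now for the idea behind it. The code is admittedly naive and only understands reference lists in a PNAS-like format... If your paper uses a different style, you need to change the part below so that the program can read the authors, year, and title of each reference in your paper. [I will try to improve this later...]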
paper_ref = {}
with open(paper_ref_file, 'r', encoding='utf-8') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        No = line.split('\t')[0].split('.')[0]
        ref = line.split('\t')[1]
        lst = [i.strip() for i in re.split(r'\(|\)', ref, maxsplit=2)]
        author, year, title = lst[:3]
        title = title.split('. ')[0]
        paper_ref[No] = [author, year, title]
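If you need to adapt this part to another citation style, it may help to isolate the per-line parsing in one function that you can swap out. Below is a minimal sketch for the PNAS-like case. The name `parse_pnas_reference` is hypothetical and not in the repository, and the sketch assumes a tab between the reference number and the text, and a year in parentheses right after the author list.

```python
import re


def parse_pnas_reference(line):
    """Split 'No.<TAB>Authors (Year) Title. Journal ...' into its parts.

    Returns (number, first_author_last_name, year, title), all strings.
    Hypothetical helper for illustration only.
    """
    number_part, ref = line.rstrip('\n').split('\t', 1)
    number = number_part.split('.')[0].strip()
    # Split on the first '(' and the first ')' only: authors, year, remainder
    authors, year, rest = [s.strip() for s in re.split(r'\(|\)', ref, maxsplit=2)]
    # The title ends at the first '. ' in the remainder
    title = rest.split('. ')[0]
    # First author is "LastName INITIALS"; drop the trailing initials
    first_author = re.split(r',|&', authors)[0].split()
    last_name = ' '.join(first_author[:-1])
    return number, last_name, year, title


if __name__ == '__main__':
    example = ('2.\tDeLong EF (1992) Archaea in coastal marine environments. '
               'Proc Natl Acad Sci U S A 89(12):5685-5689.')
    print(parse_pnas_reference(example))
```

For a different style you would write another function with the same return shape and leave the matching code untouched.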
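When we re-insert a reference by hand, we look at three things at once: 1. the title; 2. the author (the first author is enough); 3. the year.
If all three agree, it is 99% certain to be the same reference.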
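The three-way check can be written as one small predicate. This is a sketch that mirrors the comparison in the script above (first 30 characters of the title, case-insensitive; exact year; first author's last name). The function name `is_same_reference` and the `prefix_len` parameter are introduced here for illustration only.

```python
def is_same_reference(endnote_record, query_title, query_year, query_author,
                      prefix_len=30):
    """Return True when title prefix, year, and first-author last name all agree.

    endnote_record is a dict with 'Author', 'Year', and 'Title' strings,
    where 'Author' starts with 'LastName, ...'.
    """
    same_title = (endnote_record['Title'][:prefix_len].lower()
                  == query_title[:prefix_len].lower())
    same_year = endnote_record['Year'] == query_year
    same_author = endnote_record['Author'].split(',')[0] == query_author
    return same_title and same_year and same_author


if __name__ == '__main__':
    record = {'Author': 'Darwin, Charles', 'Year': '1888',
              'Title': 'On the origin of species by means of natural selection'}
    print(is_same_reference(record, 'On the Origin of Species', '1888', 'Darwin'))
```

Comparing only a title prefix tolerates differences near the end of long titles, at the cost of possible false matches between papers whose titles start identically, which is why year and author are checked too.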
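First we need the input file, which comes from your local EndNote library. I use EndNote X9. The steps are: Open EndNote -> Select all the references in your library (if there are too many, select just the ones you need to insert; selected entries turn blue) -> File -> Export -> Save as type: Text file, Output style: Show All Fields -> Save. This gives a txt file, call it A, which looks like this: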
Reference Type: Book
Record Number: 109
Author: Darwin, Charles
Year: 1888
Title: On the origin of species by means of natural selection: or the preservation of favored races in the struggle for life
Publisher: D. Appleton
Volume: 2
Short Title: On the origin of species by means of natural selection: or the preservation of favored races in the struggle for life
This is where the Record Number is.
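As a rough sketch of how such an export can be turned into a lookup table keyed by Record Number, the function below reads the text line by line. `parse_endnote_export` is a name invented for this example. It assumes every record contains a 'Record Number: ' line before its Author, Year, and Title lines, as in the sample above.

```python
def parse_endnote_export(text, fields=('Author', 'Year', 'Title')):
    """Map each Record Number to a dict of the requested fields.

    Assumes the 'Show All Fields' layout: one 'Field: value' pair per line.
    """
    records = {}
    rec_no = None
    for line in text.splitlines():
        if line.startswith('Record Number: '):
            rec_no = line.split(': ', 1)[1].strip()
            records[rec_no] = {}
        elif rec_no is not None:
            for key in fields:
                # 'Short Title: ...' does not start with 'Title: ', so it is skipped
                if line.startswith(key + ': '):
                    records[rec_no][key] = line.split(': ', 1)[1].strip()
    return records


if __name__ == '__main__':
    sample = (
        'Reference Type: Book\n'
        'Record Number: 109\n'
        'Author: Darwin, Charles\n'
        'Year: 1888\n'
        'Title: On the origin of species by means of natural selection\n'
        'Short Title: On the origin of species\n'
    )
    print(parse_endnote_export(sample))
```

Splitting on the first ': ' only keeps titles that themselves contain a colon intact.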
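Then paste the reference list from the bottom of your Word document into another txt file, call it B. Note that the script expects a tab between the reference number and the reference text. Mine looks like this: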
1. Karner MB, Delong EF, & Karl DM (2001) Archaeal dominance in the mesopelagic zone of the Pacific Ocean. Nature 409(6819):507-510.
2. DeLong EF (1992) Archaea in coastal marine environments. Proc Natl Acad Sci U S A 89(12):5685-5689.
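Let the program compare the authors, years, and titles in A and B, look up the Record Number in A, reshape it a little, and you get the EndNote format. So easy!!!
The result looks like this:
1. archaeal dominance in the meso|2001|Karner [this line is for you to check]
1. {Karner, 2001 #26} [this is the usable EndNote format]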
2. archaea in coastal marine envi|1992|DeLong
2. {DeLong, 1992 #15}
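The last step is rather dumb: go into Word and, reference number by reference number, copy each entry from the list above and paste it at the corresponding place. [This step really is too dumb. I am working on it: I saw that python-docx can edit Word files directly, so I plan to improve this later and try to make the changes directly in Word.]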
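Until that exists, part of the copy-pasting could in principle be done on plain text. The sketch below is not the python-docx solution and is not in the repository. It only replaces numbered in-text citations in a string, under loud assumptions: in-text citations look like `(1)` or `(1, 2)`, and multiple citations are joined with semicolons inside one pair of braces, which is how I understand EndNote groups them (please verify on your EndNote version). `replace_numbered_citations` and `markers` are hypothetical names.

```python
import re


def replace_numbered_citations(text, markers):
    """Replace '(1)' or '(1, 2)' with EndNote markers.

    markers maps a reference number (str) to '{LastName, Year #RecNo}'.
    Parenthesized numbers not found in markers, e.g. '(2001)', stay untouched.
    """
    def substitute(match):
        numbers = [n.strip() for n in match.group(1).split(',')]
        if not all(n in markers for n in numbers):
            return match.group(0)
        # Strip the outer braces of each marker and join inside one pair
        inner = ';'.join(markers[n][1:-1] for n in numbers)
        return '{' + inner + '}'

    return re.sub(r'\((\d+(?:\s*,\s*\d+)*)\)', substitute, text)


if __name__ == '__main__':
    markers = {'1': '{Karner, 2001 #26}', '2': '{DeLong, 1992 #15}'}
    print(replace_numbered_citations('Archaea are abundant (1) and widespread (1, 2).', markers))
```

Any parenthesized number that happens to equal a reference number would also be replaced, so the result still needs a manual check before you update the citations.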
Finally, for the citations to take effect, remember to use EndNote's Update Citations and Bibliography function.