A Detailed Guide to the DrugBank Database Downloads (version 5.1.4, 2019-07-02)

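The DrugBank Open Data set is a public-domain dataset that can be used freely in your application or project, including commercially. It is released under the Creative Commons CC0 International license.

To the extent possible under law, the person who associated CC0 with DrugBank Open Data has waived all copyright and related or neighboring rights to DrugBank Open Data. Published from: Canada.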

https://www.drugbank.ca/


Contents

1. Drug Sequences (using Approved as the example)

2. Protein identifiers (Approved)

3. Target Sequences (Approved)

4. External Links → External Drug Links (Approved)

5. External Links → Target Drug-UniProt Links (Approved)

6. External Links → Enzyme/Carrier/Transporter Drug-UniProt Links (Approved)

7. Structures → Structure External Links (Approved)

8. Complete Database (Full)


1. Drug Sequences (using Approved as the example)

drugbank_approved_drug_sequences.fasta.zip

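Select the file on the Downloads page (shown in a screenshot in the original post) and download it.

The downloaded file contains records like the following.

Take drugbank_drug|DB00002 Cetuximab heavy chain as an example: https://www.drugbank.ca/drugs/DB00002

The drug page shows that this is an approved drug and that its type is protein.

So the Drug Sequences download contains the sequences of protein-based drugs.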
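To work with this file programmatically, a small FASTA reader is enough. The sketch below assumes each record starts with a ">" line in the form quoted above (drugbank_drug|<DrugBank ID> <name>); the function names read_fasta and parse_drug_header are made up for this example, and the residues in the sample are placeholders, not the real Cetuximab sequence.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record begins: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_drug_header(header: str) -> Tuple[str, str]:
    """Split 'drugbank_drug|DB00002 Cetuximab heavy chain' into (ID, name)."""
    _, _, rest = header.partition("|")
    drug_id, _, name = rest.partition(" ")
    return drug_id, name


# Placeholder residues only; this is NOT the real Cetuximab sequence.
sample = [
    ">drugbank_drug|DB00002 Cetuximab heavy chain",
    "ACDEFG",
    "HIKLMN",
]
records = [(parse_drug_header(h), seq) for h, seq in read_fasta(sample)]
print(records)
```

For the real archive, the same reader can be fed the lines of the extracted .fasta file opened in text mode.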


2. Protein identifiers(Approved)

Protein identifiers include external IDs to resources such as UniProt and PDB. These downloads are divided first by protein/compound type (target, transporter, etc.). Secondly they are divided by drug group (approved, illicit, etc.). Each archive contains 2 files: one for all target/enzyme/transporter/carriers and one with only those marked as pharmacologically active (directly related to the mechanism of action for at least one of the associated drugs). Note that each row in the export CSV file also includes a concatenated list of DrugBank drugs IDs (semi-colon delimited) as the last column.


drugbank_approved_target_polypeptide_ids.csv.zip

all.csv, pharmacologically_active.csv

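Comparing the two files side by side (screenshot in the original post), all.csv has 1000+ more rows than pharmacologically_active.csv. Note that this is not the full set of proteins; the files appear to list only proteins that have at least one associated drug.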

4,Coagulation factor XIII A chain,F13A1,182309,M22001,P00488,F13A_HUMAN,1EVU; 1EX0; 1F13; 1FIE; 1GGT; 1GGU; 1GGY; 1QRK; 4KTY,,F13A1,HGNC:3531,Humans,DB11300; DB11311; DB11571; DB11572; DB13151

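Next, look up one of the associated drugs, for example DB11300: https://www.drugbank.ca/drugs/DB11300#targets

The protein appears among that drug's targets, so the record matches.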

2,Histidine decarboxylase,HDC,32109,X54297,P19113,DCHS_HUMAN,4E1O,,HDC,HGNC:4855,Humans,DB00114; DB00117

 

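This one matches as well, which suggests that all.csv stores every protein that has Drug Relations entries.

  • Note, however, that the list may be incomplete. For ID = 4, all.csv shows the following: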

4,Coagulation factor XIII A chain,F13A1,182309,M22001,P00488,F13A_HUMAN,1EVU; 1EX0; 1F13; 1FIE; 1GGT; 1GGU; 1GGY; 1QRK; 4KTY,,F13A1,HGNC:3531,Humans,DB01839; DB11300; DB11311; DB11571; DB11572; DB13151 

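This row does not include all of the Drug Relations listed at https://www.drugbank.ca/bio_entities/BE0000004; two entries are missing (shown in a screenshot in the original post; the reason is unknown).

From the description above and the file names, we can conclude:

pharmacologically_active.csv contains the drug IDs highlighted in the original screenshot, while all.csv should contain both the entries marked "yes" and those marked "unknown", although it is not yet complete.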
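The comparison between the two files can be automated by splitting the last column on semicolons and taking a set difference. The sketch below uses the two rows for ID 4 quoted above; it assumes the first quoted row came from pharmacologically_active.csv (the post does not say so explicitly), and drug_ids_by_protein is a name invented for this example.

```python
import csv
import io
from typing import Dict, Set


def drug_ids_by_protein(csv_text: str) -> Dict[str, Set[str]]:
    """Map the first column (protein ID) to the drug IDs in the last column."""
    result: Dict[str, Set[str]] = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        # The last column is a semicolon-delimited list of DrugBank IDs.
        ids = {d.strip() for d in row[-1].split(";") if d.strip().startswith("DB")}
        if ids:  # a header row has no DB-prefixed IDs and is skipped
            result[row[0]] = ids
    return result


# Assumed to be the row from pharmacologically_active.csv.
active_text = (
    "4,Coagulation factor XIII A chain,F13A1,182309,M22001,P00488,F13A_HUMAN,"
    "1EVU; 1EX0; 1F13; 1FIE; 1GGT; 1GGU; 1GGY; 1QRK; 4KTY,,F13A1,HGNC:3531,"
    "Humans,DB11300; DB11311; DB11571; DB11572; DB13151\n"
)
# The row from all.csv.
all_text = (
    "4,Coagulation factor XIII A chain,F13A1,182309,M22001,P00488,F13A_HUMAN,"
    "1EVU; 1EX0; 1F13; 1FIE; 1GGT; 1GGU; 1GGY; 1QRK; 4KTY,,F13A1,HGNC:3531,"
    "Humans,DB01839; DB11300; DB11311; DB11571; DB11572; DB13151 \n"
)

active = drug_ids_by_protein(active_text)
everything = drug_ids_by_protein(all_text)
# Drugs linked to each protein in all.csv but not flagged as active.
extra = {pid: everything[pid] - active.get(pid, set()) for pid in everything}
print(extra)
```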


3. Target Sequences (Approved)

drugbank_approved_target_polypeptide_sequences.fasta.zip

protein.fasta, gene.fasta

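  • These contain the amino acid sequences and the gene sequences, respectively.
  • Take P19113 as an example and search for it directly.

The resulting page (screenshot in the original post) agrees with the header line in the file, which is: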

 >drugbank_target|P19113 Histidine decarboxylase (DB00114; DB00117)

  • The DB IDs in parentheses are the associated drugs.

Note that the following two files correspond to each other one-to-one.

ZIP:   drugbank_approved_target_polypeptide_ids.csv | drugbank_approved_target_polypeptide_sequences.fasta
File:  all.csv | protein.fasta
ID:    DrugBank ID (e.g. https://www.drugbank.ca/bio_entities/BE0000002) | UniProt ID
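The correspondence can be checked by parsing the FASTA header and comparing it with the matching CSV row. This is a rough sketch: the regular expression is my own guess at the header layout based on the single header quoted above, and the UniProt ID position (sixth column) is inferred from the sample row rather than from any documented schema.

```python
import csv
import io
import re
from typing import List, Tuple

# Pattern inferred from one sample header; not an official format definition.
HEADER_RE = re.compile(r"^>drugbank_target\|(\S+)\s+(.*?)\s*\(([^()]*)\)\s*$")


def parse_target_header(line: str) -> Tuple[str, str, List[str]]:
    """Return (UniProt ID, protein name, drug IDs) from a FASTA header line."""
    match = HEADER_RE.match(line)
    if match is None:
        raise ValueError(f"unexpected header: {line!r}")
    uniprot_id, name, drugs = match.groups()
    return uniprot_id, name, [d.strip() for d in drugs.split(";") if d.strip()]


csv_row = (
    "2,Histidine decarboxylase,HDC,32109,X54297,P19113,DCHS_HUMAN,4E1O,,"
    "HDC,HGNC:4855,Humans,DB00114; DB00117"
)
fasta_header = ">drugbank_target|P19113 Histidine decarboxylase (DB00114; DB00117)"

fields = next(csv.reader(io.StringIO(csv_row)))
csv_uniprot = fields[5]  # sixth column, inferred from the sample row
csv_drugs = [d.strip() for d in fields[-1].split(";")]

uniprot_id, name, fasta_drugs = parse_target_header(fasta_header)
print(csv_uniprot == uniprot_id, csv_drugs == fasta_drugs)
```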

4. External Links → External Drug Links(Approved)

drugbank_approved_drug_links.csv.zip

drug links.csv 

  • Contains 3883 drugs
  • Contains the following columns:

DrugBank ID , Name , CAS Number , Drug Type , KEGG Compound ID , KEGG Drug ID , PubChem Compound ID ,

PubChem Substance ID , ChEBI ID , PharmGKB ID , HET ID , UniProt ID , UniProt Title , GenBank ID , DPD ID ,

RxList Link , Pdrhealth Link , Wikipedia ID , Drugs.com Link , NDC ID , ChemSpider ID , BindingDB ID , TTD ID


5. External Links → Target Drug-UniProt Links(Approved)

drugbank_approved_target_uniprot_links.csv.zip

uniprot links.csv

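The file has 12 rows for DB00002, which means the drug has 12 targets (each listed with its UniProt ID). This agrees with the Targets (12) count on the drug's page (screenshot in the original post).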

DB00002,Cetuximab,BiotechDrug,P00533,Epidermal growth factor receptor
DB00002,Cetuximab,BiotechDrug,O75015,Low affinity immunoglobulin gamma Fc region receptor III-B
DB00002,Cetuximab,BiotechDrug,P00736,Complement C1r subcomponent
DB00002,Cetuximab,BiotechDrug,P02745,Complement C1q subcomponent subunit A
DB00002,Cetuximab,BiotechDrug,P02746,Complement C1q subcomponent subunit B
DB00002,Cetuximab,BiotechDrug,P02747,Complement C1q subcomponent subunit C
DB00002,Cetuximab,BiotechDrug,P08637,Low affinity immunoglobulin gamma Fc region receptor III-A
DB00002,Cetuximab,BiotechDrug,P09871,Complement C1s subcomponent
DB00002,Cetuximab,BiotechDrug,P12314,High affinity immunoglobulin gamma Fc receptor I
DB00002,Cetuximab,BiotechDrug,P12318,Low affinity immunoglobulin gamma Fc region receptor II-a
DB00002,Cetuximab,BiotechDrug,P31994,Low affinity immunoglobulin gamma Fc region receptor II-b
DB00002,Cetuximab,BiotechDrug,P31995,Low affinity immunoglobulin gamma Fc region receptor II-c

  • The first three columns (DrugBank ID, Name, Type) describe the drug.
  • The last two columns (UniProt ID, UniProt Name) describe the target.
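Counting targets per drug is a simple group-by over the first and fourth columns. The sketch below reuses the 12 rows quoted above; targets_by_drug is a name chosen for this example, and skipping rows that do not start with "DB" is my own way of ignoring a header line.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

rows_text = """\
DB00002,Cetuximab,BiotechDrug,P00533,Epidermal growth factor receptor
DB00002,Cetuximab,BiotechDrug,O75015,Low affinity immunoglobulin gamma Fc region receptor III-B
DB00002,Cetuximab,BiotechDrug,P00736,Complement C1r subcomponent
DB00002,Cetuximab,BiotechDrug,P02745,Complement C1q subcomponent subunit A
DB00002,Cetuximab,BiotechDrug,P02746,Complement C1q subcomponent subunit B
DB00002,Cetuximab,BiotechDrug,P02747,Complement C1q subcomponent subunit C
DB00002,Cetuximab,BiotechDrug,P08637,Low affinity immunoglobulin gamma Fc region receptor III-A
DB00002,Cetuximab,BiotechDrug,P09871,Complement C1s subcomponent
DB00002,Cetuximab,BiotechDrug,P12314,High affinity immunoglobulin gamma Fc receptor I
DB00002,Cetuximab,BiotechDrug,P12318,Low affinity immunoglobulin gamma Fc region receptor II-a
DB00002,Cetuximab,BiotechDrug,P31994,Low affinity immunoglobulin gamma Fc region receptor II-b
DB00002,Cetuximab,BiotechDrug,P31995,Low affinity immunoglobulin gamma Fc region receptor II-c
"""


def targets_by_drug(csv_text: str) -> Dict[str, List[str]]:
    """Group target UniProt IDs (4th column) by DrugBank ID (1st column)."""
    targets = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 5 or not row[0].startswith("DB"):
            continue  # skip blank lines and any header row
        targets[row[0]].append(row[3])
    return dict(targets)


targets = targets_by_drug(rows_text)
print(len(targets["DB00002"]))
```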

6. External Links → Enzyme/Carrier/Transporter Drug-UniProt Links(Approved)

drugbank_approved_enzyme/c*/t*_uniprot_links.csv.zip

uniprot links.csv

In the Enzyme file: DB00006,Bivalirudin,SmallMoleculeDrug,P05164,Myeloperoxidase

In the Target file: DB00006,Bivalirudin,SmallMoleculeDrug,P00734,Prothrombin

So targets and enzymes/carriers/transporters are separate categories. (Perhaps it is enough to focus on targets only?)
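If several of these link files are loaded together, it helps to tag each protein with the file it came from so that the roles stay separate. A minimal sketch using the two Bivalirudin rows above; add_links and the role labels "target" and "enzyme" are names made up for this example.

```python
import csv
import io
from collections import defaultdict


def add_links(index, csv_text: str, role: str) -> None:
    """Record each row's UniProt ID under index[drug_id][role]."""
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 5 or not row[0].startswith("DB"):
            continue  # skip blank lines and any header row
        index[row[0]][role].add(row[3])


# index[drug_id][role] -> set of UniProt IDs
index = defaultdict(lambda: defaultdict(set))
add_links(index, "DB00006,Bivalirudin,SmallMoleculeDrug,P00734,Prothrombin\n", "target")
add_links(index, "DB00006,Bivalirudin,SmallMoleculeDrug,P05164,Myeloperoxidase\n", "enzyme")

roles = {role: sorted(ids) for role, ids in index["DB00006"].items()}
print(roles)
```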


7. Structures → Structure External Links (Approved)

drugbank_approved_structure_links.csv.zip

structure links.csv

  • Contains 2594 rows
  • Contains the following columns:

DrugBank ID , Name , CAS Number , Drug Groups , InChIKey , InChI , SMILES , Formula ,

KEGG Compound ID , KEGG Drug ID , PubChem Compound ID , PubChem Substance ID ,

ChEBI ID , ChEMBL ID , HET ID , ChemSpider ID , BindingDB ID
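Because this file has named columns, it can be read by column name instead of by position. The sketch below assumes the header row uses exactly the names listed above without the surrounding spaces; the two data rows are hypothetical stand-ins (DB99998 and DB99999 are made-up IDs, not real DrugBank entries), and smiles_by_drug is a name chosen for this example.

```python
import csv
import io
from typing import Dict


def smiles_by_drug(csv_text: str) -> Dict[str, str]:
    """Map DrugBank ID to SMILES, skipping rows with an empty SMILES field."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["DrugBank ID"]: row["SMILES"]
        for row in reader
        if row.get("SMILES")
    }


# Hypothetical rows for illustration; only a subset of the columns is shown.
sample = (
    "DrugBank ID,Name,CAS Number,Drug Groups,InChIKey,InChI,SMILES,Formula\n"
    "DB99998,Example compound A,,approved,,,CCO,C2H6O\n"
    "DB99999,Example compound B,,approved,,,,\n"
)
smiles = smiles_by_drug(sample)
print(smiles)
```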
 


8. Complete Database(Full)

drugbank_all_full_database.xml.zip

full database.xml
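The full XML file is large, so streaming it is usually more practical than loading it whole. The sketch below uses xml.etree.ElementTree.iterparse from the standard library. The element names (drug, drugbank-id with a primary attribute, name) and the nested drug elements are written from my recollection of the DrugBank schema and should be checked against the real file; the sample document is a hand-made stand-in, not an excerpt, and iter_drugs is a name invented here.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Optional, Tuple


def local(tag: str) -> str:
    """Strip an XML namespace such as '{http://www.drugbank.ca}drug'."""
    return tag.rsplit("}", 1)[-1]


def iter_drugs(source: BinaryIO) -> Iterator[Tuple[Optional[str], Optional[str], Optional[str]]]:
    """Yield (primary ID, name, type) for each drug directly under the root."""
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1  # depth now equals the number of ancestors of elem
        if depth == 1 and local(elem.tag) == "drug":
            primary_id, name = None, None
            for child in elem:  # direct children only
                tag = local(child.tag)
                if tag == "drugbank-id" and child.get("primary") == "true":
                    primary_id = child.text
                elif tag == "name" and name is None:
                    name = child.text
            yield primary_id, name, elem.get("type")
            elem.clear()  # release the subtree once it has been consumed


# Hand-made stand-in document; the structure is assumed, not copied.
sample = b"""<drugbank xmlns="http://www.drugbank.ca">
  <drug type="biotech">
    <drugbank-id primary="true">DB00002</drugbank-id>
    <name>Cetuximab</name>
    <pathways><pathway><drugs><drug>
      <drugbank-id>DB00006</drugbank-id><name>Bivalirudin</name>
    </drug></drugs></pathway></pathways>
  </drug>
</drugbank>"""
drugs = list(iter_drugs(io.BytesIO(sample)))
print(drugs)
```

Tracking the depth keeps nested drug elements (for example inside a pathway) from being reported as separate top-level records.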


Further reading:

Several databases used in Drug-Target Interaction prediction (repost)


Note: biointeractions refers to drug-drug interactions.
