Summary of Papers on Automated Coding of Medical Terms (Automated Coding)

1996

Larkey, Leah S., and W. Bruce Croft. "Combining classifiers in text categorization." SIGIR. Vol. 96. 1996.

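Idea:

Treats coding as a retrieval problem: the input is the long text of a discharge summary, and the output is a score for every code. Three scoring methods are used: k-nearest-neighbor, relevance feedback, and Bayesian independence classifiers. For the Bayesian method, 1,068 classifiers were trained, each using its top 40 terms as features. The three scores are then combined into the final result.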
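The score-combination step can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the min-max normalization, the equal weights, the function names, and all score values are assumptions made for the example.

```python
def min_max(scores):
    """Rescale one classifier's scores to [0, 1] so they are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {code: 0.0 for code in scores}
    return {code: (s - lo) / (hi - lo) for code, s in scores.items()}


def combine(score_sets, weights):
    """Weighted sum of normalized per-code scores from several classifiers."""
    combined = {}
    for scores, w in zip(score_sets, weights):
        for code, s in min_max(scores).items():
            combined[code] = combined.get(code, 0.0) + w * s
    # Highest combined score first.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical scores for one discharge summary (made-up values).
knn = {"428.0": 0.9, "401.9": 0.4, "250.00": 0.1}
relevance_feedback = {"428.0": 12.0, "401.9": 15.0, "250.00": 3.0}
bayes = {"428.0": -2.0, "401.9": -5.0, "250.00": -9.0}

ranking = combine([knn, relevance_feedback, bayes], weights=[1.0, 1.0, 1.0])
```

Normalizing first matters because the three classifiers score on different scales (similarities, retrieval scores, log-probabilities), so a raw sum would be dominated by whichever has the largest range.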

2000

Franz, Pius, et al. "Automated coding of diagnoses--three methods compared." Proceedings of the AMIA Symposium. American Medical Informatics Association, 2000.

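Citations: 39

Idea:

Both techniques produced a ranked output of possible diagnoses within a vector space framework for retrieval.

Builds a retrieval framework: the input query is compared against every entry in the ICD list, the entries are ranked, and the top one is returned.

Concretely, the query and each document are converted to vectors using features such as n-grams, stems, prefixes, and suffixes. SNOMED is also introduced as an intermediate mapping target, which is then mapped on to ICD.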
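A minimal sketch of this kind of vector-space lookup, assuming character trigrams as the only feature and cosine similarity as the ranking function. The codes and titles below are illustrative placeholders rather than the paper's data, and the SNOMED intermediate step is left out.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Bag of character n-grams, padded so word boundaries count as features."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_codes(query, icd_entries):
    """Score the query against every ICD title and sort best-first."""
    q = char_ngrams(query)
    scored = [(cosine(q, char_ngrams(title)), code) for code, title in icd_entries.items()]
    return sorted(scored, reverse=True)


# Illustrative entries only.
icd_entries = {
    "I10": "essential hypertension",
    "E11": "type 2 diabetes mellitus",
    "J45": "asthma",
}
ranked = rank_codes("hypertension, essential", icd_entries)
best_code = ranked[0][1]
```

Character n-grams make the match tolerant of word order and small spelling variations, which is why the reordered query still lands on the right entry.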

2007

1. Pestian, John P., et al. "A shared task involving multi-label classification of clinical free text." Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing. Association for Computational Linguistics, 2007.

Citations: 308

Idea:

Released a shared task on ICD-9 entity linking (code assignment) for clinical free text.

2008

Farkas, Richárd, and György Szarvas. "Automatic construction of rule-based ICD-9-CM coding systems." BMC bioinformatics. Vol. 9. No. 3. BioMed Central, 2008.

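Citations: 102

Idea: Rules are used to extract short text snippets from the EMR, which serve as classifier input. A decision tree or maximum entropy classifier is then trained, and the rules are read off the trained classifier. The main goal of the paper is therefore to mine ICD coding rules automatically.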

2012

Kang, Ning, et al. "Using rule-based natural language processing to improve disease normalization in biomedical text." Journal of the American Medical Informatics Association 20.5 (2012): 876-881.

Citations: 58

Idea:

Uses the AZDC dataset, which is annotated with UMLS codes. The basic approach builds on existing concept normalization systems (MetaMap and Peregrine) and improves their output with a rule-based model.

2013

1. Kavuluru, Ramakanth, Sifei Han, and Daniel Harris. "Unsupervised extraction of diagnosis codes from EMRs using knowledge-based and extractive text summarization techniques." Canadian conference on artificial intelligence. Springer, Berlin, Heidelberg, 2013.

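Citations: 8

Idea:

An unsupervised method. 1) NER (MetaMap) identifies diagnosis terms in the EMR, and the UMLS Metathesaurus maps them to ICD codes; 2) the UMLS relationship graph is used to resolve terms left unmapped by step 1; 3) a keyword extraction technique (C-value) ranks the results of step 2.
How to put it: this ensemble approach feels like a hodgepodge stew, with open-source tools simply bolted together. It depends heavily on how well those tools perform; if the NER in step 1 goes wrong, the results are bound to be poor.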

2. Perotte, Adler, et al. "Diagnosis code assignment: models and evaluation metrics." Journal of the American Medical Informatics Association 21.2 (2013): 231-237.

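Citations: 79

Idea:

Uses the discharge summaries in MIMIC-II as training data for automatic ICD-9 coding. Two methods are proposed, a flat SVM and a hierarchy-based SVM; the classifier features are keywords selected by tf-idf.

flat SVM: one SVM is trained per code, and the codes whose classifier outputs 1 are merged to form the prediction.

hierarchy-based SVM: takes the ICD-9 hierarchy into account; a child code's classifier is run only when its parent code is positive.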
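The difference between the two prediction schemes can be sketched as below. The per-code classifiers are stand-in keyword functions rather than trained SVMs, and the hierarchy fragment, helper names, and keywords are all assumptions made for illustration.

```python
# Toy fragment of a code hierarchy: child -> parent (None marks a top-level node).
PARENT = {"390-459": None, "401": "390-459", "401.9": "401", "428": "390-459", "428.0": "428"}


def flat_predict(doc, classifiers):
    """Flat setting: run every per-code classifier and keep the positive codes."""
    return {code for code, clf in classifiers.items() if clf(doc)}


def hierarchical_predict(doc, classifiers):
    """Hierarchy-based setting: evaluate a code only if its parent was positive."""
    children = {}
    for code, parent in PARENT.items():
        children.setdefault(parent, []).append(code)
    positive, frontier = set(), list(children.get(None, []))
    while frontier:
        code = frontier.pop()
        if classifiers[code](doc):
            positive.add(code)
            frontier.extend(children.get(code, []))  # descend only below positives
    return positive


def keyword_clf(word):
    """Hypothetical stand-in for a trained binary SVM."""
    return lambda doc: word in doc.lower()


classifiers = {
    "390-459": keyword_clf("heart"),
    "401": keyword_clf("hypertension"),
    "401.9": keyword_clf("hypertension"),
    "428": keyword_clf("failure"),
    "428.0": keyword_clf("failure"),
}

flat = flat_predict("Acute renal failure.", classifiers)
hier = hierarchical_predict("Acute renal failure.", classifiers)
hier_heart = hierarchical_predict("Congestive heart failure.", classifiers)
```

In this toy case the flat scheme fires the heart-failure codes on a renal note, while the hierarchical scheme never reaches them because the parent node is negative. The flip side is that an error at a parent node blocks every code beneath it.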

2014

Leaman, Robert, and Zhiyong Lu. "Automated disease normalization with low rank approximations." Proceedings of BioNLP 2014 (2014): 24-28.

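Citations: 7

Idea:

Pairwise learning to rank. The mentions in the NCBI Disease Corpus and the concepts they refer to are vectorized with TF-IDF, and a score() function is designed to score a pair of texts; the parameters of this scoring function are then learned. Drawbacks: it is inefficient, does not scale to large data, and does not capture semantic relations well enough.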
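One way to picture a trainable score() function is a bilinear form $m^\top W n$ trained with a margin ranking loss. The sketch below assumes that form, a full (not low-rank) matrix W, a margin of 1, and toy vectors in place of real TF-IDF weights; none of these details are taken from the summary above.

```python
def score(W, m, n):
    """Bilinear score m^T W n between a mention vector and a concept-name vector."""
    return sum(m[i] * W[i][j] * n[j] for i in range(len(m)) for j in range(len(n)))


def train(pairs, dim, lr=0.1, epochs=10):
    """Pairwise ranking: push the correct name above a wrong one by a margin of 1."""
    # Start from the identity, so the initial score is a plain dot product.
    W = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for _ in range(epochs):
        for m, pos, neg in pairs:
            if score(W, m, pos) - score(W, m, neg) < 1.0:
                # Gradient step on the hinge loss: W += lr * m (pos - neg)^T
                for i in range(dim):
                    for j in range(dim):
                        W[i][j] += lr * m[i] * (pos[j] - neg[j])
    return W


# Toy vocabulary: ["cancer", "tumor", "kidney", "renal"].
mention = [0.0, 1.0, 0.0, 1.0]   # "renal tumor"
correct = [1.0, 0.0, 1.0, 0.0]   # "kidney cancer": no token overlap with the mention
wrong = [0.0, 1.0, 0.0, 0.0]     # "tumor": overlaps, but treated here as the wrong concept

W = train([(mention, correct, wrong)], dim=4)
```

After training, W has learned cross-term weights such as "renal" to "kidney", which a plain TF-IDF dot product cannot express. The full dim × dim matrix is also what makes this style of model expensive for large vocabularies.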

2015

1. Kavuluru, Ramakanth, Anthony Rios, and Yuan Lu. "An empirical evaluation of supervised learning approaches in assigning diagnosis codes to electronic medical records." Artificial intelligence in medicine 65.2 (2015): 155-166.

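Citations: 45

Idea:

Recasts medical entity linking as a set of binary classification problems, with one classifier trained per code. The output codes are then ranked, label calibration methods predict how many labels the record should have, and the top-k codes are selected as the final assigned codes.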
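The final selection step can be sketched as follows, assuming the per-code classifier scores and the predicted label count are already available. The function name and all values are hypothetical, and the label-count predictor itself is not modeled.

```python
def assign_codes(code_scores, predicted_count):
    """Rank codes by classifier score and keep as many as the predicted label count."""
    ranked = sorted(code_scores, key=code_scores.get, reverse=True)
    return ranked[:max(predicted_count, 0)]


# Made-up per-code scores for one record.
scores = {"428.0": 0.91, "401.9": 0.62, "250.00": 0.55, "486": 0.08}
assigned = assign_codes(scores, predicted_count=2)
```

Compared with a fixed 0.5 threshold, which would keep three codes here, a per-record label count lets the cutoff adapt to how many codes that record is expected to carry.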

2. Koopman, Bevan, et al. "Automatic ICD-10 classification of cancers from free-text death certificates." International journal of medical informatics 84.11 (2015): 956-965.

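Citations: 47

Idea:

Extracts features such as terms, n-grams, and SNOMED CT concepts from death certificates and trains two SVM classifiers: the first decides whether cancer is present, and the second decides which type of cancer it is.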

2016

Wang, Sen, et al. "Diagnosis code assignment using sparsity-based disease correlation embedding." IEEE Transactions on Knowledge and Data Engineering 28.12 (2016): 3191-3202.

Citations: 43

Idea:

2017

Shi, Haoran, et al. "Towards automated icd coding using deep learning." arXiv preprint arXiv:1711.04075 (2017).

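Citations: 18

Idea:

Uses the MIMIC-III dataset, extracting diagnosis descriptions from it and mapping them to ICD codes.

Main method: an RNN encodes the document and the ICD titles separately, and attention then selects the relevant diagnosis descriptions for the next step. This requires comparing every diagnosis description in the document against all ICD titles. The final layer uses a sigmoid activation for binary classification.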

2018

1. Duarte, Francisco, et al. "Deep neural models for ICD-10 coding of death certificates and autopsy reports in free-text." Journal of biomedical informatics 80 (2018): 64-77.

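Citations: 8

Idea:

Uses several kinds of medical data to assign ICD codes for cause of death. The contribution is a neural network architecture that predicts three levels of ICD code: chapters, blocks, and full codes. Specifically, RNNs encode the different data sources, the encodings are merged, and three models are trained to predict chapters, blocks, and full codes respectively. The first two are multi-class problems with a softmax output layer; the third is binary classification with a sigmoid output layer. A novel touch is that the parameters of the final layer are initialized from label co-occurrence relations.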

2. Mullenbach, James, et al. "Explainable prediction of medical codes from clinical text." arXiv preprint arXiv:1802.05695 (2018).

Citations: 24

Idea:

Automatically codes discharge summaries to ICD-9, as a multi-label text classification task. The main method encodes the document with a CNN and then applies attention, so that each label selects a different part of the document as the representation used for its final prediction.
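The per-label attention step can be sketched as below. It assumes the CNN has already produced one feature vector per document position, and that each label has its own attention vector, output weights, and bias. All names and numbers are illustrative, and the code is a plain-Python sketch rather than the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def label_attention(H, label_vectors, output_weights, biases):
    """H holds one feature vector per document position (e.g. CNN outputs)."""
    probs, attentions = {}, {}
    for label, u in label_vectors.items():
        alpha = softmax([dot(h, u) for h in H])           # where this label looks
        v = [sum(a * h[d] for a, h in zip(alpha, H))      # label-specific document vector
             for d in range(len(H[0]))]
        logit = dot(output_weights[label], v) + biases[label]
        probs[label] = 1.0 / (1.0 + math.exp(-logit))     # independent sigmoid per label
        attentions[label] = alpha
    return probs, attentions


# Toy input: three positions, two feature dimensions, two hypothetical labels.
H = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
label_vectors = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
output_weights = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
biases = {"A": -2.0, "B": -2.0}

probs, attentions = label_attention(H, label_vectors, output_weights, biases)
```

The attention weights are what make the prediction explainable: for each label they indicate which positions of the document contributed most to its score.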

3. Xie, Pengtao, and Eric Xing. "A neural architecture for automated icd coding." Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018.

Citations: 4

Idea:

Uses the discharge diagnoses from the MIMIC-III dataset. Encoding is done with a tree-of-sequences LSTM, and adversarial learning is used to improve prediction.

4. Baumel, Tal, et al. "Multi-label classification of patient notes: case study on ICD code assignment." Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence. 2018.

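Citations: 23

Idea:

Proposes the HA-GRU method, a hierarchical GRU: the first-level GRU encodes words and the second-level GRU encodes sentences. Sentence attention yields a weight for each word, label attention yields a weight for each sentence, and a hidden layer followed by softmax then produces the label classification.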
