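Definition of Malicious Code
Malicious code, also called malware, is an umbrella term for hostile and intrusive software. It covers computer viruses, worms, Trojan horses, ransomware, spyware, adware, and other malicious programs.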
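Types of Malicious Code
Computer virus: a program that resides in a computer system and, when executed under certain conditions, damages the functionality and data of the system and its programs, interferes with other programs, and replicates itself.
Worm: often regarded as a kind of virus. It can replicate itself, and it consumes limited resources through the load it places on the computer and the network.
Trojan horse: also simply called a Trojan; the name comes from ancient Greek legend. A computer Trojan is a program that lurks in a computer to achieve a specific purpose, such as stealing a user's private information or taking control of the user's system. Its biggest difference from a virus is that a virus can replicate itself, whereas a Trojan has no replication capability and does not infect other programs.
Rootkit: originally a set of tools that helps its user obtain system privileges. Here it refers to a malicious program that, once privileges on the target host have been obtained, hides the traces of the attacker's access so that the attacker goes undetected and can keep administrator privileges over the long term. It is highly stealthy and persistent, and hard to detect.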
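Malicious Code Features (feature information used to identify a program as malicious)
- System call features
- Normalized code features
- N-gram features
- Control flow graph (CFG) features
- Instruction sequence features
- File format and similar features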
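Malicious Code Feature Extraction
Byte n-gram Features: byte n-grams are extracted from a file's binary code, and the L most frequent n-grams of each class in the training set are selected to represent that class's profile.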
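The byte n-gram profile described above can be sketched as follows. This is a minimal illustration, not the method of any cited paper: the function names `byte_ngrams` and `class_profile`, the parameter `top_l` (standing in for L), and the toy byte strings are all assumptions made for the example.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count every overlapping window of n bytes in a binary blob."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def class_profile(samples: list[bytes], n: int = 2, top_l: int = 3) -> list[bytes]:
    """Return the top_l most frequent n-grams over all samples of one class."""
    total = Counter()
    for blob in samples:
        total.update(byte_ngrams(blob, n))
    return [gram for gram, _ in total.most_common(top_l)]


# Toy byte strings standing in for real executables (illustrative only).
malicious = [b"\x90\x90\x90\xcc", b"\x90\x90\xeb\xfe"]
benign = [b"\x55\x8b\xec\x5d", b"\x55\x8b\xec\xc3"]

profiles = {
    "malicious": class_profile(malicious, n=2, top_l=2),
    "benign": class_profile(benign, n=2, top_l=2),
}
```

In practice the values of n and L are tuning choices, and the resulting profiles would feed a classifier rather than be compared by hand.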
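Opcode n-gram Features: first, every executable in the dataset is disassembled and its opcodes are extracted. An opcode (short for "operation code") is the part of an assembly-language instruction that specifies the operation to perform. An instruction consists of an opcode and, optionally, the operands the operation acts on. Depending on the CPU architecture, operands may be registers or values stored in memory, on the stack, and so on. Opcodes cover arithmetic, logical, and data-manipulation operations. Opcode statistics can capture the variability between malicious and legitimate software.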
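A minimal sketch of opcode n-gram counting, assuming disassembly has already been done by some external tool and is available as a text listing with one instruction per line. The helper names and the sample listing are hypothetical, and taking the first token as the mnemonic is a simplification (instruction prefixes such as `rep` would be counted as the opcode).

```python
from collections import Counter


def opcodes_from_listing(listing: str) -> list[str]:
    """Take the first token of each non-empty line as the opcode mnemonic."""
    return [line.split()[0].lower() for line in listing.splitlines() if line.strip()]


def opcode_ngrams(opcodes: list[str], n: int = 2) -> Counter:
    """Count overlapping sequences of n consecutive opcodes."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


# Hypothetical disassembly output; operands are ignored on purpose.
listing = """
push ebp
mov ebp, esp
xor eax, eax
mov ebp, esp
xor eax, eax
ret
"""

ops = opcodes_from_listing(listing)
counts = opcode_ngrams(ops, n=2)
```

Dropping the operands keeps the feature space small and makes the counts less sensitive to register or address changes between variants.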
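Portable Executables: these features are extracted from certain parts of an EXE file. Static analysis uses the structural information of the executable to extract them. Such features are meaningful because they can indicate that a file has been manipulated or infected in order to perform malicious activity.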
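As one illustration of structural PE features, the sketch below reads a few COFF file-header fields using only the standard library. The choice of fields and the function name `pe_header_features` are assumptions for the example; real detectors typically use many more fields (section table, imports, optional header). The sample buffer is synthetic, not a real executable.

```python
import struct

IMAGE_FILE_DLL = 0x2000  # Characteristics flag: the file is a DLL


def pe_header_features(data: bytes) -> dict:
    """Read a few COFF file-header fields from the raw bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset 0x3C in the DOS header) points to the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The 20-byte COFF file header follows the 4-byte signature.
    (machine, n_sections, timestamp, _symtab, _nsyms,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": machine,
        "number_of_sections": n_sections,
        "timestamp": timestamp,
        "size_of_optional_header": opt_size,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }


# Synthetic minimal header built only to exercise the parser.
dos = bytearray(0x40)
dos[:2] = b"MZ"
struct.pack_into("<I", dos, 0x3C, 0x40)
coff = struct.pack("<HHIIIHH", 0x014C, 3, 0, 0, 0, 0xE0, 0x0102)
sample = bytes(dos) + b"PE\x00\x00" + coff

features = pe_header_features(sample)
```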
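String Features: these features are based on plain text encoded in executables, such as windows, getversion, getstartupinfo, getmodulefilename, messagebox, library, and so on. These strings are runs of consecutive printable characters encoded in both PE and non-PE executables.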
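Extracting runs of printable characters can be sketched with a regular expression. The minimum length of 4, the restriction to ASCII, and the keyword list are assumptions chosen for the example; the sample bytes are made up.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least min_len consecutive printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


def keyword_features(strings: list[str], keywords: list[str]) -> dict[str, bool]:
    """Mark which keywords occur (case-insensitively) in any extracted string."""
    lowered = [s.lower() for s in strings]
    return {kw: any(kw in s for s in lowered) for kw in keywords}


# Made-up bytes with two embedded API names and one run that is too short.
blob = b"\x00\x01GetVersion\x00\xff\x10MessageBoxA\x00ab\x00"

strings = printable_strings(blob)
features = keyword_features(strings, ["getversion", "messagebox", "getstartupinfo"])
```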
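Function Based Features: function-based features are extracted from the runtime behavior of the program file. They take the functions that reside in the file to be executed and use them to generate various attributes that represent the file.
Hybrid Analysis Features: a combination of static analysis and dynamic analysis.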
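One simple way to combine the two kinds of analysis is to merge a static feature set and a dynamic feature set into a single vector. The sketch below assumes both have already been computed as name-to-value mappings; the feature names and the `static:` / `dynamic:` prefixes are hypothetical, and other fusion schemes (for example, combining separate classifiers) are equally possible.

```python
def fuse_features(static: dict[str, float], dynamic: dict[str, float]) -> dict[str, float]:
    """Merge static and dynamic features, prefixing names to avoid collisions."""
    fused = {f"static:{name}": value for name, value in static.items()}
    fused.update({f"dynamic:{name}": value for name, value in dynamic.items()})
    return fused


def to_vector(features: dict[str, float], columns: list[str]) -> list[float]:
    """Lay features out in a fixed column order, filling missing ones with 0."""
    return [features.get(column, 0.0) for column in columns]


# Hypothetical feature values for one sample.
static = {"number_of_sections": 3.0, "api": 12.0}
dynamic = {"api": 7.0, "files_written": 2.0}

fused = fuse_features(static, dynamic)
columns = sorted(fused)
vector = to_vector(fused, columns)
```

The prefixes matter here because the same name (such as `api`) can appear in both views with different meanings.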
Malicious Code Detection
- Detection techniques based on static features
Classification feature | References |
Byte code n-grams | Kolter J Z, Maloof M A. Learning to detect malicious executables in the wild[C]. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2004: 470-478. Santos I, Penya Y K, Devesa J, et al. N-grams-based File Signatures for Malware Detection. Proceedings of the 2009 International Conference on Enterprise Information Systems (ICEIS), 2009, 9: 317-320. |
File format | Shafiq M Z, Tabish S M, Mirza F, et al. Pe-miner: Mining structural information to detect malicious executables in realtime. Recent advances in intrusion detection, Springer Berlin Heidelberg, 2009: 121-141. Bai J, Wang J, Zou G. A Malware Detection Scheme Based on Mining Format Information. The Scientific World Journal, 2014. |
Gray image | Nataraj L, Karthikeyan S, Jacob G, et al. Malware images: visualization and automatic classification[C]. Proceedings of the 8th international symposium on visualization for cyber security. ACM, 2011: 4. HAN Xiao-guang, QU Wu, YAO Xuan-xia, et al. Research on malicious code variants detection based on texture fingerprint. Journal on Communications, 2014, 35(8): 125-135. |
Function call graph | Kong D, Yan G. Discriminant malware distance learning on structural information for automated malware classification[C]. Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013: 1357-1365. |
- Detection techniques based on dynamic features
Classification feature | References |
Variable length API subsequences | Firdausi I, Lim C, Erwin A, et al. Analysis of machine learning techniques used in behavior-based malware detection[C]. Advances in Computing, Control and Telecommunication Technologies (ACT), 2010 Second International Conference on. IEEE, 2010: 201-203. Nair V P, Jain H, Golecha Y K, et al. MEDUSA: MEtamorphic malware dynamic analysis using signature from API[C]. Proceedings of the 3rd International Conference on Security of Information and Networks. ACM, 2010: 263-269. |
Operation code n-grams | Shabtai A, Moskovitch R, Feher C, et al. Detecting unknown malicious code by applying classification techniques on opcode patterns. Security Informatics, 2012, 1(1): 1-22. Pai S, Di Troia F, Visaggio C A, et al. Clustering for malware classification. Journal of Computer Virology and Hacking Techniques, 2016: 1-13. |
Graph | Bonfante G, Kaczmarek M, Marion J Y. Architecture of a morphological malware detector. Journal in Computer Virology, 2009, 5(3): 263-270. Cesare S, Xiang Y, Zhou W. Control flow-based malware variant detection. IEEE Transactions on Dependable and Secure Computing, 2014, 11(4): 307-317. |
- Detection techniques based on fused features (detection methods that integrate several feature types)
Classification features (dynamic / static) | References |
Dynamic API / operation code | Santos I, Devesa J, Brezo F, et al. Opem: A static-dynamic approach for machine learning based malware detection[C]. International Joint Conference CISIS'12-ICEUTE'12-SOCO'12 Special Sessions. Springer Berlin Heidelberg, 2013: 271-280. |
Program behavior / static DLL, API | Lu Y B, Din S C, Zheng C F, et al. Using multi-feature and classifier ensembles to improve malware detection. Journal of CCIT, 2010, 39(2): 57-72. |
API call sequence / PE format | Guo S, Yuan Q, Lin F, et al. A malware detection algorithm based on multi-view fusion. Neural Information Processing, Models and Applications, Springer Berlin Heidelberg, 2010: 259-266. Krawczyk B, Woźniak M. Evolutionary Cost-Sensitive Ensemble for Malware |
Dynamic API / static API | Ozdemir M, Sogukpinar I. An Android Malware Detection Architecture based on Ensemble Learning. Transactions on Machine Learning and Artificial Intelligence, 2014, 2(3): 90-106. |
Operation code / byte code | Bai J, Wang J. Improving malware detection using multiview ensemble learning. Security and Communication Networks, 2016, 9(17): 4227-4241. |
References:
[1] Bo Yun Zhang. Survey on Malicious Code Intelligent Detection Techniques.
[2] Smita Ranveer, Swapnaja Hiray. Comparative Analysis of Feature Extraction Methods of Malware Detection.