ECShop Fingerprinting + Version Detection

Original

xyw55

2020-07-04 03:53

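A few days ago I wrote a simple ECShop fingerprinting program with basic version detection.

My approach was based on an article on FreeBuf (FB), "A Brief Discussion of Web Fingerprinting Techniques".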

Because time was limited, the ECShop fingerprinting only looks at the following three signals:

1. The meta tags

2. intext: Powered by ECShop (the footer text)

3. robots.txt

Let's open an ECShop site and see how each of these signals shows up on the page.

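1. First, let's see what the meta tags reveal. Below is a snippet of HTML I captured.

As you can see, this site has not touched its meta tags and still carries ECShop's original meta, which tells us the site runs ECShop, version 2.7.2. This is also where the program does its version detection.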
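The main program below compares each meta value against a fixed list of exact strings. As an alternative sketch, the version number can be extracted from any ECSHOP meta value with Python's standard `html.parser`, which does not depend on attribute order or quoting. The sample tag name `Generator` and the helper names `MetaCollector` and `ecshop_version_from_meta` are assumptions for illustration; they are not taken from the captured HTML.

```python
from html.parser import HTMLParser
import re


class MetaCollector(HTMLParser):
    """Hypothetical helper: collect the content of every <meta name=... content=...>."""

    def __init__(self):
        super().__init__()
        self.metas = {}

    def handle_starttag(self, tag, attrs):
        # <meta ... /> also arrives here via the default handle_startendtag.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if name and "content" in attrs:
            self.metas[name.lower()] = attrs["content"] or ""


def ecshop_version_from_meta(html):
    """Return (is_ecshop, version or None).

    Assumes the value looks like 'ECSHOP' or 'ECSHOP vX.Y.Z' (hypothetical format).
    """
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    for content in parser.metas.values():
        m = re.match(r"ECSHOP(?:\s+v(\d+(?:\.\d+)*))?\s*$", content.strip(), re.I)
        if m:
            return True, m.group(1)
    return False, None
```

Unlike an exact-string list, this accepts version numbers that were not anticipated, at the cost of trusting whatever number the site reports.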

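2. Scrolling further down the page, we find "Powered by ECShop" in the footer.

This site has not modified ECShop's default footer either, so ECShop and its version can be identified here too. However, since many sites customize the footer, the program does not use it for version detection.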
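Because footer markup varies (extra `<strong>`/`<span>` tags, different letter case, `&nbsp;`), a check on the visible text may be more forgiving than matching a raw HTML snippet. The sketch below strips tags with a simple regex, which is adequate for a heuristic like this but is not a real HTML parser; `has_powered_by_ecshop` is a hypothetical name.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")


def has_powered_by_ecshop(page):
    """Hypothetical helper: True if the visible text says 'Powered by ECShop'."""
    text = TAG_RE.sub(" ", page)         # drop tags, leaving a space in their place
    text = html.unescape(text)           # decode entities such as &nbsp;
    text = re.sub(r"\s+", " ", text)     # collapse runs of whitespace
    return re.search(r"powered by\s*ecshop", text, re.I) is not None
```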

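3. Checking the contents of robots.txt

robots.txt is a plain-text file that implements a convention (the Robots Exclusion Protocol), not an enforced command. It is normally the first file a search engine crawler requests when it visits a site, and it tells the crawler which files on the server it may access.

When a crawler visits a site, it first checks whether robots.txt exists in the site's root directory. If it does, the crawler limits its crawl according to the file's contents; if it does not, crawlers may access every page on the site that is not password-protected.

robots.txt can therefore also be used to identify ECShop. As the screenshot below shows, some of the listed files are specific to ECShop, such as /affiche.php, /goods_script.php, and /feed.php. If several of these appear, we can be fairly confident the site runs ECShop.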
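Python's standard library ships a parser for this protocol, `urllib.robotparser`, which answers the question a polite crawler asks before each request. The robots.txt content below is a made-up example that reuses two ECShop paths, and `crawler_may_fetch` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser


def crawler_may_fetch(robots_txt, url, agent="*"):
    """Hypothetical helper: True if robots_txt allows `agent` to request `url`."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())  # parse() takes an iterable of lines
    return rp.can_fetch(agent, url)


# Made-up robots.txt using two of the ECShop paths mentioned in this article.
SAMPLE = "User-agent: *\nDisallow: /cert/\nDisallow: /affiche.php\n"
```

Nothing on the server enforces the answer; a crawler that ignores robots.txt can still request /affiche.php, which is exactly why the file is a readable fingerprint.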
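A sketch of the matching step that tolerates variations in Disallow lines (letter case, spacing, trailing comments, Windows line endings), whereas the main program below compares whole lines exactly. `ECSHOP_PATHS` is an illustrative subset built from the file names in this article and its signature dictionary, and the threshold of two mirrors the rule the main program uses; both are assumptions you may want to tune.

```python
import re

# Illustrative subset of ECShop-specific paths (assumed, not exhaustive).
ECSHOP_PATHS = {"/affiche.php", "/goods_script.php", "/feed.php",
                "/cycle_image.php", "/region.php"}

DISALLOW_RE = re.compile(r"^\s*disallow\s*:\s*(\S*)", re.I)


def disallowed_paths(robots_txt):
    """Return the set of Disallow paths, ignoring case, spacing and comments."""
    paths = set()
    for line in robots_txt.splitlines():     # handles \n and \r\n
        line = line.split("#", 1)[0]         # strip trailing comments
        m = DISALLOW_RE.match(line)
        if m and m.group(1):                 # an empty Disallow allows everything
            paths.add(m.group(1))
    return paths


def looks_like_ecshop(robots_txt, threshold=2):
    """Require several ECShop-only paths so one shared file name is not enough."""
    return len(disallowed_paths(robots_txt) & ECSHOP_PATHS) >= threshold
```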

The ECShop fingerprints are stored separately as a signature dictionary:

ecshop_feature.py

# coding=utf-8
'''
Web fingerprint signatures for ECShop
1. robots.txt lines
2. "Powered by ECShop" footer text
3. meta tag values
'''
matches = {
    # Disallow lines seen in ECShop's robots.txt
    'robots_for_ecshop': [
        "Disallow: /cert/",
        "Disallow: /templates/",
        "Disallow: /themes/",
        "Disallow: /upgrade/",
        "Disallow: /affiche.php",
        "Disallow: /cycle_image.php",
        "Disallow: /goods_script.php",
        "Disallow: /region.php",
        "Disallow: /feed.php",
    ],
    # Footer and license-link snippets from unmodified ECShop pages
    'intext': [
        '<a href="http://www.ecshop.com" target="_blank" style=" font-family:Verdana; font-size:11px;">Powered by <strong><span style="color: #3366FF">ECShop</span> <span style="color: #FF9966">v2.7.',
        '<a href="http://www.ecshop.com/license.php?product=ecshop_b2c&url=',
    ],
    # Exact meta content values: versioned entries first, the bare
    # 'ECSHOP' (no version) must stay last
    'meta': ['ECSHOP v2.7.3', 'ECSHOP v2.7.2', 'ECSHOP v2.7.1', 'ECSHOP v2.7.0', 'ECSHOP v2.6.2', 'ECSHOP'],
    # Title suffix (not used by the detector below)
    'title': ['Powered by ECShop'],
}

Below is the main detection program. Its input is a file of domain names, one per line.

# coding=utf-8
'''
ECShop fingerprinting
1. meta tag identification (also used for version detection)
2. in-page text identification
3. robots.txt identification
'''
import re
import urllib.request

from ecshop_feature import matches


def fetch(url, timeout):
    '''Return the body of url as text, or None on any error.'''
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # latin-1 maps every byte to one character, so the ASCII
            # signatures match whatever the page's real charset is.
            return resp.read().decode('latin-1')
    except Exception:
        return None


class EcshopDetector:
    def __init__(self, url):
        '''Turn a bare domain into a URL and fetch the home page.'''
        if not url.startswith(('http://', 'https://')):
            url = 'http://%s' % url
        self.url = url.rstrip('/')
        self.page_content = fetch(self.url, timeout=5)

    def meta_detect(self):
        '''Identify ECShop from meta tags; returns (found, version or None).'''
        if self.page_content is None:
            return (False, None)
        pattern = re.compile(r'<meta name=".*?" content="(.+?)" />')
        versions = matches['meta'][:-1]  # 'ECSHOP v2.7.3' ... 'ECSHOP v2.6.2'
        for content in pattern.findall(self.page_content):
            if content in versions:
                return (True, content)
            if content == matches['meta'][-1]:  # bare 'ECSHOP', no version
                return (True, None)
        return (False, None)

    def robots_ecshop_detect(self):
        '''Check robots.txt. Other sites may disallow some of the same file
        names, so at least two ECShop lines must match.'''
        if self.page_content is None:
            return False
        robots_content = fetch(self.url + '/robots.txt', timeout=10)
        if robots_content is None:
            return False
        # strip() also removes the '\r' of Windows line endings
        robots_lines = {line.strip() for line in robots_content.splitlines()}
        count = sum(1 for feature in matches['robots_for_ecshop']
                    if feature in robots_lines)
        return count >= 2

    def detect_intext(self):
        '''Look for either ECShop footer snippet in the page.'''
        if self.page_content is None:
            return False
        return any(feature in self.page_content for feature in matches['intext'])

    def get_result(self):
        '''Combine the three checks; only the meta tag yields a version.'''
        if self.page_content is None:
            return (False, 'Not Ecshop!')
        is_meta, version_info = self.meta_detect()
        is_ec_robots = self.robots_ecshop_detect()
        is_intext = self.detect_intext()
        if is_meta or is_ec_robots or is_intext:
            return (True, version_info or 'Unknown')
        return (False, 'Not Ecshop!')


if __name__ == '__main__':
    # ecshop_site.txt holds one domain per line; hits are appended to result.txt
    with open('ecshop_site.txt') as fobj, open('result.txt', 'a') as fwobj:
        for line in fobj:
            url = line.strip()
            if not url:
                continue
            print(url)
            ret = EcshopDetector(url).get_result()
            if ret[0]:
                fwobj.write('Site:%s\tVersion:%s\n' % (url, ret[1]))
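If the domain list is assembled from several sources, it may contain blank lines, Windows line endings, full URLs, and duplicates. A minimal sketch of a hypothetical `load_targets` helper that normalizes the file before scanning is shown below; treating lines that start with `#` as comments is an assumption, not part of the original input format.

```python
def load_targets(lines):
    """Hypothetical helper: turn domain-file lines into unique base URLs."""
    seen = set()
    targets = []
    for line in lines:
        domain = line.strip()                    # drops \n, \r\n and spaces
        if not domain or domain.startswith("#"):  # blank or (assumed) comment line
            continue
        if not domain.startswith(("http://", "https://")):
            domain = "http://" + domain
        url = domain.rstrip("/")
        if url not in seen:                      # keep first occurrence only
            seen.add(url)
            targets.append(url)
    return targets
```

Usage: `load_targets(open('ecshop_site.txt'))` returns the list of URLs to scan, in file order.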

Here are some of the results the program produced:
