Python爬蟲實現破解58同城加密內容

在爬取58同城租房信息的聯繫號碼時,發現抓取的‘13823661900’對應的內容是‘龒鑶龤驋鑶餼餼龒鵂閏閏’

看起來應該是字體加密,字體加密一般是網頁修改了默認的字符編碼集,在網頁上加載的網頁定義的字體文件作爲字體的樣式,可以正確地顯示數字,但是在源碼上同樣的二進制數由於未加載自定義的字體文件就由計算機默認編碼成了亂碼。

解決辦法:找到字體文件,分析文件中的映射關係,一般字體文件都是作爲樣式加在加密字體的部位。

 

從上圖可以看到,取消勾選fangchan-secret樣式後,部分加密的內容顯示成了我們實際爬取的內容,所以這個fangchan-secret最可能是字體加密文件,接下來在網頁源代碼中搜索‘fangchan-secret’尋找字體加密文件,如下圖所示:在下圖的源碼中,字體文件是通過base64加密之後放在js裏面了,把其中加密的部分(兩條紅豎線內的內容)取出,可使用正則將其中的內容取出來(或者直接拷貝下來)。每次網頁刷新,字體加密文件中的映射順序可能會變,所以最好觀察下映射順序是否變化。

以下是代碼:

import base64
from io import BytesIO
from fontTools.ttLib import TTFont
import requests
import re

# 正則獲取字體文件內容
url = 'https://sz.58.com/zufang/37600131385868x.shtml?entinfo=37600131385868_0&fzbref=0&params=rankbusitimeZZ8pc0020^desc&psid=125027535203666225060654534&iuType=gz_2&ClickID=1&cookie=|||c5/nn1pMpWpci9dLRdczAg==&PGTID=0d300008-00a7-53b9-8bbc-17b565fb4b3a&apptype=0&key=&pubid=66881617&trackkey=37600131385868_c528f692-5ebb-402e-99ff-dce613171bf0_20190329165957_1553849997376&fcinfotype=gz'
res = requests.get(url)
bs64Str = re.findall("charset=utf-8;base64,(.*?)'\)", res.text)[0]
# 直接從網頁源代碼拷貝字體文件內容
base64Str = 'AAEAAAALAIAAAwAwR1NVQiCLJXoAAAE4AAAAVE9TLzL4XQjtAAABjAAAAFZjbWFwq8J/ZQAAAhAAAAIuZ2x5ZuWIN0cAAARYAAADdGhlYWQVGM29AAAA4AAAADZoaGVhCtADIwAAALwAAAAkaG10eC7qAAAAAAHkAAAALGxvY2ED7gSyAAAEQAAAABhtYXhwARgANgAAARgAAAAgbmFtZTd6VP8AAAfMAAACanBvc3QFRAYqAAAKOAAAAEUAAQAABmb+ZgAABLEAAAAABGgAAQAAAAAAAAAAAAAAAAAAAAsAAQAAAAEAAOiv/hBfDzz1AAsIAAAAAADYyMFWAAAAANjIwVYAAP/mBGgGLgAAAAgAAgAAAAAAAAABAAAACwAqAAMAAAAAAAIAAAAKAAoAAAD/AAAAAAAAAAEAAAAKADAAPgACREZMVAAObGF0bgAaAAQAAAAAAAAAAQAAAAQAAAAAAAAAAQAAAAFsaWdhAAgAAAABAAAAAQAEAAQAAAABAAgAAQAGAAAAAQAAAAEERAGQAAUAAAUTBZkAAAEeBRMFmQAAA9cAZAIQAAACAAUDAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFBmRWQAQJR2n6UGZv5mALgGZgGaAAAAAQAAAAAAAAAAAAAEsQAABLEAAASxAAAEsQAABLEAAASxAAAEsQAABLEAAASxAAAEsQAAAAAABQAAAAMAAAAsAAAABAAAAaYAAQAAAAAAoAADAAEAAAAsAAMACgAAAaYABAB0AAAAFAAQAAMABJR2lY+ZPJpLnjqeo59kn5Kfpf//AACUdpWPmTyaS546nqOfZJ+Sn6T//wAAAAAAAAAAAAAAAAAAAAAAAAABABQAFAAUABQAFAAUABQAFAAUAAAABwAGAAUAAwAJAAIACAAEAAEACgAAAQYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADAAAAAAAiAAAAAAAAAAKAACUdgAAlHYAAAAHAACVjwAAlY8AAAAGAACZPAAAmTwAAAAFAACaSwAAmksAAAADAACeOgAAnjoAAAAJAACeowAAnqMAAAACAACfZAAAn2QAAAAIAACfkgAAn5IAAAAEAACfpAAAn6QAAAABAACfpQAAn6UAAAAKAAAAAAAAACgAPgBmAJoAvgDoASQBOAF+AboAAgAA/+YEWQYnAAoAEgAAExAAISAREAAjIgATECEgERAhIFsBEAECAez+6/rs/v3IATkBNP7S/sEC6AGaAaX85v54/mEBigGB/ZcCcwKJAAABAAAAAAQ1Bi4ACQAAKQE1IREFNSURIQQ1/IgBW/6cAicBWqkEmGe0oPp7AAEAAAAABCYGJwAXAAApATUBPgE1NCYjIgc1NjMyFhUUAgcBFSEEGPxSAcK6fpSMz7y389Hym9j+nwLGqgHButl0hI2wx43iv5D+69b+pwQAAQAA/+YEGQYnACEAABMWMzI2NRAhIzUzIBE0ISIHNTYzMhYVEAUVHgEVFAAjIiePn8igu/5bgXsBdf7jo5CYy8bw/sqow/7T+tyHAQN7nYQBJqIBFP9uuVjPpf7QVwQSyZbR/wBSAAACAAAAAARoBg0ACgASAAABIxEjESE1ATMRMyERNDcjBgcBBGjGvv0uAq3jxv58BAQOLf4zAZL+bgGSfwP8/CACiUVaJlH9TwABAAD/5gQhBg0AGAAANxYzMjYQJiMiBxEhFSERNjMyBBUUACEiJ7GcqaDEx71bmgL6/bxXLPUBEv7a/v3Zbu5mswEppA4DE63+SgX42uH+6kAAAAACAAD/5gRbBicAFgAiAAABJiMiAgMzNjMyEhUUACMiABEQACEyFwEUFjMyNjU0JiMiBgP6eYTJ9AIFbvHJ8P7r1+z+8wFhASClXv1Qo4eAoJeLhKQFRj7+ov7R1f762eP+3AFxAVMBmgHjLfwBmdq8lKCytAAAAAABAAAAAARNBg0ABgAACQEjASE1IQRN/aLLAkD8+gPvBcn6NwVgrQAAAwAA/+YESgYnABUAHwApAAABJDU0JDMyFhUQBRUEERQEIyIkNRAlATQmIyIGFRQXNgEEFRQWMzI2NTQBtv7rAQTKufD+3wFT/un6zf7+AUwBnIJvaJLz+P78/uGoh4OkAy+B9avXyqD+/osEev7aweXitAEohwF7aHh9YcJlZ/7qdNhwkI9r4QAAAAACAAD/5gRGBicAFwAjAAA3FjMyEhEGJwYjIgA1NAAzMgAREAAhIicTFBYzMjY1NCYjIga5gJTQ5QICZvHD/wABGN/nAQT+sP7Xo3FxoI16pqWHfaTSSgFIAS4CAsIBDNbkASX+lf6l/lP+MjUEHJy3p3en274AAAAAABAAxgABAAAAAAABAA8AAAABAAAAAAACAAcADwABAAAAAAADAA8AFgABAAAAAAAEAA8AJQABAAAAAAAFAAsANAABAAAAAAAGAA8APwABAAAAAAAKACsATgABAAAAAAALABMAeQADAAEECQABAB4AjAADAAEECQACAA4AqgADAAEECQADAB4AuAADAAEECQAEAB4A1gADAAEECQAFABYA9AADAAEECQAGAB4BCgADAAEECQAKAFYBKAADAAEECQALACYBfmZhbmdjaGFuLXNlY3JldFJlZ3VsYXJmYW5nY2hhbi1zZWNyZXRmYW5nY2hhbi1zZWNyZXRWZXJzaW9uIDEuMGZhbmdjaGFuLXNlY3JldEdlbmVyYXRlZCBieSBzdmcydHRmIGZyb20gRm9udGVsbG8gcHJvamVjdC5odHRwOi8vZm9udGVsbG8uY29tAGYAYQBuAGcAYwBoAGEAbgAtAHMAZQBjAHIAZQB0AFIAZQBnAHUAbABhAHIAZgBhAG4AZwBjAGgAYQBuAC0AcwBlAGMAcgBlAHQAZgBhAG4AZwBjAGgAYQBuAC0AcwBlAGMAcgBlAHQAVgBlAHIAcwBpAG8AbgAgADEALgAwAGYAYQBuAGcAYwBoAGEAbgAtAHMAZQBjAHIAZQB0AEcAZQBuAGUAcgBhAHQAZQBkACAAYgB5ACAAcwB2AGcAMgB0AHQAZgAgAGYAcgBvAG0AIABGAG8AbgB0AGUAbABsAG8AIABwAHIAbwBqAGUAYwB0AC4AaAB0AHQAcAA6AC8ALwBmAG8AbgB0AGUAbABsAG8ALgBjAG8AbQAAAAIAAAAAAAAAFAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACwECAQMBBAEFAQYBBwEIAQkBCgELAQwAAAAAAAAAAAAAAAAAAAAA'
# base64解碼  base64.decodebytes()可處理str類型
binData = base64.decodebytes(base64Str.encode())
# 寫入otf字體文件
filePath01 = r'F://data_temp/jiemi_20190402_03.otf'
filePath02 = r'F://data_temp/text_20190402_03.xml'
with open(filePath01, 'wb') as f:
        f.write(binData)
        f.close()
# 解析字體庫
font01 = TTFont(filePath01)
# BytesIO() 把二進制數據bin_data當作文件來操作,TTFont接收一個文件類型
# font01 = TTFont(BytesIO(binData))
uniList = font01['cmap'].tables[0].ttFont.getGlyphOrder()
utfList = font01['cmap'].tables[0].ttFont.tables['cmap'].tables[0].cmap  # c = font.getBestCmap()
retList = []
getText = '麣龒鵂驋龒鑶鑶麣龥龤龤'
for i in getText:
    # ord()以字符作爲參數,返回對應的Unicode數值
    if ord(i) in utfList:
        text = int(utfList[ord(i)][-2:]) - 1
    else:
        text = i
    retList.append(text)
crackText = ''.join([str(i) for i in retList])
print(crackText)
# 13823661900



可以看到最後的結果打印出來是和網頁上我們看到的號碼是一樣的,解密成功~ well done !

 

 

 

 

 

 

 

 

 

 

 

 

 

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章