Captcha Recognition: Segmenting Connected Characters

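As the previous posts showed, when the characters in a captcha do not touch each other, it is easy to cut them apart, label them, and recognize them with any classic machine learning algorithm (KNN, SVM, and so on). In practice, though, many sites' captchas are not that simple, and cutting apart characters that are joined together becomes the hard part. Ask around and most people will tell you to use a CNN (convolutional neural network), which avoids segmentation altogether. But is it really impossible to do this with classic machine learning algorithms?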

Let's start with a captcha that is fairly simple but still cannot be segmented with the projection method:

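In this captcha the X and the N are joined, so there is no simple cut. The characters are also slanted to some degree, so a vertical projection may not show any clear boundary between them. For a pair like N and h, other methods do work, for example splitting by connected components.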
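To make the limitation concrete, here is a minimal sketch of the projection method on two toy images. The grids and the helper names `column_projection` and `projection_gaps` are made up for illustration (1 is a stroke pixel, 0 is background), and plain lists are used so the snippet runs without any dependencies.

```python
def column_projection(grid):
    """Count the stroke pixels (value 1) in every column of a binary image."""
    return [sum(col) for col in zip(*grid)]


def projection_gaps(grid):
    """Return the columns with no stroke pixels, i.e. the candidate cut lines."""
    return [i for i, count in enumerate(column_projection(grid)) if count == 0]


# Two separated blobs: column 2 is empty, so projection finds a cut there.
separated = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]

# Two slanted strokes that overlap in column 2: no empty column is left.
touching = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]

print(projection_gaps(separated))  # [2]
print(projection_gaps(touching))   # []
```

In the second image a straight vertical cut does not exist at all, which is the situation the rest of this post deals with.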

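Before solving this, let's think about another question: why do we segment the characters at all, and is it really necessary? Not necessarily. Even with algorithms like KNN and SVM you can recognize a captcha without cutting it up. But if you treat the captcha as a whole, a class is no longer a single character but a whole string of characters, and the labelling workload becomes enormous. Instead of 26 letters + 10 digits = 36 classes, you get one class for every possible 4-character string over those 36 characters. That is not a small change: by the time you finished labelling by hand, you would be a grandparent. And that is for a captcha of only 4 characters.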
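The size of that jump is easy to check. The snippet below is only a back-of-the-envelope calculation; it assumes that characters may repeat and that order matters, which is how captchas normally behave.

```python
import string

# 26 letters + 10 digits, as in the text above
alphabet = string.ascii_uppercase + string.digits

per_char_classes = len(alphabet)           # one class per single character
whole_image_classes = len(alphabet) ** 4   # one class per 4-character string

print(per_char_classes)     # 36
print(whole_image_classes)  # 1679616
```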

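So how do we cut connected characters? With the drop-fall algorithm. The idea is simple: pick a starting position for a water drop, say a pixel somewhere above the junction of the X and the N, and let it fall according to a set of rules. When the drop reaches the bottom of the image, the path it travelled is the cut boundary (a curved cut). To make this easier to picture, look at the following figure: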

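The drop can move in five directions: left, right, lower-left, down, and lower-right. Which one it takes depends on whether the pixels at those five positions are black or white (note that the drop-fall algorithm only works on binarized images). The traditional drop-fall algorithm uses six rules to decide the move (here I say "background" for white pixels and "stroke" for black pixels):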

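All five points are background, or all five are stroke -> move down
Lower-left is background, and at least one of the other points is stroke -> move lower-left
Lower-left is stroke, directly below is background -> move down
Lower-left and directly below are stroke, lower-right is background -> move lower-right
All three points below are stroke, and right is background -> move right
Everything except left is stroke, and left is background -> move left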

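You don't need to memorize these rules; the logic only has to be spelled out when writing the program. The six rules boil down to something simple: go wherever there is a way through, and if there are several ways, follow the priority order (lower-left > down > lower-right > right > left). There are two special cases: if all five points are background the drop simply goes down, and if there is no way through at all it also goes down, cutting its own path through the stroke.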
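The priority order can be written as a small decision function. This is only a compact restatement of the six rules, not a separate algorithm; the name `next_move` and the `(row_delta, col_delta)` return value are my own choices for illustration.

```python
def next_move(n1, n2, n3, n4, n5):
    """Pick the drop's next step from its five neighbours.

    n1 = left, n2 = lower-left, n3 = below, n4 = lower-right, n5 = right.
    0 is background, 1 is stroke. Returns (row_delta, col_delta).
    """
    if n1 == n2 == n3 == n4 == n5:
        return (1, 0)    # rule 1: all background or all stroke -> down
    if n2 == 0:
        return (1, -1)   # rule 2: lower-left is free
    if n3 == 0:
        return (1, 0)    # rule 3: below is free
    if n4 == 0:
        return (1, 1)    # rule 4: lower-right is free
    if n5 == 0:
        return (0, 1)    # rule 5: only sideways moves left, right is free
    return (0, -1)       # rule 6: left is the only free neighbour


print(next_move(0, 0, 0, 0, 0))  # (1, 0)
print(next_move(1, 0, 0, 0, 1))  # (1, -1)
print(next_move(0, 1, 1, 1, 1))  # (0, -1)
```

Because the checks run in priority order, each later branch can assume that every earlier neighbour is a stroke pixel, which is why the conditions here are shorter than the rule text.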

Let's implement it in Python:

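import numpy as np


def dropfall(img, start):
    '''
    The drop starts falling from (0, start).
    Assumes 1 <= start <= width - 2 and that the path stays off the
    first and last columns.
    '''
    a = np.array(img)
    a = (a < 200) * 1  # binarize: 1 = stroke (dark), 0 = background
    height, _ = a.shape
    x, y = 0, start  # x is the row, y is the column
    way = []  # path travelled by the drop
    while x+1 < height:
        n1, _, n5 = a[x, y-1:y+2]  # left (n1) and right (n5)
        n2, n3, n4 = a[x+1, y-1:y+2]  # lower-left (n2), below (n3), lower-right (n4)
        # the if/elif conditions are the six rules above, in the same order
        if n1 == n2 == n3 == n4 == n5:
            x += 1
        elif n2 == 0 and any((n1, n3, n4, n5)):
            x += 1
            y -= 1
        elif n2 == 1 and n3 == 0:
            x += 1
        elif all((n2, n3)) and n4 == 0:
            x += 1
            y += 1
        elif all((n2, n3, n4)) and n5 == 0:
            y += 1
            # keep this step and the next one from looping forever
            if (x, y) in way:
                x += 1
        elif all((n2, n3, n4, n5)) and n1 == 0:
            y -= 1
        way.append((x, y))
    return way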
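One thing to watch with the listing above: if the drop reaches the first or last column, the three-pixel slice comes up short and the unpacking raises an error. The following is a hedged sketch of one way around that, not part of the traditional algorithm: it treats everything outside the image as stroke so the drop cannot leave, and it forces a downward step whenever a move would revisit a position. The name `dropfall_safe` and the toy grid are made up, and plain lists replace the NumPy array so the snippet is self-contained.

```python
def dropfall_safe(grid, start):
    """Trace a drop-fall path through a binary grid (1 = stroke, 0 = background).

    Columns outside the image count as stroke, so the drop stays inside.
    Returns the visited (row, col) positions, excluding the start point.
    """
    height, width = len(grid), len(grid[0])

    def pixel(r, c):
        return grid[r][c] if 0 <= c < width else 1

    r, c = 0, start
    path, seen = [], {(r, c)}
    while r + 1 < height:
        n1, n5 = pixel(r, c - 1), pixel(r, c + 1)
        n2, n3, n4 = pixel(r + 1, c - 1), pixel(r + 1, c), pixel(r + 1, c + 1)
        if n1 == n2 == n3 == n4 == n5:
            step = (1, 0)
        elif n2 == 0:
            step = (1, -1)
        elif n3 == 0:
            step = (1, 0)
        elif n4 == 0:
            step = (1, 1)
        elif n5 == 0:
            step = (0, 1)
        else:
            step = (0, -1)
        nxt = (r + step[0], c + step[1])
        if nxt in seen:
            # the drop is bouncing sideways: cut straight down instead
            nxt = (r + 1, c)
        r, c = nxt
        seen.add(nxt)
        path.append(nxt)
    return path


grid = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
]
print(dropfall_safe(grid, 2))  # [(1, 2), (2, 2), (3, 2)]
print(dropfall_safe(grid, 0))  # [(0, 1), (1, 2), (2, 2), (3, 2)]
```

Released from column 0, the drop slides right along the top row until it finds the gap, then cuts through the one-pixel bridge in row 2.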

Now that we have the algorithm, let's cut the captcha. To make the cut easier to see, we use matplotlib to display the captcha and the cut paths:

import numpy as np
import os
from PIL import Image
import matplotlib.pyplot as mp


def dropfall(img, start):
    a = np.array(img)
    a = (a < 200) * 1  # binarize: 1 = stroke, 0 = background
    height, _ = a.shape
    x, y = 0, start  # x is the row, y is the column
    way = []
    while x+1 < height:
        n1, _, n5 = a[x, y-1:y+2]
        n2, n3, n4 = a[x+1, y-1:y+2]
        if n1 == n2 == n3 == n4 == n5:
            x += 1
        elif n2 == 0 and any((n1, n3, n4, n5)):
            x += 1
            y -= 1
        elif n2 == 1 and n3 == 0:
            x += 1
        elif all((n2, n3)) and n4 == 0:
            x += 1
            y += 1
        elif all((n2, n3, n4)) and n5 == 0:
            y += 1
            if (x, y) in way:
                x += 1
        elif all((n2, n3, n4, n5)) and n1 == 0:
            y -= 1
        way.append((x, y))
    return way

os.chdir('G:\\knn\\')
img = Image.open('3.png').convert('L')
a = np.array(img)
a = (a > 200) * 255  # binarize: 0 = stroke, 255 = background

# (row, column) coordinates of every stroke pixel
x = np.argwhere(a == 0)

# draw the captcha with the origin at the top left, as in image coordinates
mp.scatter(x[:, 1], x[:, 0], s=10)
ax = mp.gca()
ax.xaxis.set_ticks_position('top')
ax.invert_yaxis()

# draw one cut path for each hand-picked start column
for start in (54, 71, 89):
    way = dropfall(img, start)
    way_x = [i[0] for i in way]
    way_y = [i[1] for i in way]
    mp.scatter(way_y, way_x, marker='*')
mp.show()

Segmentation result:

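As you can see, the result is not ideal: part of the N is assigned to the X, and part of the Y is cut off as well. This is not the algorithm's fault, though. The N has a gap at its upper left, and the Y loses a piece because the start point we chose for that cut is off. Even so, each character keeps its distinguishing features after the cuts shown in the figure, so feeding the pieces directly into captcha recognition should not work too badly.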
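The scripts in this post only draw the cut paths. To actually feed the pieces to a classifier, the path has to be turned into separate images. Here is a minimal sketch of that step, assuming a path in the same `(row, column)` format that `dropfall` returns (start point not included). The helper name `split_by_path` and the toy grid are hypothetical.

```python
def split_by_path(grid, start, path):
    """Split a binary grid into a left and a right part along a cut path.

    `start` is the column the drop was released from in row 0 and `path`
    is a list of (row, col) positions covering every later row.
    Pixels left of the path go to the left part, the rest to the right part.
    """
    # One cut column per row; if the drop moved sideways in a row,
    # the last column it visited in that row wins.
    cut = {0: start}
    for r, c in path:
        cut[r] = c
    left = [[v if col < cut[r] else 0 for col, v in enumerate(row)]
            for r, row in enumerate(grid)]
    right = [[v if col >= cut[r] else 0 for col, v in enumerate(row)]
             for r, row in enumerate(grid)]
    return left, right


grid = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
]
left, right = split_by_path(grid, 2, [(1, 2), (2, 2), (3, 2)])
for row in left:
    print(row)
```

Sending the pixel on the path itself to the right-hand part is an arbitrary choice; assigning it to the left part or dropping it would work just as well.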

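The three start points in the code above were picked by me by looking at this particular captcha. So how can the program find the cut boundaries automatically? As the result above shows, the quality of the start point directly determines the quality of the cut. The traditional drop-fall algorithm finds the start point like this: scanning from left to right, find a white pixel that has a black pixel on its left and a black pixel on its right. But this is not accurate. For characters like X and Y, the point found this way lies in the middle of the character, so the algorithm would split the X or the Y in half.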
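The traditional search can be sketched as follows. The description above leaves some details open, so this version makes assumptions: rows are scanned top to bottom, "black on its left" means the directly adjacent pixel, and "black on its right" means any stroke pixel further right in the same row. The name `find_start` and the toy Y shape are made up for illustration.

```python
def find_start(grid):
    """Return (row, col) of the first background pixel with a stroke pixel
    directly to its left and at least one stroke pixel to its right,
    or None if there is no such pixel. 1 = stroke, 0 = background."""
    for r, row in enumerate(grid):
        for c in range(1, len(row) - 1):
            if row[c] == 0 and row[c - 1] == 1 and 1 in row[c + 1:]:
                return r, c
    return None


# A rough "Y": the two arms meet in the middle column.
y_shape = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
print(find_start(y_shape))  # (0, 1)
```

The point it returns sits between the two arms of the Y, so a drop released there falls into the letter instead of beside it, which is the failure described above.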

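Actually, the first thing I thought of for splitting characters was not the drop-fall algorithm but clustering. The results from clustering are poor, though. Let's look at an example: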

from sklearn.cluster import AgglomerativeClustering
from sklearn.cluster import KMeans
import numpy as np
import os
from PIL import Image
import matplotlib.pyplot as mp


os.chdir('G:\\knn\\')
img = Image.open('3.png').convert('L')
a = np.array(img)
a = (a > 200) * 255  # binarize: 0 = stroke, 255 = background

# (row, column) coordinates of every stroke pixel
x = np.argwhere(a == 0)
model = KMeans(n_clusters=4)
# model = AgglomerativeClustering(n_clusters=4)  # alternative clustering model
model.fit(x)

# colour each stroke pixel by its cluster
mp.scatter(x[:, 1], x[:, 0], c=model.labels_, s=10, cmap='brg')
ax = mp.gca()
ax.xaxis.set_ticks_position('top')
ax.invert_yaxis()

mp.show()

Running the code gives the following result:

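Is this result bad? Not at all, but only on this particular image. Because every character here keeps some distance from its neighbours, clustering does well. I tried a number of captchas and only a few came out as well as the one shown. Also, among the clustering algorithms, AgglomerativeClustering and KMeans performed best, but they behave differently on different captchas: sometimes one is better, sometimes the other, and of course there are captchas where both do badly.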

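So what happens if we use clustering to find the start points for the drop-fall algorithm? The result is still not ideal, but it is better than clustering alone. Here are the code and the result: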

from sklearn.cluster import KMeans
import numpy as np
import os
from PIL import Image
import matplotlib.pyplot as mp


def dropfall(img, start):
    a = np.array(img)
    a = (a < 200) * 1  # binarize: 1 = stroke, 0 = background
    height, _ = a.shape
    x, y = 0, start  # x is the row, y is the column
    way = []
    while x+1 < height:
        n1, _, n5 = a[x, y-1:y+2]
        n2, n3, n4 = a[x+1, y-1:y+2]
        if n1 == n2 == n3 == n4 == n5:
            x += 1
        elif n2 == 0 and any((n1, n3, n4, n5)):
            x += 1
            y -= 1
        elif n2 == 1 and n3 == 0:
            x += 1
        elif all((n2, n3)) and n4 == 0:
            x += 1
            y += 1
        elif all((n2, n3, n4)) and n5 == 0:
            y += 1
        elif all((n2, n3, n4, n5)) and n1 == 0:
            y -= 1
            # loop guard, placed on the left move in this version
            if (x, y) in way:
                x += 1
        way.append((x, y))
    return way

os.chdir('G:\\knn\\')
img = Image.open('3.png').convert('L')
a = np.array(img)
a = (a > 200) * 255  # binarize: 0 = stroke, 255 = background

# (row, column) coordinates of every stroke pixel
x = np.argwhere(a == 0)
model = KMeans(n_clusters=4)
model.fit(x)
# compute the drop start columns: drop the leftmost left edge and the
# rightmost right edge, then average the remaining edges pairwise
x_min = sorted(x[:, 1][model.labels_ == k].min() for k in range(4))[1:]
x_max = sorted(x[:, 1][model.labels_ == k].max() for k in range(4))[:-1]
starts = [(i+j)//2 for i, j in zip(x_min, x_max)]
# draw the captcha
mp.scatter(x[:, 1], x[:, 0], c=model.labels_, s=10, cmap='brg')
ax = mp.gca()
ax.xaxis.set_ticks_position('top')
ax.invert_yaxis()
# draw the cut paths
for start in starts:
    way = dropfall(img, start)
    way_x = [i[0] for i in way]
    way_y = [i[1] for i in way]
    mp.scatter(way_y, way_x, marker='*')

mp.show()

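In the code, to reduce the error, I take each start column as the average of a character's right boundary and the left boundary of its neighbouring character.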
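Here is that averaging step on its own, with made-up numbers so the arithmetic can be followed by hand. The function name `start_columns` and the sample data are hypothetical. The labels are deliberately not in left-to-right order, because a clustering model numbers its clusters arbitrarily, which is why the edges are sorted first.

```python
def start_columns(columns, labels, n_clusters):
    """Midpoints between each cluster's right edge and the next cluster's left edge.

    `columns` holds the column of every stroke pixel and `labels` the
    cluster each pixel was assigned to.
    """
    lefts = sorted(min(c for c, l in zip(columns, labels) if l == k)
                   for k in range(n_clusters))
    rights = sorted(max(c for c, l in zip(columns, labels) if l == k)
                    for k in range(n_clusters))
    # pair the 1st right edge with the 2nd left edge, and so on
    return [(right + left) // 2 for right, left in zip(rights[:-1], lefts[1:])]


columns = [2, 3, 4, 10, 11, 12, 20, 21, 30, 33]
labels = [1, 1, 1, 0, 0, 0, 3, 3, 2, 2]
print(start_columns(columns, labels, 4))  # [7, 16, 25]
```

The four clusters span columns 2-4, 10-12, 20-21 and 30-33, so the three start columns are (4+10)//2, (12+20)//2 and (21+30)//2.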

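Even so, the result is still not ideal. The reason is that the characters are hollow. With solid characters the drop-fall cut would work better than this, although for solid characters the boundaries found by clustering would be comparatively worse.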

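This is as far as I have got for now. If I come up with improvements or new ideas later, I'll share them then. And if you have a bold idea of your own, do speak up; it might just work well.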
