C++ OpenCV: Table Cell Extraction

To extract table cells, I apply a bitwise AND to the MASK images obtained in my previous blog post, which yields the joints image. Processing the joints image gives the coordinates of the table's intersection points, and processing those coordinates splits the table into cells. I store the coordinates in a vector, sort them and use unique together with the erase method to remove duplicate values, and then apply a threshold to pick out the coordinates of the grid lines. With those coordinates, each cell can be cropped and saved.
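To make the sort, deduplicate, and threshold steps concrete, here is a minimal standalone sketch on plain integer coordinates. The function name `gridLineCoords` and the `jump` parameter are my own illustrative choices, not names from the original code; the logic mirrors the rule used in the core code (a coordinate is kept when the gap after it exceeds the gap before it by more than 10).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the original post): reduces the raw
// coordinates of white joint pixels along one axis to one coordinate per
// grid line, using the same sort / unique / threshold steps as the post.
std::vector<int> gridLineCoords(std::vector<int> coords, int jump = 10)
{
    assert(jump >= 0);
    std::vector<int> lines;
    if (coords.empty())
        return lines;

    // Sort, then remove duplicate values with the unique + erase idiom.
    std::sort(coords.begin(), coords.end());
    coords.erase(std::unique(coords.begin(), coords.end()), coords.end());

    // coords[i + 1] closes a cluster of joint pixels when the gap after it
    // is more than `jump` larger than the gap before it.
    for (std::size_t i = 0; i + 2 < coords.size(); ++i)
    {
        const int gapBefore = coords[i + 1] - coords[i];
        const int gapAfter = coords[i + 2] - coords[i + 1];
        if (gapAfter - gapBefore > jump)
            lines.push_back(coords[i + 1]);
    }

    // The last cluster has no following gap, so its last value is always kept.
    lines.push_back(coords.back());
    return lines;
}
```

For example, the x-coordinates {10, 11, 12, 100, 101, 102, 200, 201, 202} reduce to {12, 102, 202}: three vertical grid lines, and therefore two columns.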

The core code is as follows:
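// Excerpt: it is meant to be added to the table-extraction code from the earlier post.
// joints (the binary joints image), gray (the grayscale source image) and the
// counter save_Img are used below and are assumed to be defined there.
vector<int> Variable_Pixel_White_Y_OK;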
vector<int> Variable_Pixel_White_X_OK;
vector<int> Variable_Pixel_White_Y;
vector<int> Variable_Pixel_White_X;
int pixel_white = 0;
int white_pixel_Y_last = 0;
for (int y = 0; y < joints.rows; y++)//rows
{
	for (int x = 0; x < joints.cols; x++)//cols
	{
		pixel_white = joints.at<uchar>(y, x);
		if (pixel_white == 255) // position of a white (joint) pixel
		{
			Variable_Pixel_White_Y.push_back(y);//row
			Variable_Pixel_White_X.push_back(x);//col
		}
	}
}
if (Variable_Pixel_White_X.size() > 2 && Variable_Pixel_White_Y.size() > 2)
{
	//========================================================================================================
	sort(Variable_Pixel_White_X.begin(), Variable_Pixel_White_X.end());
	Variable_Pixel_White_X.erase(unique(Variable_Pixel_White_X.begin(), Variable_Pixel_White_X.end()), Variable_Pixel_White_X.end());
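	// i + 2 < size() avoids unsigned underflow if fewer than 2 values remain after unique()
	for (size_t i = 0; i + 2 < Variable_Pixel_White_X.size(); i++)
	{
		// keep X[i + 1] when the gap after it exceeds the gap before it by more than 10 px
		if ((Variable_Pixel_White_X[i + 2] - Variable_Pixel_White_X[i + 1]) - (Variable_Pixel_White_X[i + 1] - Variable_Pixel_White_X[i]) > 10)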
		{

			Variable_Pixel_White_X_OK.push_back(Variable_Pixel_White_X[i + 1]);
		}
	}
	Variable_Pixel_White_X_OK.push_back(Variable_Pixel_White_X[Variable_Pixel_White_X.size() - 1]);
	//========================================================================================================
	//========================================================================================================
	sort(Variable_Pixel_White_Y.begin(), Variable_Pixel_White_Y.end());
	Variable_Pixel_White_Y.erase(unique(Variable_Pixel_White_Y.begin(), Variable_Pixel_White_Y.end()), Variable_Pixel_White_Y.end());
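	// i + 2 < size() avoids unsigned underflow if fewer than 2 values remain after unique()
	for (size_t i = 0; i + 2 < Variable_Pixel_White_Y.size(); i++)
	{
		// keep Y[i + 1] when the gap after it exceeds the gap before it by more than 10 px
		if ((Variable_Pixel_White_Y[i + 2] - Variable_Pixel_White_Y[i + 1]) - (Variable_Pixel_White_Y[i + 1] - Variable_Pixel_White_Y[i]) > 10)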
		{

			Variable_Pixel_White_Y_OK.push_back(Variable_Pixel_White_Y[i + 1]);
		}
	}
	Variable_Pixel_White_Y_OK.push_back(Variable_Pixel_White_Y[Variable_Pixel_White_Y.size() - 1]);
	//========================================================================================================
	//========================================================================================================
	cout << "cols:" << Variable_Pixel_White_X_OK.size() - 1 << endl;
	cout << "rows:" << Variable_Pixel_White_Y_OK.size() - 1 << endl;
	//========================================================================================================
	//========================================================================================================
	//--------------------------------------------->Segmentation<---------------------------------------------
	int rect_x = 0, rect_y = 0;
	int d_y = 0, d_num = 0, Abs = 0;
	int d = 0, h = 0;
	for (size_t i = 0; i + 1 < Variable_Pixel_White_Y_OK.size(); i++) // each row of cells
	{
		for (size_t j = 0; j + 1 < Variable_Pixel_White_X_OK.size(); j++) // each column of cells
		{
			// d, h: width and height of the cell; rect_x, rect_y: its top-left corner
			d = Variable_Pixel_White_X_OK[j + 1] - Variable_Pixel_White_X_OK[j];
			h = Variable_Pixel_White_Y_OK[i + 1] - Variable_Pixel_White_Y_OK[i];
			rect_x = Variable_Pixel_White_X_OK[j];
			rect_y = Variable_Pixel_White_Y_OK[i];
			//(0 <= roi.x && 0 <= roi.width &&
			//roi.x + roi.width <= m.cols &&
			//0 <= roi.y && 0 <= roi.height &&
			//roi.y + roi.height <= m.rows)
			if (rect_x + 5 >= 0 &&
				rect_y + 5 >= 0 &&
				d - 10 >= 0 &&
				h - 10 >= 0 &&
				rect_x + 5 + d - 10 <= gray.cols &&
				rect_y + 5 + h - 10 <= gray.rows)
			{
				Rect rect(rect_x + 5, rect_y + 5, d - 10, h - 10);
				Mat ROI = gray(rect);
				string Img_Name = "./title_time/roi/" + to_string(save_Img) + ".jpg";
				save_Img++;
				imwrite(Img_Name, ROI);
			}
		}
	}
}
else
{
	// Too few joint pixels were found: no table grid to split.
}
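The segmentation loop above can also be isolated from OpenCV and checked on its own. The sketch below is an illustration under assumptions: `CellRect` is a hypothetical plain struct standing in for `cv::Rect`, and `cellRects` and `margin` are names I introduced. It shrinks every cell by 5 pixels on each side and applies the same bounds test as the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cv::Rect so the sketch builds without OpenCV.
struct CellRect { int x, y, width, height; };

// Hypothetical helper: builds one rectangle per cell from the grid-line
// coordinates (xs: vertical lines, ys: horizontal lines), inset by `margin`
// pixels on every side so the table borders are left out of the crop.
std::vector<CellRect> cellRects(const std::vector<int>& xs,
                                const std::vector<int>& ys,
                                int imgCols, int imgRows, int margin = 5)
{
    assert(margin >= 0);
    std::vector<CellRect> cells;
    for (std::size_t i = 0; i + 1 < ys.size(); ++i)       // rows of cells
    {
        for (std::size_t j = 0; j + 1 < xs.size(); ++j)   // columns of cells
        {
            const int x = xs[j] + margin;
            const int y = ys[i] + margin;
            const int w = xs[j + 1] - xs[j] - 2 * margin;
            const int h = ys[i + 1] - ys[i] - 2 * margin;
            // Same validity test as the ROI check in the core code:
            // the rectangle must lie entirely inside the image.
            if (x >= 0 && y >= 0 && w >= 0 && h >= 0 &&
                x + w <= imgCols && y + h <= imgRows)
            {
                cells.push_back({x, y, w, h});
            }
        }
    }
    return cells;
}
```

With vertical lines at x = {12, 102, 202} and horizontal lines at y = {8, 58} in a 300 x 100 image, this yields two cells, the first at (17, 13) with size 80 x 40. In the real code each rectangle would then be passed to `gray(rect)` and written out with `imwrite`.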

However, this method only works for regular tables. It cannot handle tables that contain merged cells, and that part needs further study. I am currently working on a new method for extracting and segmenting tables with merged cells, and that follow-up work will build on this code. To use this code, refer to the table extraction post on my blog from a few days ago and add this part to it. I won't list the full code here.

I hope this helps. If you have any questions, please comment on this post or send me a private message, and I will reply when I have time.
