OCR: Datasets for Text Detection and Text Recognition

1.Text Recognition Datasets

1.1.Synthetic Chinese String Dataset

This is a Chinese text recognition dataset with more than 3.6 million training images covering 5,824 characters. The scenes are simple: black text on a white background.
Download: https://pan.baidu.com/s/1dFda6R3

Annotation format: each sample is an image paired with its text label.

2.Text Detection Datasets

ICPR MWI 2018 Challenge

The competition provides 20,000 images, 50% for training and 50% for testing. The images consist mainly of synthetic images, product descriptions and web advertisements. The dataset is large, mixes Chinese and English, covers dozens of fonts of varying sizes and multiple layouts, and has complex backgrounds. The download is about 2 GB.

https://tianchi.aliyun.com/competition/information.htm?raceId=231651&_is_login_redirect=true&accounttraceid=595a06c3-7530-4b8a-ad3d-40165e22dbfe

Link: https://pan.baidu.com/s/1zxXokAYsyVbfWP2dUPGrPw
Extraction code: z1bj

2.1.Pascal VOC2007

$ cd $FRCN/data
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar

$ tar xvf VOCdevkit_08-Jun-2007.tar
$ tar xvf VOCtrainval_06-Nov-2007.tar
$ tar xvf VOCtest_06-Nov-2007.tar

$ ln -s VOCdevkit VOCdevkit2007 #create a softlink

Link: https://pan.baidu.com/s/1n3HSbDVZ-75SXC1PNC7bHA
Extraction code: 8k9a

2.2.MSRA Text Detection 500 Database (MSRA-TD500)

The MSRA Text Detection 500 Database (MSRA-TD500) contains 500 natural images taken with a pocket camera in indoor (office and mall) and outdoor (street) scenes. The indoor images are mainly signs, doorplates and caution plates, while the outdoor images are mostly guide boards and billboards against complex backgrounds. Image resolutions range from 1296x864 to 1920x1280. The dataset is very challenging because of the diversity of the text and the complexity of the backgrounds: text may appear in different languages (Chinese, English or a mixture of both), fonts, sizes, colours and orientations.

http://www.iapr-tc11.org/mediawiki/index.php/MSRA_Text_Detection_500_Database_%28MSRA-TD500%29
http://www.iapr-tc11.org/dataset/MSRA-TD500/MSRA-TD500.zip

0 0 749 860 47 105 -0.048040
1 1 728 919 16 44 -0.023252
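
Each line of the MSRA-TD500 ground truth (as in the two example lines above) appears to hold an index, a difficulty flag, the x, y, width and height of a box, and a rotation angle in radians around the box centre. A minimal Python sketch, under that assumption, for parsing one line and recovering the four rotated corner points:

import math

def parse_msra_td500_line(line):
    # Assumed fields: index, difficulty flag, x, y, width, height, rotation angle (radians)
    idx, difficult, x, y, w, h, angle = line.split()
    x, y, w, h, angle = float(x), float(y), float(w), float(h), float(angle)
    cx, cy = x + w / 2.0, y + h / 2.0                    # the box rotates around its centre
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    corners = [(cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
               for dx, dy in [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]]
    return int(idx), int(difficult), corners

print(parse_msra_td500_line("0 0 749 860 47 105 -0.048040"))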

2.3.COCO-TEXT

An English dataset with 63,686 images and 173,589 text instances, including both handwritten and machine-printed text, legible and illegible. The file size is 12.58 GB. Training set: 43,686 images; test set: 10,000 images; validation set: 10,000 images.
Download: https://vision.cornell.edu/se3/coco-text-2/

COCO-Text API
The COCO-Text API assists in loading and parsing the annotations in COCO-Text. For details, see coco_text.py and also the coco_text_Demo ipython notebook.
getAnnIds Get ann ids that satisfy given filter conditions
getImgIds Get img ids that satisfy given filter conditions
loadAnns Load anns with the specified ids.
loadImgs Load imgs with the specified ids.
loadRes Load algorithm results and create API for accessing them.
The annotations are stored using the JSON file format. The annotations format has the following data structure:
{
  "info": info,
  "imgs": [image],
  "anns": [annotation]
}

info{
  "version": str,
  "description": str,
  "author": str,
  "url": str,
  "date_created": datetime
}

image{
  "id": int,
  "file_name": str,
  "width": int,
  "height": int,
  "set": str                  # 'train' or 'val'
}

Each text instance annotation contains a series of fields, including an enclosing bounding box, category annotations, and transcription.

annotation{
  "id": int,
  "image_id": int,
  "class": str                # 'machine printed', 'handwritten' or 'others'
  "legibility": str           # 'legible' or 'illegible'
  "language": str             # 'english', 'not english' or 'na'
  "area": float,
  "bbox": [x, y, width, height],
  "utf8_string": str,
  "polygon": []
}
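
The fields above can be read directly with the standard json module. A minimal sketch (not the official COCO-Text API) that loads an annotation file and keeps only legible, machine-printed English instances; the file name cocotext.json is an assumption, and in the released file "imgs" and "anns" may be dictionaries keyed by id rather than lists, which the sketch also handles:

import json

with open("cocotext.json", encoding="utf-8") as f:   # hypothetical file name
    ct = json.load(f)

print(ct["info"]["version"], ct["info"]["description"])

anns = ct["anns"]
anns = list(anns.values()) if isinstance(anns, dict) else anns   # list or id-keyed dict

# Keep only legible, machine-printed English text instances
selected = [a for a in anns
            if a.get("legibility") == "legible"
            and a.get("class") == "machine printed"
            and a.get("language") == "english"]

for ann in selected[:5]:
    x, y, w, h = ann["bbox"]
    print(ann["image_id"], ann.get("utf8_string", ""), (x, y, w, h))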

2.4.Google FSNS (Google Street View Text Dataset)

This dataset consists of more than one million images of street-name signs cropped from Google Street View in France; each sample contains several views of the same sign taken from different angles. Image size is 600x150. Training set: 1,044,868 images; validation set: 16,150; test set: 20,404.
Download: http://rrc.cvc.uab.es/?ch=6&com=downloads

2.5.Reading Chinese Text in the Wild(RCTW-17)

The dataset contains 12,263 images: 8,034 for training and 4,229 for testing, about 11.4 GB in total. Most images were taken with mobile-phone cameras, plus a small number of screenshots. The images contain Chinese text and a small amount of English text, at varying resolutions. The training archive is icdar2017rctw_train_v1.2.
Download: http://rctw.vlrlab.net/dataset/
Annotation format: the image plus coordinates and text, with one box per text string. Each ground-truth line gives the four corner coordinates of the box, a difficulty flag and the quoted transcription:
x1,y1,x2,y2,x3,y3,x4,y4,difficult,"text"
Example:
390,902,1856,902,1856,1225,390,1225,0,"金氏眼鏡"
1875,1170,2149,1170,2149,1245,1875,1245,0,"創於1989"
2054,1277,2190,1277,2190,1323,2054,1323,0,"城建店"
768,1648,987,1648,987,1714,768,1714,0,"金氏眼"
897,2152,988,2152,988,2182,897,2182,0,"金氏眼鏡"
1457,2228,1575,2228,1575,2259,1457,2259,0,"金氏眼鏡"
1858,2218,1966,2218,1966,2250,1858,2250,0,"金氏眼鏡"
231,1853,308,1843,309,1885,230,1899,1,"謝#惠顧"
125,2270,180,2270,180,2288,125,2288,1,"###"
106,2297,160,2297,160,2316,106,2316,1,"###"
22,2363,82,2363,82,2383,22,2383,1,"###"
524,2511,837,2511,837,2554,524,2554,1,"###"
455,2456,921,2437,920,2478,455,2501,0,"歡迎光臨"

396,2287,1079,2287,1079,2717,396,2717,0,"富士"
1159,2394,1361,2394,1361,2788,1159,2788,0,"壽司"
1434,2496,1682,2496,1682,2815,1434,2815,0,"屋"
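
A minimal Python sketch for parsing one RCTW-17 ground-truth line in the format above (eight polygon coordinates, a difficulty flag, then the quoted transcription); judging from the examples, lines flagged 1 and transcribed as "###" are normally treated as ignore regions:

def parse_rctw_line(line):
    # x1,y1,x2,y2,x3,y3,x4,y4,difficult,"text" -- split on the first 9 commas
    # so that commas inside the transcription are preserved
    parts = line.strip().split(",", 9)
    coords = list(map(int, parts[:8]))
    polygon = list(zip(coords[0::2], coords[1::2]))      # [(x1, y1), ..., (x4, y4)]
    difficult = int(parts[8])
    text = parts[9].strip().strip('"')                   # drop the surrounding quotes
    return polygon, difficult, text

print(parse_rctw_line('390,902,1856,902,1856,1225,390,1225,0,"金氏眼鏡"'))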

2.6.Chinese Text in the Wild(CTW)

This dataset contains 32,285 images with 1,018,402 Chinese character instances collected from Tencent Street View. It covers planar text, raised text, text in urban and rural scenes, low-illumination text, distant text and partially occluded text. Images are 2048x2048 and the dataset is about 31 GB. It is split roughly 8:1:1 into a training set (25,887 images, 812,872 characters), a test set (3,269 images, 103,519 characters) and a validation set (3,129 images, 103,519 characters).
Download: https://ctwdataset.github.io/
https://share.weiyun.com/50hF1Cc
https://ctwdataset.github.io/tutorial/1-basics.html

Annotation format: the image plus the coordinates of the text boxes, with one box per character.
Each record roughly contains {image_id, file_name, width, height, annotations: [{polygon, adjusted_bbox, text, is_chinese, attributes: ["distorted", "raised", ...]}], ignore: [...]}; the full schema is given below.
Training set annotation format
All .jsonl annotation files (e.g. …/data/annotations/train.jsonl) are UTF-8 encoded JSON Lines; each line corresponds to the annotation of one image.

The data struct for each of the annotations in training set (and validation set) is described below.

annotation (corresponding to one line in .jsonl):
{
image_id: str,
file_name: str,
width: int,
height: int,
annotations: [sentence_0, sentence_1, sentence_2, …], # MUST NOT be empty
ignore: [ignore_0, ignore_1, ignore_2, …], # MAY be an empty list
}

sentence:
[instance_0, instance_1, instance_2, …] # MUST NOT be empty

instance:
{
polygon: [[x0, y0], [x1, y1], [x2, y2], [x3, y3]], # x, y are floating-point numbers
text: str, # the length of the text MUST be exactly 1
is_chinese: bool,
attributes: [attr_0, attr_1, attr_2, …], # MAY be an empty list
adjusted_bbox: [xmin, ymin, w, h], # x, y, w, h are floating-point numbers
}

attr:
“occluded” | “bgcomplex” | “distorted” | “raised” | “wordart” | “handwritten”

ignore:
{
polygon: [[x0, y0], [x1, y1], [x2, y2], [x3, y3]],
bbox: [xmin, ymin, w, h],
}
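
A minimal Python sketch that walks the structure above for one annotation file; the path data/annotations/train.jsonl is an assumption:

import json

with open("data/annotations/train.jsonl", encoding="utf-8") as f:
    for line in f:
        anno = json.loads(line)                       # one image per line
        print(anno["file_name"], anno["width"], anno["height"])
        for sentence in anno["annotations"]:          # a sentence is a list of character instances
            text = "".join(inst["text"] for inst in sentence)
            boxes = [inst["adjusted_bbox"] for inst in sentence]   # [xmin, ymin, w, h] per character
            print(text, boxes[0])
        for ig in anno.get("ignore", []):             # ignore regions carry polygon/bbox only
            print("ignore:", ig["bbox"])
        break                                         # remove to process the whole file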

2.7.Automatic Synthesis of Chinese Text Data

GitHub: https://github.com/JarveeLee/SynthText_Chinese_version

2.8.List of OCR Datasets

GitHub: https://github.com/xylcbd/ocr-open-dataset

2.9.SynthText in the Wild dataset

This dataset consists of 8 million synthetic images covering 90k English words, and includes the training, validation and test splits used in the authors' work. It was released by the Visual Geometry Group at the University of Oxford.

Download: http://www.robots.ox.ac.uk/~vgg/data/scenetext/

3.Curved and Distorted Text

3.1.Total-Text

The dataset has 1,555 images with 11,459 text lines, including horizontal, multi-oriented and curved text. File size: 441 MB. Most of the text is English, with a small amount of Chinese. Training set: 1,255 images; test set: 300 images.
下載地址:http://www.cs-chan.com/source/ICDAR2017/totaltext.zip
https://github.com/cs-chan/Total-Text-Dataset

4.ICDAR

https://rrc.cvc.uab.es/?ch=8&com=downloads

4.1.DocVQA-2020
4.1.1.overview
4.1.2.tasks
4.1.3.downloads
4.1.4.results
4.1.5.my methods
4.1.6.organizers

4.2.ST-VQA-2019
4.2.1.overview
4.2.2.tasks
4.2.3.downloads
4.2.4.results
4.2.5.my methods
4.2.6.organizers

4.3.MLT-2019
4.3.1.overview
4.3.2.tasks
4.3.3.downloads
4.3.4.results
4.3.5.my methods
4.3.6.organizers

4.4.LSVT-2019
4.5.ArT-2019
4.6.SROIE-2019
4.7.ReCTS-2019
4.8.COCO-Text-2017
4.9.DeTEXT-2017
4.10.DOST-2017
4.11.FSNS-2017
4.12.MLT-2017
4.13.IEHHR-2017
4.14.Incidental Scene Text-2015
4.15.Text in Videos-2013-2015
4.16.Focused Scene Text-2013-2015
4.17.Born-Digital Images(Web and Email)-2011-2015

4.17.1.overview
Overview - Born-Digital Images (Web and Email)
Images are frequently used in electronic documents (Web and email) to embed textual information. The use of images as text carriers stems from a number of needs. For example images are used in order to beautify (e.g. titles, headings etc), to attract attention (e.g. advertisements), to hide information (e.g. images in spam emails used to avoid text-based filtering), even to tell a human apart from a computer (CAPTCHA tests).
Automatically extracting text from born-digital images is therefore an interesting prospect as it would provide the enabling technology for a number of applications such as improved indexing and retrieval of Web content, enhanced content accessibility, content filtering (e.g. advertisements or spam emails) etc.
While born-digital text images are on the surface very similar to real scene text images (both feature text in complex colour settings) at the same time they are distinctly different. Born-digital images are inherently low-resolution (made to be transmitted online and displayed on a screen) and text is digitally created on the image; scene text images on the other hand are high-resolution camera captured ones. While born-digital images might suffer from compression artefacts and severe anti-aliasing they do not share the illumination and geometrical problems of real-scene images. Therefore it is not necessarily true that methods developed for one domain would work in the other.
In 2011 we set out to find out the state of the art in Text Extraction in both domains (born-digital images and real scene). We received 24 submissions over three different tasks in the born-digital Challenge, 10 during the competition run and 14 more over the following year, after the competition was opened in a continuous mode in October 2011.
Given the strong interest displayed by the community, and the fact that there is still a large margin for improvement, in the ICDAR 2013 edition we revisited the tasks of localisation, segmentation and recognition and invited further submissions on an updated and even more challenging dataset. We received 13 submissions during the 2013 edition and the year following it, when the competition was opened in a continuous mode.
For the 2015 edition, we are introducing a new task: End-to-End, referring to text localisation and recognition in a single go at the word level. The rest of the tasks remain open in a continuous mode, unchanged from the 2013 edition. See details in the Tasks page.
The results from the past ICDAR competitions can be found in the ICDAR proceedings [1, 2].

D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. Gomez, S. Robles, J. Mas, D. Fernandez, J. Almazan, L.P. de las Heras, "ICDAR 2013 Robust Reading Competition", in Proc. 12th International Conference on Document Analysis and Recognition, IEEE CPS, 2013, pp. 1115-1124.

D. Karatzas, S. Robles Mestre, J. Mas, F. Nourbakhsh, P. Pratim Roy, "ICDAR 2011 Robust Reading Competition - Challenge 1: Reading Text in Born-Digital Images (Web and Email)", in Proc. 11th International Conference on Document Analysis and Recognition, IEEE CPS, 2011, pp. 1485-1490.

4.17.2.tasks

Tasks - Born-Digital Images (Web and Email)

The Challenge is set up around four tasks:
Text Localization, where the objective is to obtain a rough estimation of the text areas in the image, in terms of bounding boxes that correspond to parts of text (words or text lines).
Text Segmentation, where the objective is the pixel level separation of text from the background.
Word Recognition, where the locations (bounding boxes) of words in the image are assumed to be known and the corresponding text transcriptions are sought.
End-to-End, where the objective is to localise and recognise all words in the image in a single step.
For the 2015 edition, the focus is solely on task T1.4 “End-to-End”. The rest of the tasks are open for submissions but will not be included / analysed in the ICDAR 2015 report.
A training set of 410 images (containing 3564 words) is provided through the downloads section. The training set is common for all three tasks, although different ground truth data is provided for each of them.
All images are provided as PNG files and the text files are ASCII files with CR/LF new line endings.
4.17.2.1.Task 1.1: Text Localization
For the text localization task we provide bounding boxes of words for each of the images. The ground truth is given as separate text files (one per image) where each line specifies the coordinates of one word’s bounding box and its transcription in a comma separated format (see Figure 1).

For the text localization task the ground truth data is provided in terms of word bounding boxes. For each image in the training set a separate ASCII text file will be provided, following the naming convention:
gt_[image name].txt
The text files are comma-separated files, where each line corresponds to one word in the image and gives its bounding box coordinates and its transcription in the format:
left, top, right, bottom, "transcription"
Please note that the escape character (\) is used for double quotes and backslashes (see for example img_4 in Figure 1).
The authors will be required to automatically localise the text in the images and return bounding boxes. The results will have to be submitted in separate text files for each image, with each line corresponding to a bounding box (comma separated values) as per the above format. A single compressed (zip or rar) file should be submitted containing all the result files. In the case that your method fails to produce any results for an image, you can either include an empty result file or no file at all.
The evaluation of the results will be based on the algorithm of Wolf et al [1] which in turn is an improvement on the algorithms used in the robust reading competitions in previous ICDAR instalments.
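
A minimal Python sketch for reading one line of a gt_[image name].txt file in the format above; the backslash un-escaping is simplified and only meant to illustrate the format:

def parse_localization_gt_line(line):
    # left, top, right, bottom, "transcription" -- split on the first four commas
    left, top, right, bottom, rest = line.strip().split(",", 4)
    transcription = rest.strip()
    if transcription.startswith('"') and transcription.endswith('"'):
        transcription = transcription[1:-1]
    transcription = transcription.replace('\\"', '"').replace("\\\\", "\\")   # undo escapes
    return (int(left), int(top), int(right), int(bottom)), transcription

print(parse_localization_gt_line('10, 20, 180, 60, "Sample \\"word\\""'))
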
4.17.2.2.Task 1.2: Text Segmentation

For the text segmentation task, the ground truth data is provided in the form of colour-coded PNG images following the naming convention:
gt_[image name].png
In the ground truth images, white pixels should be interpreted as background pixels, while non-white pixels should be interpreted as text (see Figure 2). The non-white pixels are colour coded, so that all pixels of the same atom share a single colour. An atom is defined in accordance with [2] as the minimum set of connected components that can be assigned a semantic interpretation. So atoms might comprise single components that correspond to one or multiple (e.g. in the case of cursive text) characters, or they might comprise multiple components that correspond to one (e.g. letters "i", "j", the letters of the IBM logo) or multiple characters.
The authors will be asked to automatically segment the test images and submit their segmentation result as a series of bi-level images, following the same format. A single compressed (zip or rar) file should be submitted containing all the result files. In the case that your method fails to produce any results for an image, you can either include an empty result file or no file at all.
Evaluation will be primarily based on the methodology proposed by the organisers in the paper [2], while a typical precision / recall measurement will also be provided for consistency, in the same fashion as [3].
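
A minimal Python/NumPy sketch that turns one colour-coded ground-truth image into a binary text mask and counts atoms as distinct non-white colours; the file name gt_img_1.png is an assumption:

from PIL import Image
import numpy as np

gt = np.array(Image.open("gt_img_1.png").convert("RGB"))
text_mask = np.any(gt != 255, axis=-1)                 # white = background, non-white = text
print("text pixels:", int(text_mask.sum()))

# Each atom is drawn in a single colour, so distinct non-white colours give the atom count
atom_colours = {tuple(c) for c in gt[text_mask]}
print("atoms:", len(atom_colours))
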
4.17.2.3.Task 1.3: Word Recognition

For the word recognition task, we provide all the words in our dataset with 3 characters or more in separate image files, along with the corresponding ground-truth transcription (See Figure 2 for examples). The transcription of all words is provided in a SINGLE text file for the whole collection. Each line in the ground truth file has the following format:
[image name], "transcription"
An example is given in Figure 3. Please note that the escape character (\) is used for double quotes and backslashes (see for example the transcriptions of 15.png and 20.png in Figure 3).
For testing we will provide the images of about 400 words and we will ask for the transcription of each image. A single transcription per image will be requested. The authors should return all result transcriptions in a single text file of the same format as the ground truth.
For the evaluation we will calculate the edit distance between the submitted transcription and the ground truth transcription. Equal weights will be set for all edit operations. The best performing method will be the one with the smallest total edit distance.
Note that words are cut-out with a frame of 4 pixels around them (instead of the tight bounding box), in order to preserve the immediate context. This is usual practice to facilitate processing (see for example the MNIST character dataset).
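
A minimal Python sketch of the evaluation described above: a plain Levenshtein edit distance with equal weights for all edit operations, summed over all test images (keyed by image name):

def edit_distance(a, b):
    # Levenshtein distance with equal weights for insertion, deletion and substitution
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[len(b)]

def total_edit_distance(submitted, ground_truth):
    # Both arguments map image name -> transcription; a missing result counts in full
    return sum(edit_distance(submitted.get(name, ""), gt) for name, gt in ground_truth.items())

print(edit_distance("PROJECT", "PR0JECT"))   # 1 (one substitution)
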
4.17.2.4.Task 1.4: End to End
Ground truth is provided for each image of the training set that comprises the bounding quadrilateral of each word as well as the transcription of the word. The ground truth is the same as for Task 1.1. One- or two-character words as well as words deemed unreadable are annotated in the dataset as “do not care” following the ground truthing protocol (to be made public).
Vocabularies
Apart from the transcription and location ground truth we provide a generic vocabulary of about 90k words, a vocabulary of all words in the training set and per-image vocabularies of 100 words comprising all words in the corresponding image as well as distractor words selected from the rest of the training set vocabulary, following the setup of Wang et al [4]. Authors are free to incorporate other vocabularies / text corpuses during training to enhance their language models, in which case they will be requested to indicate so at submission time to facilitate the analysis of results.
All vocabularies provided contain words of 3 characters or longer comprising only letters.
Vocabularies do not contain alphanumeric structures that correspond to prices, URLs, times, dates, emails etc. Such structures, when deemed readable, are tagged in the images and an end-to-end method should be able to recognise them, although the vocabularies provided do not include them explicitly.
Words were stripped of any preceding or trailing symbols and punctuation marks before they were added to the vocabulary. Words that still contained any symbols or punctuation marks (with the exception of hyphens) were filtered out as well. So for example "e-mail" is a valid vocabulary entry, while "rrc.cvc.uab.es" is a non-word and is not included.
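
A minimal Python sketch of the vocabulary filter described in the last two paragraphs (strip leading and trailing punctuation, then keep only words of three or more characters made of letters, allowing internal hyphens); the organisers' exact rules may differ:

import re
import string

def to_vocab_entry(token):
    word = token.strip(string.punctuation)               # strip preceding/trailing symbols
    if len(word) >= 3 and re.fullmatch(r"[A-Za-z]+(-[A-Za-z]+)*", word):
        return word                                      # letters only, hyphens allowed inside
    return None                                          # too short or still contains symbols

print(to_vocab_entry('"e-mail",'))        # e-mail (valid entry)
print(to_vocab_entry("rrc.cvc.uab.es"))   # None (contains dots, filtered out)
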
Submission Stage
For the test phase, we will provide a set of test images along with three specific lists of words for each test image that comprise:
Strongly Contextualised: per-image vocabularies of 100 words including all words (3 characters or longer, only letters) that appear in the image as well as a number of distractor words chosen at random from the same subset test following the setup of Wang et al [4],
Weakly Contextualised: all words (3 characters or longer, only letters) that appear in the entire test set, and
Generic: any vocabulary can be used, a 90k word vocabulary is provided
For each of the above variants, participants can make use of the corresponding vocabulary given to guide the end-to-end word detection and recognition process.
Participants will be able to submit end-to-end results for these variants in a single submission step. Variant (1) will be obligatory, while variants (2) and (3) will be optional.
Along with the submission of results, participants will have the option to submit the corresponding executable binary file (Windows, Linux or Mac executable). This optional binary file can be added to the submission at a later time (there is no need to delay the submission of results). The executable of the method will be used over a hidden test subset to further analyse the method and provide insight to the authors. The ownership of the file remains with the authors, and the organisers of the competition will keep the executable private and will not make use of the executable in any way unrelated to the competition. The executable should be:
Windows, Linux, or Mac executable
Compiled for single core architectures
Have no external dependencies (statically linked, or all libraries given)
Command line, no graphical interface
In Parameters: vocabulary filename (e.g. images/img.txt), image filename (e.g. images/img.png)
Output: text file of results for the image same format as the submission called out.txt
4.17.2.5.Evaluation
The evaluation protocol proposed by Wang 2011 [4] will be used which considers a detection as a match if it overlaps a ground truth bounding box by more than 50% (as in [5]) and the words match, ignoring the case. Detecting or missing words marked as “do not care” will not affect (positively or negatively) the results. Any detections overlapping more than 50% with “do not care” ground truth regions will be discarded from the submitted results before evaluation takes place, and evaluation will not take into account ground truth regions marked as “do not care”.
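
A minimal Python sketch of the matching rule described above, using intersection-over-union as the overlap measure (an assumption; see [4] and [5] for the exact protocol) and case-insensitive word comparison; detections overlapping a "do not care" region by more than 50% are dropped first:

def iou(a, b):
    # Boxes are (left, top, right, bottom)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def filter_dont_care(detections, dont_care_boxes):
    # Drop detections that overlap any "do not care" region by more than 50%
    return [(box, word) for box, word in detections
            if all(iou(box, dc) <= 0.5 for dc in dont_care_boxes)]

def is_match(det_box, det_word, gt_box, gt_word):
    # Overlap above 50% and the words match, ignoring case
    return iou(det_box, gt_box) > 0.5 and det_word.lower() == gt_word.lower()
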
References
[1] C. Wolf and J.M. Jolion, "Object Count / Area Graphs for the Evaluation of Object Detection and Segmentation Algorithms", International Journal of Document Analysis, vol. 8, no. 4, pp. 280-296, 2006.
[2] A. Clavelli, D. Karatzas, and J. Llados, "A Framework for the Assessment of Text Extraction Algorithms on Complex Colour Images", in Proceedings of the 9th IAPR Workshop on Document Analysis Systems, Boston, MA, 2010, pp. 19-28.
[3] K. Ntirogiannis, B. Gatos, and I. Pratikakis, "An Objective Methodology for Document Image Binarization Techniques", in Proceedings of the 8th International Workshop on Document Analysis Systems, Nara, Japan, 2008, pp. 217-224.
[4] K. Wang, B. Babenko, and S. Belongie, "End-to-end Scene Text Recognition", in Computer Vision (ICCV), 2011 IEEE International Conference on, pp. 1457-1464, IEEE, November 2011.
[5] M. Everingham, S. A. Eslami, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, "The PASCAL Visual Object Classes Challenge: A Retrospective", International Journal of Computer Vision, 111(1), pp. 98-136, 2014.

4.17.3.downloads
Downloads - Born-Digital Images (Web and Email)
Download below the training dataset and associated ground truth information for each of the Tasks. Task 1.4 is new for the 2015 edition.
4.17.3.1.Task 1.1: Text Localization (2013 edition)
4.17.3.1.1.Training Set

Training set Images (33Mb). - 410 images that comprise the training dataset.


Training Set Text Localization Ground Truth (88Kb). - 410 Text files (one per image) as explained in the “Tasks” section.

4.17.3.1.2.Test Set

Test Set Images (5.6Mb). - 141 images that comprise the test set for tasks 1.1, 1.2 and 1.4. You can submit your results for this Task over the images of the test set through the My Methods section.


Test Set Ground Truth (40Kb). - 141 text files with text localisation bounding boxes for the images of the test set.

4.17.3.2.Task 1.2: Text Segmentation (2013 edition)
4.17.3.2.1.Training Set

Training Set Images (33Mb). - 410 images that comprise the training dataset. This is the same dataset as for Task 1.1.


Training Set Text Segmentation Ground Truth (943Kb). - 410 colour coded images as explained in the “Tasks” section.

4.17.3.2.2.Test Set

Test Set Images (5.6Mb). - 141 images that comprise the test set for tasks 1.1, 1.2 and 1.4. You can submit your results for this Task over the images of the test set through the My Methods section.


Test Set Ground Truth (377Kb). - 141 colour-coded images corresponding to the images of the test set. Each colour marks a different atom; white is background.

4.17.3.3.Task 1.3: Word Recognition (2013 edition)
4.17.3.3.1.Training Set

Training Set Word Images and Ground Truth (12Mb). - 3564 images of words cut from the original images and a single text file with the ground truth transcription of all images as specified in the “Tasks” section.

4.17.3.3.2.Test Set

Test Set Word Images (4.6Mb). - 1439 images that comprise the word recognition test set. You can submit your results for this Task over the images of the test set through the My Methods section.


Test Set Ground Truth (34Kb). - A single text file with the transcriptions of the 1439 images of the test set. Each line corresponds to an image of the test set.

4.17.3.4.Task 1.4: End to End (2015 edition)
4.17.3.4.1.Training Set

Training set Images (33MB). - 410 images that comprise the training dataset.


Training Set Text Localization and Transcription Ground Truth (118KB). - 410 Text files (one per image). Each line corresponds to one word and comprises the coordinates of the four corners of the bounding box given in a clockwise order in a comma separated list, and the transcription following the eighth comma.


Training vocabularies per image (214KB). - Vocabularies of 100 words per image, comprising the words appearing in the image plus distractors.


Training set vocabulary (12KB). - Vocabulary of all words (words of 3 characters or longer comprising only letters) appearing in the training set.

4.17.3.4.2.Test Set

Test Set Images (5.6Mb). - 141 images that comprise the test set for tasks 1.1, 1.2 and 1.4. You can submit your results for this Task over the images of the test set through the My Methods section.


Test vocabularies per image (75KB). - Vocabularies of 100 words per image, comprising the words appearing in the image plus distractors.


Test set vocabulary (6KB). - Vocabulary of all words (words of 3 characters or longer comprising only letters) appearing in the test set.

4.17.3.4.3.Other

Generic Vocabulary (796KB). - A vocabulary of about 90k words derived from the dataset publicly available here. Please consult [1,2] for further information as well as the disclaimer in the vocabulary file itself.

4.17.3.5.Sample MATLAB Code

Sample MATLAB Code (1Mb). - Sample code in MATLAB illustrating how to read in the training images and ground truth and how to output results for tasks 1.1, 1.2 and 1.3.

4.17.3.6.Terms of Use
The “Born-Digital Images” dataset and corresponding annotations are licensed under a Creative Commons Attribution 4.0 License.
4.17.3.7.References
M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, “Synthetic data and artificial neural networks for natural scene text recognition”, arXiv preprint arXiv:1406.2227, 2014
M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, “Reading Text in the Wild with Convolutional Neural Networks”, arXiv preprint arXiv:1412.1842, 2014

4.17.4.results
4.17.5.my methods
4.17.6.organizers

Reference:

https://blog.csdn.net/qq_14845119/article/details/105023984#comments
