Step 1: Feature detection
In computer vision and image processing, feature detection refers to methods that compute abstractions of image information and make a local decision at every image point as to whether an image feature of a given type is present at that point. The resulting features are subsets of the image domain, often in the form of isolated points, continuous curves, or connected regions.
Common feature detectors and their classification:
Feature detector | Edge | Corner | Blob |
---|---|---|---|
Canny | X | | |
Sobel | X | | |
Harris & Stephens / Plessey | X | X | |
SUSAN | X | X | |
Shi & Tomasi | | X | |
Level curve curvature | | X | |
FAST | | X | X |
Laplacian of Gaussian | | X | X |
Difference of Gaussians | | X | X |
Determinant of Hessian | | X | X |
MSER | | | X |
PCBR | | | X |
Grey-level blobs | | | X |
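
To make the idea of a "local decision at every image point" concrete, here is a minimal sketch of a Sobel-style edge detector in plain Python. The function name `sobel_edges`, the list-of-rows image layout, and the threshold value are assumptions made for illustration; a real pipeline would normally rely on an image-processing library instead.

```python
# 3x3 Sobel kernels for the horizontal and vertical intensity gradient.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(image, threshold):
    """Return a binary mask: 1 where the gradient magnitude exceeds threshold.

    `image` is a grayscale image stored as a list of equal-length rows
    (an assumed layout for this sketch).
    """
    height, width = len(image), len(image[0])
    mask = [[0] * width for _ in range(height)]
    # Skip the 1-pixel border, where the 3x3 window would leave the image.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            # Local decision at this point: edge feature or not.
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                mask[y][x] = 1
    return mask


# Synthetic 6x6 image: dark left half, bright right half.
image = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
edges = sobel_edges(image, threshold=100)
```

On this synthetic image the mask marks only the two columns next to the dark-to-bright transition, which is the "subset of the image domain" that the detector returns.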
Step 2: Feature description
After feature detection, each image is abstracted by several local patches. Feature representation methods deal with how to represent the patches as numerical vectors. These vectors are called feature descriptors. A good descriptor should be able to handle intensity, rotation, scale, and affine variations to some extent. One of the most famous descriptors is the scale-invariant feature transform (SIFT). SIFT converts each patch to a 128-dimensional vector. After this step, each image is a collection of vectors of the same dimension (128 for SIFT), and the order of the vectors does not matter.
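As an illustration of turning a patch into a numerical vector, the sketch below builds a gradient-orientation histogram for a single patch. This is not SIFT: SIFT concatenates 8-bin orientation histograms over a 4x4 grid of cells (4 x 4 x 8 = 128 values) and adds scale and rotation handling, whereas this simplified version uses one histogram for the whole patch. The name `orientation_histogram` and the choice of 8 bins are assumptions for the example.

```python
import math


def orientation_histogram(patch, bins=8):
    """Describe a patch by a normalized histogram of gradient orientations.

    `patch` is a grayscale patch stored as a list of equal-length rows
    (an assumed layout for this sketch).
    """
    height, width = len(patch), len(patch[0])
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the image gradient.
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            index = int(angle / (2 * math.pi) * bins) % bins
            # Each pixel votes for its orientation bin, weighted by magnitude.
            hist[index] += magnitude
    # L2-normalize so a uniform contrast change leaves the vector unchanged.
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


# A horizontal intensity ramp and a higher-contrast, brighter copy of it.
patch = [[10 * x for x in range(5)] for _ in range(5)]
brighter = [[20 * x + 30 for x in range(5)] for _ in range(5)]
descriptor = orientation_histogram(patch)
descriptor_brighter = orientation_histogram(brighter)
```

Both patches produce the same descriptor, which shows the kind of tolerance to intensity changes that the text asks of a good descriptor. Because gradients ignore constant offsets and the normalization removes contrast scaling, only the orientation pattern survives.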
Step 3: Codebook generation
The final step for the bag-of-words (BoW) model is to convert vector-represented patches to "codewords" (analogous to words in text documents), which also produces a "codebook" (analogous to a word dictionary). A codeword can be considered a representative of several similar patches. One simple method is to run k-means clustering over all the vectors.[5] Codewords are then defined as the centers of the learned clusters, and the number of clusters is the codebook size (analogous to the size of the word dictionary).
Thus, each patch in an image is mapped to a certain codeword through the clustering process and the image can be represented by the histogram of the codewords.
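The codebook step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names (`kmeans`, `nearest`, `bow_histogram`), the random initialization with a fixed seed, the fixed iteration count, and the toy 2-D "descriptors" are all invented for the example. Real descriptors would be 128-dimensional for SIFT, and a library k-means would normally be used.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector, centers):
    """Index of the center closest to `vector`."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(vector, centers[i]))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd-style k-means; returns k cluster centers (the codewords)."""
    rng = random.Random(seed)
    # Start from k distinct input vectors chosen at random.
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        # Assignment step: group every vector with its nearest center.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v, centers)].append(v)
        # Update step: move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster went empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bow_histogram(descriptors, codebook):
    """Count how many descriptors of one image map to each codeword."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1
    return hist


# Toy 2-D descriptors pooled from all training images.
all_descriptors = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
                   [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
codebook = kmeans(all_descriptors, k=2)

# Descriptors of one image, mapped to codewords and counted.
image_descriptors = [[0.05, 0.05], [4.9, 5.0], [5.2, 5.1]]
histogram = bow_histogram(image_descriptors, codebook)
```

Here the codebook size is 2, so every image is represented by a 2-bin histogram regardless of how many patches it contains. In this toy run one descriptor falls on the codeword near the origin and two on the other; which bin is which depends on the order of the learned centers.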