Image processing

    Linear filtering

    Non-linear filtering

    Feature detection

    Feature detectors

    1. Point features
    2. Aperture problem
    3. Spatially varying weighting (or window) function: https://en.wikipedia.org/wiki/Window_function
    4. Auto-correlation function or surface
    5. Adaptive non-maximal suppression (ANMS)
    6. Measuring repeatability
    7. Scale invariance
    8. SIFT
    9. Rotational invariance and orientation estimation

      • dominant orientation
    10. Aggregation window vs detection window
    11. Affine invariance
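    Several of the ideas above (window function, auto-correlation surface, point features) come together in the Harris corner detector. A minimal NumPy sketch, assuming a box window instead of a Gaussian and the common k = 0.05 constant (function names are my own):

```python
import numpy as np

def harris_response(img, k=0.05, r=2):
    """Harris corner response from the auto-correlation (second-moment) matrix.

    img: 2D float array. Returns R = det(M) - k * trace(M)^2 per pixel,
    where M is the smoothed outer product of the image gradients.
    """
    # Image gradients (central differences).
    Iy, Ix = np.gradient(img)
    # Entries of the auto-correlation matrix before windowing.
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy

    def smooth(a):
        # Box window of radius r as a stand-in for a Gaussian weighting function.
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(np.roll(a, dy, 0), dx, 1)
        return out / (2 * r + 1) ** 2

    Sxx, Syy, Sxy = smooth(Ixx), smooth(Iyy), smooth(Ixy)
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace ** 2
```

    A corner gives two large eigenvalues of M (large det), an edge only one (det near zero), and a flat region none, which is exactly what the response separates.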

    Feature descriptors

    Feature descriptors are used to match keypoints retrieved by feature detection algorithms in different images. Once descriptors have been extracted from each feature in at least two images, the first step is to select a matching strategy (determined by the context, e.g. image stitching or object detection) that decides which correspondences qualify for further processing. The second step is to choose data structures and algorithms that let the matching be performed as efficiently as possible.

    1. Bias and gain normalization (MOPS).
    2. Scale invariant feature transform (SIFT) : https://en.wikipedia.org/wiki/Scale-invariant_feature_transform
    3. PCA-SIFT.
    4. Gradient location-orientation histogram (GLOH).
    5. Steerable filters.
    6. ROC curve

      • the use of TP (true positives), FP (false positives), TN (true negatives), and FN (false negatives) when measuring the performance of the matching strategy.
    7. Nearest Neighbor Distance Ratio (NNDR)
    8. Efficient matching

      • an indexing structure can be used (e.g. hash maps)
      • multi-dimensional hashing
      • locality sensitive hashing
      • parameter sensitive hashing
      • k-d trees
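    Items 7 and 8 can be sketched together: a brute-force matcher with the nearest neighbor distance ratio test. The 0.8 threshold and the function name are assumptions; in practice a k-d tree or LSH index would replace the linear scan over desc2:

```python
import numpy as np

def match_nndr(desc1, desc2, ratio=0.8):
    """Match descriptors using the nearest-neighbor distance ratio (NNDR) test.

    desc1: (N, D) array, desc2: (M, D) array. Returns (i, j) index pairs
    where the best match in desc2 is clearly better than the second best.
    """
    matches = []
    for i, d in enumerate(desc1):
        # Euclidean distances from descriptor i to every descriptor in desc2.
        dists = np.linalg.norm(desc2 - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        # Accept only unambiguous matches: d1 must be well below d2.
        if dists[best] < ratio * dists[second]:
            matches.append((i, best))
    return matches
```

    The ratio test rejects features whose two closest candidates are nearly equidistant, which are exactly the matches most likely to be wrong.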

    Verification of matches

    Once we have obtained matches from the steps above, we can use geometric alignment to separate the inliers from the outliers.

    1. Random sampling (RANSAC)
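    RANSAC is easiest to see in the simplest setting: fitting a line to 2D points contaminated by outliers. A sketch, with illustrative iteration count and inlier threshold (the same sample-fit-score loop applies to homography fitting):

```python
import numpy as np

def ransac_line(points, n_iters=200, thresh=0.1, seed=0):
    """Fit y = a*x + b to an (N, 2) point array with RANSAC.

    Repeatedly samples a minimal set (2 points), fits a candidate line,
    and keeps the model supported by the largest number of inliers.
    """
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, 0
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if abs(x2 - x1) < 1e-12:
            continue  # degenerate sample: vertical line
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Count points within thresh of the candidate line.
        residuals = np.abs(points[:, 1] - (a * points[:, 0] + b))
        inliers = int((residuals < thresh).sum())
        if inliers > best_inliers:
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers
```

    Because outliers rarely end up in the minimal sample of a winning model, the estimate is robust even when least squares would be pulled far off.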

    Feature tracking

    Instead of independently detecting features in each image and then matching them, we can find likely feature locations in the first image and search for them in subsequent images. This approach is commonly used in video tracking applications.
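    A minimal sketch of this idea, assuming a pure-translation model and a sum-of-squared-differences (SSD) matching score (all names and window sizes are illustrative):

```python
import numpy as np

def track_patch(frame1, frame2, y, x, patch=3, radius=5):
    """Track the feature at (y, x) from frame1 into frame2.

    Compares the patch around (y, x) in frame1 against every candidate
    position within `radius` pixels in frame2 using the SSD score,
    and returns the best-matching (y, x) position in frame2.
    """
    template = frame1[y - patch:y + patch + 1, x - patch:x + patch + 1]
    best, best_ssd = (y, x), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cy, cx = y + dy, x + dx
            cand = frame2[cy - patch:cy + patch + 1, cx - patch:cx + patch + 1]
            ssd = float(((template - cand) ** 2).sum())
            if ssd < best_ssd:
                best, best_ssd = (cy, cx), ssd
    return best
```

    Real trackers (e.g. Lucas-Kanade) replace this exhaustive search with a gradient-based solve, but the objective is the same.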

    Camera

    2D point

    Homogeneous coordinates

    Homogeneous coordinates are written in the form
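    The formula did not survive here; presumably it is the standard convention, where a 2D point gains a third coordinate and vectors differing only by a nonzero scale represent the same point:

```latex
\tilde{\mathbf{x}} = (\tilde{x}, \tilde{y}, \tilde{w}) \sim (x, y, 1),
\qquad \tilde{\mathbf{x}} \in \mathbb{P}^2
```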

    De-homogeneous coordinates
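    To recover the inhomogeneous (Cartesian) point, divide by the last coordinate:

```latex
x = \tilde{x} / \tilde{w}, \qquad y = \tilde{y} / \tilde{w}, \qquad \tilde{w} \neq 0
```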

    Basic set of 2D transformations

    Basic transformation with homogeneous coordinates.
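    With homogeneous coordinates each basic transformation becomes a single 3x3 matrix, so transformations compose by matrix multiplication. For pure translation, presumably:

```latex
\tilde{\mathbf{x}}' =
\begin{bmatrix} \mathbf{I} & \mathbf{t} \\ \mathbf{0}^{T} & 1 \end{bmatrix}
\tilde{\mathbf{x}}
```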

    Translation + rotation
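    Presumably the standard rigid (Euclidean) transform, with an orthonormal rotation matrix (3 DoF: the angle plus the two translation components):

```latex
\mathbf{x}' = \begin{bmatrix} \mathbf{R} & \mathbf{t} \end{bmatrix} \bar{\mathbf{x}},
\qquad
\mathbf{R} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
```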

    Scaled rotation
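    Also known as the similarity transform (4 DoF); presumably written with the scale folded into the rotation:

```latex
\mathbf{x}' = \begin{bmatrix} s\mathbf{R} & \mathbf{t} \end{bmatrix} \bar{\mathbf{x}}
= \begin{bmatrix} a & -b & t_x \\ b & a & t_y \end{bmatrix} \bar{\mathbf{x}}
```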

    Affine
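    Presumably an arbitrary 2x3 matrix (6 DoF); parallel lines remain parallel under affine transformations:

```latex
\mathbf{x}' = \mathbf{A} \bar{\mathbf{x}},
\qquad
\mathbf{A} = \begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \end{bmatrix}
```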

    Projective

    Projective is also known as homography.
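    Presumably the standard form: a full 3x3 matrix acting on homogeneous coordinates, with equality only up to scale (8 DoF):

```latex
\tilde{\mathbf{x}}' \sim \tilde{\mathbf{H}} \, \tilde{\mathbf{x}},
\qquad \tilde{\mathbf{H}} \in \mathbb{R}^{3 \times 3}
```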

    Orthography

    Contrary to projective projection, orthographic projection simply drops the z component of a three-dimensional coordinate to obtain the 2D point.
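    Dropping the z component amounts to multiplying by a 2x3 selection matrix:

```latex
\mathbf{x} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \mathbf{p},
\qquad \mathbf{p} = (x, y, z)
```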

    Homography

    Homography is the technique of applying a projective mapping to homogeneous coordinates to achieve the transformation. It is written like this
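    The equation itself is missing; it is presumably the standard one, with equality up to scale:

```latex
\tilde{\mathbf{x}}' \sim \tilde{\mathbf{H}} \, \tilde{\mathbf{x}}
```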

    It could also be written like this
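    Presumably the expanded inhomogeneous form, where the division by the third row makes the up-to-scale ambiguity explicit:

```latex
x' = \frac{h_{00} x + h_{01} y + h_{02}}{h_{20} x + h_{21} y + h_{22}},
\qquad
y' = \frac{h_{10} x + h_{11} y + h_{12}}{h_{20} x + h_{21} y + h_{22}}
```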

    It is required that

    1. H needs to be invertible
    2. H has 8 Degrees-of-Freedom (DoF)

    To apply the transformation, we first convert the coordinates to homogeneous coordinates. Then we apply the homography, and finally we convert the coordinates back by de-homogenizing them.
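    These three steps translate directly into code; a minimal NumPy sketch (function name is illustrative):

```python
import numpy as np

def apply_homography(H, points):
    """Apply a 3x3 homography H to an (N, 2) array of 2D points."""
    pts = np.asarray(points, dtype=float)
    # 1. To homogeneous coordinates: append w = 1 to every point.
    hom = np.hstack([pts, np.ones((len(pts), 1))])
    # 2. Apply the homography (row vectors, hence H transposed).
    mapped = hom @ H.T
    # 3. De-homogenize: divide by the last coordinate.
    return mapped[:, :2] / mapped[:, 2:3]
```

    Because the result is de-homogenized, scaling H by any nonzero constant leaves the mapped points unchanged, which is the up-to-scale ambiguity behind the 8 degrees of freedom.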

    General intrinsic camera calibration matrix
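    The calibration matrix is commonly written with focal lengths f_x and f_y, skew s, and principal point (c_x, c_y):

```latex
\mathbf{K} =
\begin{bmatrix}
 f_x & s & c_x \\
 0 & f_y & c_y \\
 0 & 0 & 1
\end{bmatrix}
```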

    Relative pose estimation

    Epipolar geometry

    The essential matrix
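    In the standard formulation, the essential matrix encodes the epipolar constraint between normalized image coordinates in the two views, built from the relative rotation and translation:

```latex
\hat{\mathbf{x}}'^{T} \mathbf{E} \, \hat{\mathbf{x}} = 0,
\qquad
\mathbf{E} = [\mathbf{t}]_{\times} \mathbf{R}
```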

    The fundamental matrix
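    The fundamental matrix expresses the same epipolar constraint directly in pixel coordinates by folding in the calibration matrices of the two cameras (rank 2, 7 DoF):

```latex
\mathbf{x}'^{T} \mathbf{F} \, \mathbf{x} = 0,
\qquad
\mathbf{F} = \mathbf{K}'^{-T} \mathbf{E} \, \mathbf{K}^{-1}
```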