Fig 1. Feature matching for Mount Rushmore achieving 98% accuracy with KNN
In this project, two images of the same object are fed into a SIFT pipeline developed and visualized in MATLAB. Using the ground-truth evaluation code, I was able to gradually improve my interest point detection and feature matching algorithm in both accuracy and efficiency. My main strategies included parameter tuning, expanding on textbook subprocedures (adaptive non-maximum suppression), and introducing alternative implementation strategies, all of which I detail below.
get_interest_points
In get_interest_points, my corner detection implementation computes the entries of the second moment matrix M by taking the x- and y-gradients of the input image, multiplying them together, and then applying a Gaussian convolution. The following snippet shows that we can use products of first-order derivatives to approximate the entries of M, and subsequently the Harris corner response function, in a very efficient manner.
```matlab
% Harris response function R = det(M) - alpha*trace(M)^2 is approximated by
Response = (Ixx .* Iyy) - Ixy.^2 - a * (Ixx + Iyy).^2;
% where Ixx and Iyy are the squares of the input image's first-order
% derivatives Ix and Iy, and Ixy = Ix .* Iy. The weight a = 0.04 was found
% to be most effective.
% To neutralize over-detection of point clusters in regions of high intensity,
% we smooth the derivative products with a Gaussian filter whose parameters
% (window size and sigma) I fine-tuned to maximize detection accuracy:
gaussD = fspecial('gaussian', [3 3], 2);
```
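Putting these pieces together, the cornerness computation might look like the following minimal sketch (variable and function names here are illustrative, not the exact project code):

```matlab
% Sketch of the Harris cornerness map for a grayscale image.
function Response = harris_response(image)
    a = 0.04;                                % empirically best weight
    sobel = fspecial('sobel');               % emphasizes the y-gradient
    Iy = imfilter(image, sobel);             % first-order derivatives
    Ix = imfilter(image, sobel');
    gaussD = fspecial('gaussian', [3 3], 2); % smoothing from above
    Ixx = imfilter(Ix .* Ix, gaussD);        % entries of M, smoothed
    Iyy = imfilter(Iy .* Iy, gaussD);
    Ixy = imfilter(Ix .* Iy, gaussD);
    % R = det(M) - a * trace(M)^2, computed elementwise
    Response = (Ixx .* Iyy) - Ixy.^2 - a * (Ixx + Iyy).^2;
end
```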
Extra credit: adaptive non-maximum suppression

Running imregionalmax() on the response function globally suppresses non-maxima to 0s and bumps maximal values to 1, producing a binary image of 0s and 1s. I then use bwconncomp() to iterate through every connected component and manually evaluate the maximum response value of each component. I also tried a sliding-window variant using colfilt() with a given window size and the handy builtin imregionalmax() function. However, due to underlying space inefficiencies of imregionalmax() making unnecessary copies of the entire matrix, MATLAB throws an exception when dealing with ~16GB of matrices. I proceeded to manually implement ANMS with a starting window radius of infinity decreasing to a small power of 2 (powers of 2 are used to maximize computation efficiency). In each iteration the radius decreases either by (1) linear steps of 32, or (2) a division factor of 2 or 4. Yet through empirical testing I found this double for loop to be very time-consuming, and because it took 30 seconds to 5 minutes for Notre Dame, I resorted to the fixed-size window described in step 1.

Fig 2. Feature detection for Notre Dame with non-maximum suppression. (notice ring near center)
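The connected-component suppression described above can be sketched roughly as follows (a minimal sketch; `Response` is the cornerness map, and the threshold value is an illustrative assumption):

```matlab
% Keep only the strongest pixel within each connected blob of local maxima.
threshold = 0.005 * max(Response(:));       % illustrative weak-corner cutoff
binary = imregionalmax(Response);           % 1 at local maxima, 0 elsewhere
binary = binary & (Response > threshold);   % drop weak maxima
cc = bwconncomp(binary);                    % group touching maxima into blobs
x = zeros(cc.NumObjects, 1);
y = zeros(cc.NumObjects, 1);
for i = 1:cc.NumObjects
    pixels = cc.PixelIdxList{i};
    [~, best] = max(Response(pixels));      % strongest pixel in this blob
    [y(i), x(i)] = ind2sub(size(Response), pixels(best));
end
```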
A key stage of the SIFT pipeline is constructing the feature descriptor. My implementation follows the suggested algorithm found in Professor Hays's lecture slides and the Szeliski textbook. The image had to be padded on all sides to accommodate the sliding 16x16 window, which computes an 8-bin gradient histogram describing the dominant gradients of all pixels within each 4x4 subcell. Everything is pretty straightforward and standard here except for fine-tuning the Gaussian parameters and some improvements to optimize the descriptor's effectiveness.
Note: using ceil() instead of round() when calculating each pixel's orientation contribution (bins spaced 45 degrees apart) improves the final matching accuracy on Notre Dame from ~88% to 90+%.
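The descriptor construction for a single interest point can be sketched as below, assuming a grayscale image and a point (x, y) far enough from the border; names and padding details are illustrative, not the exact project code:

```matlab
% One 128-dim descriptor: 4x4 subcells, each an 8-bin orientation histogram.
[gmag, gdir] = imgradient(image);        % gdir in degrees, range [-180, 180]
patchM = gmag(y-7:y+8, x-7:x+8);         % 16x16 window around (x, y)
patchD = gdir(y-7:y+8, x-7:x+8);
bins = ceil((patchD + 180) / 45);        % ceil() binning, bins 45 degrees apart
bins(bins == 0) = 1;                     % angle of exactly -180 falls in bin 1
descriptor = zeros(1, 128);
for cy = 1:4
    for cx = 1:4                         % each 4x4 subcell of the window
        cellBins = bins((cy-1)*4+1:cy*4, (cx-1)*4+1:cx*4);
        cellMags = patchM((cy-1)*4+1:cy*4, (cx-1)*4+1:cx*4);
        base = ((cy-1)*4 + cx - 1) * 8;
        for b = 1:8                      % magnitude-weighted histogram
            descriptor(base + b) = sum(cellMags(cellBins == b));
        end
    end
end
descriptor = descriptor / norm(descriptor);  % normalize to unit length
```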
This part takes the feature descriptors from the last stage and finds matches between the two images' descriptor sets by some measure of distance. Calculating the nearest neighbor distance ratio (NNDR) from the textbook was rather straightforward, so I went with the following extensions:
Extra credit: KNN matching and PCA

I used the builtin knnsearch() to achieve optimal performance. I also tried dsearchn(), but it only gives x coordinates back and not the corresponding y, so I had to pivot. In proj2.m I've included a dimension reduction factor variable that makes it very easy to see the differences.

| Image Pair | Accuracy (NNDR, threshold = 0.85) | Accuracy (KNN matching) |
|---|---|---|
| Notre Dame | 91.60% | 66.00% |
| Mount Rushmore | 96.54% | 98.00% |
| Episcopal Gaudi | 50.00% | 7.00% |
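The NNDR test itself can be sketched with knnsearch(); this is a minimal sketch assuming features1 and features2 are N1x128 and N2x128 descriptor matrices (names illustrative):

```matlab
% NNDR: keep matches whose nearest/second-nearest distance ratio is small.
[idx, d] = knnsearch(features2, features1, 'K', 2);  % two nearest neighbors
ratio = d(:, 1) ./ d(:, 2);
keep = ratio < 0.85;                     % threshold from the table above
matches = [find(keep), idx(keep, 1)];    % index pairs (image 1, image 2)
confidences = 1 - ratio(keep);           % lower ratio = higher confidence
```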
Runtime comparison:

| Image | NNDR | NNDR+PCA (dim=64) | NNDR+PCA (dim=32) | KNN | KNN+PCA (dim=64) | KNN+PCA (dim=32) |
|---|---|---|---|---|---|---|
| Notre Dame | 16.62 secs | 18.50 secs | 19.85 secs | 10.33 secs | 9.93 secs | 9.69 secs |
| Mount Rushmore | 1 min 05.35 secs | 1 min 03.15 secs | 1 min 00.83 secs | 29.45 secs | 28.79 secs | 29.80 secs |
| Episcopal Gaudi | 21.43 secs | 19.43 secs | 19.98 secs | 14.90 secs | 13.58 secs | 13.70 secs |
Accuracy comparison:

| Image | NNDR | NNDR+PCA (dim=64) | NNDR+PCA (dim=32) | KNN | KNN+PCA (dim=64) | KNN+PCA (dim=32) |
|---|---|---|---|---|---|---|
| Notre Dame | 91.60% | 88.12% | 59.38% | 66.00% | 75.00% | 62.00% |
| Mount Rushmore | 96.54% | 83.82% | 72.60% | 98.00% | 98.00% | 95.00% |
| Episcopal Gaudi | 50.00% | 20.00% | 5.66% | 7.00% | 6.00% | 7.00% |
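For reference, the PCA variant in the tables above reduces the descriptors before matching; a minimal sketch, assuming `dim` is the dimension reduction factor variable (64 or 32 above) and features1/features2 are the two descriptor matrices:

```matlab
% Project both descriptor sets onto the top principal components.
allFeats = [features1; features2];   % fit one shared basis on both images
coeff = pca(allFeats);               % columns are principal components
basis = coeff(:, 1:dim);             % keep the top `dim` directions
reduced1 = features1 * basis;        % lower-dimensional descriptors
reduced2 = features2 * basis;        % these feed into NNDR or KNN matching
```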
Fig 3. (Intermediate Binary) Thresholded Cornerness Detection for Notre Dame
Interesting Findings
- It's particularly amusing to notice some unexpected results. For instance, perhaps because PCA is only applied at the feature matching stage, we see no time improvement and in most cases actually observe a worsened runtime (possibly due to the extra dimensionality-reduction step) across the board.
- While the KNN matching strategy excelled at Mount Rushmore, it dropped matching accuracy quite significantly for the other images.
- Using KNN definitely yields noticeable if not significant time savings for all images, though as in the case of NNDR matching, applying PCA in our case did not help much.