Fresh from ICCV 2023, researchers from ETH Zurich and Microsoft Mixed Reality & AI Lab bring us a new state-of-the-art feature matching algorithm: LightGlue.
In the authors' own words: "We introduce LightGlue, a deep neural network that learns to match local features across images."
The source code and the paper are freely available on GitHub and arXiv, respectively:
Key Takeaways
LightGlue is a state-of-the-art efficient feature matcher.
It's based on the Transformer architecture.
It builds and improves upon its predecessor: SuperGlue.
It's adaptive: it does less computation on easy image pairs and more on challenging ones.
LightGlue is faster than other methods while achieving comparable accuracy.
It's open source and it... just works!
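The "adaptive" bullet refers to an early-exit idea described in the LightGlue paper: after each layer, the network estimates how confident it is about each point's assignment, and inference stops once enough points are confident. The toy sketch below only illustrates that stopping rule. The function name `adaptive_depth`, the hand-written confidence values and both thresholds are hypothetical, not LightGlue's code or default settings.

```python
def adaptive_depth(layer_confidences, point_threshold=0.9, ratio_threshold=0.95):
    """Return how many layers run before the early exit triggers.

    layer_confidences: hypothetical per-layer lists of per-point confidence
    scores, standing in for the outputs of a learned confidence classifier.
    """
    for depth, confidences in enumerate(layer_confidences, start=1):
        confident = sum(1 for c in confidences if c > point_threshold)
        # Stop as soon as a large enough fraction of points is confident.
        if confident / len(confidences) >= ratio_threshold:
            return depth
    # No early exit: every layer was used.
    return len(layer_confidences)


# An "easy" pair: all points become confident after the second layer.
easy = [[0.5, 0.6, 0.7, 0.4], [0.95, 0.97, 0.99, 0.96], [1.0, 1.0, 1.0, 1.0]]
# A "hard" pair: confidence stays low, so all layers are used.
hard = [[0.3, 0.2, 0.4, 0.1], [0.5, 0.4, 0.6, 0.3], [0.7, 0.6, 0.8, 0.5]]

print(adaptive_depth(easy))  # 2
print(adaptive_depth(hard))  # 3
```

Easy pairs exit after fewer layers, while hard pairs pay for the full network.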
What are Features, Anyway?
Features are points in an image that are interesting in some way: they are easy to detect, distinctive, or stand out from their surroundings. They are typically used to find correspondences between two or more images. Take, for example, the following pair of images:
Each feature comes with a "digital signature" of the point, typically known as a descriptor. A good feature extractor computes similar descriptors for the same point in different images, regardless of changes in illumination, perspective, scale, or rotation.
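As a rough illustration (not SuperPoint's or LightGlue's code), descriptors can be treated as vectors and compared with cosine similarity after L2 normalization. The toy 4-D descriptors below are made up. Doubling a descriptor is used as a crude stand-in for a global contrast change in a gradient-based descriptor, and normalization makes that change disappear.

```python
import math


def l2_normalize(vector):
    """Scale a descriptor to unit length so only its direction matters."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """Dot product of the normalized descriptors: 1.0 means same direction."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Made-up 4-D descriptors (real ones have e.g. 128 or 256 dimensions).
corner_in_image0 = [0.9, 0.1, 0.3, 0.2]
same_corner_brighter = [1.8, 0.2, 0.6, 0.4]  # same pattern, doubled contrast
different_point = [0.1, 0.8, 0.1, 0.6]

print(round(cosine_similarity(corner_in_image0, same_corner_brighter), 6))  # 1.0
print(cosine_similarity(corner_in_image0, different_point) < 0.5)  # True
```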
Matching features is useful for many applications, including panoramic image stitching, robot localization and mapping, and 3D scene reconstruction.
Applications of feature extraction and matching. Left: SLAM; center: image stitching; right: 3D reconstruction and photogrammetry.
LightGlue does Feature Matching
LightGlue does not extract features from the images; it finds the best match between them, if any. So, given the features computed for the two images above, the algorithm finds the following matches:
Take a moment to appreciate the precision, even with such a large perspective difference between the images.
And it does so accurately and very fast. Here's how it compares against other state-of-the-art methods:
Note how it exceeds the throughput of all other methods while achieving similar accuracy.
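LightGlue learns its matching with attention layers, so it can't be reduced to a few lines. For contrast, here is a classical baseline that also returns "the best match, if any": mutual nearest-neighbor matching with a distance threshold, where points with no acceptable partner are marked -1. This is a swapped-in technique for illustration only; the function name and the `max_distance` value are assumptions.

```python
import math


def mutual_nearest_neighbors(desc0, desc1, max_distance=0.5):
    """Match descriptors that are each other's nearest neighbor.

    Returns a list where entry i is the index in desc1 matched to desc0[i],
    or -1 if point i has no acceptable match.
    """
    def nearest(query, candidates):
        distances = [math.dist(query, c) for c in candidates]
        best = min(range(len(candidates)), key=distances.__getitem__)
        return best, distances[best]

    matches = []
    for i, d0 in enumerate(desc0):
        j, dist = nearest(d0, desc1)
        # Keep the pair only if it is close enough and the choice is reciprocal.
        if dist <= max_distance and nearest(desc1[j], desc0)[0] == i:
            matches.append(j)
        else:
            matches.append(-1)
    return matches


desc0 = [[0.0, 1.0], [1.0, 0.0], [5.0, 5.0]]
desc1 = [[1.0, 0.1], [0.1, 1.0]]
print(mutual_nearest_neighbors(desc0, desc1))  # [1, 0, -1]
```

The third point has no close partner in the other image, so it stays unmatched instead of being forced into a wrong match.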
Testing LightGlue
Time for the fun part. I'm going to try out LightGlue's feature matching capabilities to perform homography estimation. That is: given a pair of images, what transformation must be applied to one of them so that it best overlaps the other? I'll be using:
SuperPoint for feature extraction
LightGlue for feature matching
RANSAC for homography estimation
Everything on the CPU (no GPU processing)
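RANSAC is what makes the homography step robust to wrong matches: it repeatedly fits a homography to 4 random correspondences and keeps the model that agrees with the most points. Below is a minimal pure-Python sketch under simplifying assumptions (direct linear transform with the bottom-right entry fixed to 1, a fixed iteration count, no final least-squares refit). The function names are hypothetical; in practice OpenCV's `cv2.findHomography` with the `cv2.RANSAC` flag does this job.

```python
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate sample")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_4(src, dst):
    """Direct linear transform from exactly 4 correspondences, h33 fixed to 1."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map (x, y) through H using homogeneous coordinates."""
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w


def ransac_homography(src, dst, threshold=3.0, iterations=500, seed=0):
    """Return (H, inlier_indices) for the model with the most inliers."""
    rng = random.Random(seed)
    best_h, best_inliers = None, []
    for _ in range(iterations):
        sample = rng.sample(range(len(src)), 4)
        try:
            h = homography_from_4([src[i] for i in sample], [dst[i] for i in sample])
        except ValueError:
            continue  # e.g. collinear points: no unique homography
        inliers = []
        for i, (p, q) in enumerate(zip(src, dst)):
            try:
                u, v = apply_homography(h, p)
            except ZeroDivisionError:
                continue
            dx, dy = u - q[0], v - q[1]
            if dx * dx + dy * dy <= threshold * threshold:
                inliers.append(i)
        if len(inliers) > len(best_inliers):
            best_h, best_inliers = h, inliers
    return best_h, best_inliers


# Synthetic check: 20 grid points mapped by a known homography, 3 corrupted.
true_h = [[1.0, 0.1, 5.0], [0.05, 1.0, -3.0], [0.0005, 0.0002, 1.0]]
src = [(x, y) for x in range(0, 100, 20) for y in range(0, 100, 25)]
dst = [apply_homography(true_h, p) for p in src]
dst[0], dst[7], dst[13] = (500.0, 500.0), (-200.0, 40.0), (0.0, 999.0)  # wrong matches
h, inliers = ransac_homography(src, dst)
print(len(inliers), "inliers")
```

The three corrupted correspondences play the role of bad matches; they cannot agree with the model fitted to the good ones, so RANSAC leaves them out.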
I'll be testing on the following pair of images:
Challenging pair of images used for the test
Initialize a new Python virtual environment and install LightGlue and OpenCV.
The LightGlue-related code is fairly simple:
In the snippet above, lines 20-21 extract the features using SuperPoint, line 24 finds the matches between them, and lines 30-31 filter the matched points. The rest is just visualization.
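The filtering step is essentially indexing. The project's README shows the matcher returning `matches` as index pairs of shape (K, 2), which are then used to select the matched keypoints from each image. The pure-Python sketch below mimics that indexing on made-up keypoints, without PyTorch; the helper name `gather_matched_points` is hypothetical.

```python
def gather_matched_points(keypoints0, keypoints1, matches):
    """Turn (index0, index1) pairs into two aligned lists of coordinates.

    Mirrors the tensor idiom points0 = kpts0[matches[:, 0]] and
    points1 = kpts1[matches[:, 1]], using plain lists instead.
    """
    points0 = [keypoints0[i] for i, _ in matches]
    points1 = [keypoints1[j] for _, j in matches]
    return points0, points1


# Made-up keypoints (x, y) detected in each image.
kpts0 = [(10.0, 20.0), (35.5, 40.0), (80.0, 12.0)]
kpts1 = [(82.0, 15.0), (12.0, 22.5), (50.0, 50.0), (37.0, 43.0)]
# Made-up matcher output: keypoint 0 <-> 1, keypoint 1 <-> 3, keypoint 2 <-> 0.
matches = [(0, 1), (1, 3), (2, 0)]

points0, points1 = gather_matched_points(kpts0, kpts1, matches)
print(points0)  # [(10.0, 20.0), (35.5, 40.0), (80.0, 12.0)]
print(points1)  # [(12.0, 22.5), (37.0, 43.0), (82.0, 15.0)]
```

Because both lists are aligned, `points0[k]` and `points1[k]` form the k-th correspondence, which is exactly the input a homography estimator expects.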
Now let's use these points to find a homography and warp the left image.
In the continuation of the code above, line 5 uses the points computed by LightGlue to estimate a homography, and line 12 warps the image. The rest, again, is visualization.
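Warping with a homography is usually done by inverse mapping: each output pixel is mapped back through the inverse homography and the source image is sampled there, so the result has no holes. Below is a minimal nearest-neighbor sketch on a tiny grayscale image stored as nested lists; `cv2.warpPerspective` does the equivalent with proper interpolation. The helper names are hypothetical.

```python
def invert_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate (raises if singular)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("homography is not invertible")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[value / det for value in row] for row in adj]


def warp_nearest(image, homography, out_height, out_width, fill=0):
    """Warp a 2-D list `image` by `homography` using inverse mapping."""
    h_inv = invert_3x3(homography)
    src_height, src_width = len(image), len(image[0])
    output = [[fill] * out_width for _ in range(out_height)]
    for y in range(out_height):
        for x in range(out_width):
            # Map the output pixel (x, y) back into the source image.
            u = h_inv[0][0] * x + h_inv[0][1] * y + h_inv[0][2]
            v = h_inv[1][0] * x + h_inv[1][1] * y + h_inv[1][2]
            w = h_inv[2][0] * x + h_inv[2][1] * y + h_inv[2][2]
            if w == 0:
                continue
            sx, sy = round(u / w), round(v / w)  # nearest-neighbor sampling
            if 0 <= sx < src_width and 0 <= sy < src_height:
                output[y][x] = image[sy][sx]
    return output


# A 3x3 image shifted one pixel to the right by a pure-translation homography.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
shift_right = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]
for row in warp_nearest(image, shift_right, 3, 4):
    print(row)  # [0, 1, 2, 3], then [0, 4, 5, 6], then [0, 7, 8, 9]
```

Output pixels that map outside the source keep the `fill` value, which is why warped images show a black border.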
Looking great! For comparison, here's the same estimation using SIFT features and the FLANN matcher, probably the most popular classical methods.
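FLANN only speeds up the nearest-neighbor search; a common way to reject bad SIFT matches is Lowe's ratio test, which keeps a match only when the best candidate is clearly closer than the second best. Below is a brute-force sketch of that filter standing in for FLANN, with made-up 2-D descriptors. The 0.75 ratio is a commonly used value, not necessarily the setting behind the results shown here.

```python
import math


def ratio_test_matches(desc0, desc1, ratio=0.75):
    """Brute-force nearest-neighbor matching filtered by Lowe's ratio test.

    A match (i, j) is kept only if the best candidate j is clearly closer
    than the second-best one, which discards ambiguous matches such as those
    on repetitive texture. Requires at least two descriptors in desc1.
    """
    matches = []
    for i, d0 in enumerate(desc0):
        ranked = sorted(range(len(desc1)), key=lambda j: math.dist(d0, desc1[j]))
        best, second = ranked[0], ranked[1]
        if math.dist(d0, desc1[best]) < ratio * math.dist(d0, desc1[second]):
            matches.append((i, best))
    return matches


desc0 = [[0.0, 0.0], [5.0, 5.0]]
# The second query has two equally close candidates, so it is ambiguous.
desc1 = [[0.1, 0.0], [3.0, 3.0], [4.9, 5.1], [5.1, 4.9]]
print(ratio_test_matches(desc0, desc1))  # [(0, 0)]
```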
To be fair, I must admit that I deliberately chose an image pair that I knew these methods struggle with. I also didn't fine-tune any parameters. Here's another pair where the classical methods seem to slightly outperform LightGlue:
Results where the classical method outperforms LightGlue in quality. Note the defect at the end of the yellow line.
Regardless, still great! Here's the code for the classical method, if you want to try it.