Welcome back to our series on Visual Anomaly Detection! This time we turn our focus to the intricate world of image anomaly detection. Missed our introductory dive? No worries! You can catch up here to see where it all began.
In the area of image anomaly detection, every pixel tells a story. From identifying subtle deviations to detecting glaring inconsistencies, we will explore the deep learning methods behind detecting anomalies within images.
Image anomaly detection methods will be categorized following the organization proposed by Deep Industrial Image Anomaly Detection: A Survey, as shown in the figure below.
Image Anomaly Detection Categories
Unsupervised Image Anomaly Detection
Unsupervised anomaly detection approaches are those whose training dataset includes only normal samples. The methods in this category can be divided into two groups: feature-embedding-based methods and reconstruction-based methods.
Feature Embedding Based Methods
Feature-embedding methods use deep neural networks to extract features that describe a sample, learn reference vectors representing normality from a training dataset, and identify anomalies by the distance between the embedding vectors of a test image and the learned reference vectors. These methods often lack interpretability, since it is not possible to know which part of the anomalous image is responsible for the anomaly score.
Teacher-Student Architecture
Teacher-student methods select partial layers of a backbone network pre-trained on a large-scale dataset, such as ResNet, VGG, or EfficientNet, and use it as a fixed-parameter teacher.
During training, the teacher model guides the student model to learn the characteristics of normal samples by extracting the features that represent them. During inference, the features the teacher and student networks extract from normal samples are comparable, but the features they extract from anomalous samples differ.
By comparing the feature maps generated by both networks, these methods can produce score maps that determine whether an image is anomalous, even at the pixel level for anomaly localization. Teacher-student methods vary in their architectures: some use multiple student networks or multiple teacher-student configurations, some use final and intermediate features to generate the maps, and they differ in how the anomaly score is calculated.
Teacher-Student Anomaly Detection
(sample images from BTAD dataset)
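To make the idea concrete, here is a minimal PyTorch sketch of how a score map could be computed from one pair of teacher and student feature maps; real methods typically aggregate several layers and use other distance functions:

```python
import torch
import torch.nn.functional as F

def teacher_student_score_map(teacher_feats, student_feats):
    """Per-pixel anomaly map from one pair of feature maps (B, C, H, W):
    1 - cosine similarity along the channel dimension. Regions where the
    student fails to mimic the teacher score high."""
    t = F.normalize(teacher_feats, dim=1)
    s = F.normalize(student_feats, dim=1)
    score_map = 1.0 - (t * s).sum(dim=1)            # (B, H, W)
    return score_map, score_map.amax(dim=(-2, -1))  # map + image-level score
```

The image-level score here is simply the maximum of the pixel map, so a single defective region is enough to flag the whole image.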
One-Class Classification
Anomaly detection can also be tackled with one-class classification (OCC) methods. During training, OCC methods map normal samples to a compact representation and find a boundary, usually called a hypersphere, that encompasses the normal sample features and will hopefully provide a good separation from any abnormal features. During inference, the methods determine whether a sample contains an anomaly from the relative position of the sample's features and the hypersphere.
These classifiers work under the assumption that the training dataset consists only of normal samples; abnormal samples mixed into the training dataset may be detrimental to the deep model's results. Some methods artificially generate abnormal samples and use them during training to improve the accuracy of the boundary.
Deep SVDD variations predominate among OCC methods, and many of them also apply saliency detection.
One-Class Classification Anomaly Detection
(sample images from BTAD dataset)
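As a rough sketch of the scoring step, assuming a trained `encoder` network and a precomputed hypersphere `center` (typically the mean embedding of the normal training data), a Deep SVDD-style score could look like this:

```python
import torch

def deep_svdd_score(encoder, images, center):
    """Deep SVDD-style scoring sketch: squared distance of the embedded
    sample to the hypersphere center. Larger distance -> more anomalous."""
    with torch.no_grad():
        z = encoder(images)                 # (B, D) embeddings
    return ((z - center) ** 2).sum(dim=1)   # squared Euclidean distance
```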
Distribution Map / Density Estimation
During training, distribution-map-based methods map the input normal images or features to a probability distribution; during testing, they judge whether a sample is normal or abnormal by estimating the deviation or the likelihood of the sample against the established distribution. These methods are quite similar to OCC-based methods; the difference is that OCC-based methods attempt to find the feature boundaries, while distribution-map-based methods try to map the features into a desired distribution. Distribution-map-based methods require a suitable mapping mechanism for training; a bad choice of mapping method may hurt the resulting model's performance.
Normalizing Flow (NF) variations predominate among distribution-map methods because of their strong mapping ability and good performance in anomaly detection tasks. Normalizing Flows are neural networks that learn transformations between data distributions and well-defined densities; their main characteristic is an invertible mapping that allows evaluation in both directions. For anomaly detection, NF methods extract normal image features from a pre-trained model such as ResNet or Swin Transformer and map the feature distribution to a Gaussian distribution.
Distribution Map Anomaly Detection
(sample images from BTAD dataset)
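The sketch below shows the core mechanics under simplifying assumptions: a single RealNVP-style affine coupling layer over flattened feature vectors (of even dimension) and a standard Gaussian base density. Production NF methods stack many coupling blocks and operate on multi-scale feature maps:

```python
import math
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """RealNVP-style coupling layer sketch: half the feature dimensions
    are scaled and shifted conditioned on the other half, so the Jacobian
    log-determinant is simply the sum of the predicted log-scales."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim // 2, hidden), nn.ReLU(),
            nn.Linear(hidden, dim))  # predicts log-scale and shift

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)
        s, t = self.net(x1).chunk(2, dim=-1)
        s = torch.tanh(s)                    # bound scales for stability
        z2 = x2 * torch.exp(s) + t
        return torch.cat([x1, z2], dim=-1), s.sum(dim=-1)

def nf_anomaly_score(flow, features):
    """Negative log-likelihood under a standard Gaussian base density;
    normal features map to high-likelihood regions, anomalies do not."""
    z, log_det = flow(features)
    log_pz = -0.5 * (z ** 2 + math.log(2 * math.pi)).sum(dim=-1)
    return -(log_pz + log_det)
```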
Memory Bank
Memory bank methods rely on robust pre-trained networks and require additional memory space to store the image features. These methods require minimal network training and only need sampling or mapping of the collected features for inference. During inference, the features of the sample image are compared to the features in the memory bank: an image is considered abnormal if the spatial distance between the test features and the nearest normal features in the memory bank is larger than a threshold; otherwise, it is considered normal.
Some methods use multi-resolution features with KNN, others use a multivariate Gaussian distribution to construct a probabilistic representation of the normal class, and others cluster the features into multiple independent multivariate Gaussians. PatchCore introduces a core-set sampling method to build the memory bank, and CFA improves on it by distributing the image features on a hypersphere. Many more methods aim to improve the ability to represent, distribute, and map the data characteristics.
Memory Bank Image Anomaly Detection
(sample images from VisA dataset)
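A minimal nearest-neighbor scoring sketch in PyTorch; the naive bank construction and the exhaustive `torch.cdist` search below stand in for the core-set subsampling and approximate-search machinery that real methods such as PatchCore use:

```python
import torch

def build_memory_bank(normal_feature_batches):
    """Stack patch features from normal training images into one bank.
    Real methods subsample this, e.g. with core-set selection."""
    return torch.cat(normal_feature_batches, dim=0)  # (N, D)

def memory_bank_scores(memory_bank, test_features):
    """Distance from each test patch feature (M, D) to its nearest
    neighbor in the bank; a large distance means an anomalous patch."""
    d = torch.cdist(test_features, memory_bank)  # (M, N) pairwise distances
    return d.min(dim=1).values
```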
Reconstruction Based Methods
In reconstruction-based methods, neural network architectures like autoencoders (AE), variational autoencoders (VAE), or generative adversarial networks (GAN) are trained to reconstruct normal images, so anomalous images can be identified as those that are not well reconstructed. For these methods the reconstruction error can be used as the anomaly score, although some combine it with additional information from the latent space and intermediate activations to improve the anomaly detection score.
A weakness of these approaches is that anomalous samples may be reconstructed with a small error, thanks to the strong generalization ability of deep models and the subtlety of the changes in anomalous images. Most reconstruction methods differ only in the construction of the neural network.
According to Deep Industrial Image Anomaly Detection: A Survey, reconstruction-based methods perform better than feature-embedding methods at the pixel level, due to their ability to identify anomalies through pixel-wise comparison, but perform worse at the image level, since most are trained from scratch without employing robust pre-trained models.
Representation of Reconstruction based Models for Image Anomaly Detection
(sample images from VisA dataset)
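The scoring step shared by most of these methods can be sketched as follows, assuming `model` is any trained reconstruction network:

```python
import torch

def reconstruction_anomaly_map(model, image):
    """Pixel-wise squared error between input and reconstruction.
    The per-pixel map localizes anomalies; its maximum (or mean)
    serves as an image-level score."""
    with torch.no_grad():
        recon = model(image)                        # reconstruction pass
    error_map = ((image - recon) ** 2).mean(dim=1)  # average over channels
    return error_map, error_map.amax(dim=(-2, -1))  # map + image score
```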
Autoencoder
Autoencoder networks are the most commonly used method for anomaly detection in the reconstruction category. Variations of the method differ in how they measure the difference between the reconstructed image and the original image and how they define the anomaly score. Some methods also reconstruct at the feature level to take advantage of robust pre-trained networks and improve the effectiveness of the reconstruction.
Another technique is to reconstruct at different scales and patches to account for anomalies both in the global structure and in the details. Modifying the autoencoder structure, for example with skip connections or added memory modules, also helps improve the reconstruction capabilities. Yet another method synthesizes abnormal images and reconstructs them as normal to improve the generalization capacity.
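For reference, a bare-bones convolutional autoencoder of the kind these variations build on might look like the sketch below (illustrative only, not any specific paper's architecture); it would be trained with an L2 reconstruction loss on normal images:

```python
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Minimal convolutional autoencoder: three downsampling conv layers
    compress the image into a bottleneck; three transposed convs decode
    it back. Assumes input sizes divisible by 8."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))
```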
A variant of autoencoders is the variational autoencoder (VAE), where the intermediate variables of the neural network are constrained to follow a normal distribution. These methods take advantage of this by using the deviation of the intermediate variables from the normal distribution to determine whether a sample is anomalous.
Take into account that the learned feature representations can be biased if the training data include infrequent outliers. Also remember that the objective function is designed for dimensionality reduction rather than anomaly detection, producing generic summarizations that may reconstruct some anomalous samples well.
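Putting the VAE idea into code, here is a hedged sketch of a score that combines both signals; it assumes a hypothetical `vae` whose `encode` returns the latent mean and log-variance and whose `decode` reconstructs from a latent code:

```python
import torch

def vae_anomaly_score(vae, image, beta=1.0):
    """Sketch: pixel-wise reconstruction error plus the KL deviation of
    the latent code from the standard normal prior. `beta` balances the
    two terms and is a placeholder hyperparameter."""
    with torch.no_grad():
        mu, logvar = vae.encode(image)
        recon = vae.decode(mu)  # use the mean latent at test time
    recon_err = ((image - recon) ** 2).mean(dim=(1, 2, 3))
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)
    return recon_err + beta * kl
```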
Generative Adversarial Networks
The use of Generative Adversarial Networks (GAN) is also a popular approach for deep anomaly detection. These methods aim to learn a latent feature space with a generative network so that the latent space captures the normality underlying the training data. During inference, the GAN can reconstruct normal samples from the latent space and determine whether a sample comes from the true data distribution.
GANs can capture the complexity and variability of real data but have some drawbacks for anomaly detection, such as mode collapse, where the generator produces only a few modes of the data distribution and ignores the rest. Additionally, GANs can be difficult to train, as the generator and the discriminator need to be balanced and synchronized, and the loss function can be non-convex and non-smooth.
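An AnoGAN-style sketch illustrates the latent-space search at inference time; `generator` is assumed to be a trained GAN generator mapping latents to images of the input's shape, and the step count, learning rate, and latent size are placeholders:

```python
import torch

def gan_anomaly_score(generator, image, steps=200, lr=1e-2, latent_dim=128):
    """Optimize a latent code so the generated image matches the input;
    the residual error after the search is the anomaly score. Normal
    images are representable in the latent space, anomalies are not."""
    z = torch.randn(image.size(0), latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)  # only z is updated, not the GAN
    for _ in range(steps):
        opt.zero_grad()
        loss = ((generator(z) - image) ** 2).mean()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return ((generator(z) - image) ** 2).mean(dim=(1, 2, 3))
```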
Transformer
Although transformers were originally developed for natural language processing, they have recently been applied successfully to vision tasks, and specifically to image anomaly detection.
Weakly-Supervised Image Anomaly Detection
The methods in this category take advantage of the abnormal data that can be collected for a dataset; even though the number of such samples is small compared to normal samples, they can provide better guidance for anomaly detection. A summary of some of the existing methods follows.
One approach trains a reinforcement learning-based neural batch sampler to amplify the difference in loss curves between anomalous and non-anomalous regions, based on the assumption that changes in loss values during training can be used as features to identify abnormal samples.
Another proposal is a convolutional adversarial variational autoencoder with guided attention, where an attention expansion loss is used to focus on all normal regions of an image and a complementary guided attention loss minimizes the attention map corresponding to abnormal regions while focusing on normal ones. DevNet uses a small number of abnormal samples to realize fine-grained, end-to-end differentiable learning; a sketch of its deviation loss is shown below. Yet another proposal uses a Logit Inducing Loss (LIS) for training and an Abnormality Capturing Module (ACM) to characterize anomalous features.
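As promised above, here is a sketch of DevNet's deviation loss; the reference scores drawn from a standard normal prior and the margin value follow the original paper, but treat the details as illustrative:

```python
import torch

def deviation_loss(scores, labels, margin=5.0, n_ref=5000):
    """Anomaly scores are standardized against a reference distribution
    of scores drawn from a standard normal prior. Normal samples
    (label 0) are pushed toward the reference mean; the few labeled
    anomalies (label 1) are pushed at least `margin` deviations away."""
    ref = torch.randn(n_ref)                      # prior reference scores
    dev = (scores - ref.mean()) / (ref.std() + 1e-8)
    normal_term = (1 - labels) * dev.abs()
    anomaly_term = labels * torch.clamp(margin - dev, min=0)
    return (normal_term + anomaly_term).mean()
```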
Research on the influence of image-level supervision, pixel-level supervision, and mixed supervision on surface defect detection tasks finds that a small number of pixel-level annotations can help the model achieve detection performance comparable to fully supervised results.
For a more detailed overview of weakly-supervised learning you can refer to our article: Less Labels, More Learning: Weak Supervision
Final Thoughts & What's Next
As we wrap up our exploration of image anomaly detection, we've only scratched the surface of what's possible with the power of deep learning and AI. The journey through the complexities and breakthroughs in detecting anomalies within images has been both enlightening and inspiring.
With this in mind, we're excited to announce that our exploration does not end here. Stay tuned for our upcoming blog, where we will shift our focus from still images to the dynamic world of video. Video anomaly detection presents its own unique challenges and opportunities for innovation, and we can't wait to share our insights and discoveries with you.
The RidgeRun.ai Team!