Understanding the Value of Monumental Technology For Anomaly Detection and More
Computer vision continues to break new ground in the field of artificial intelligence. Just this summer, NVIDIA’s autonomous driving researchers won the Computer Vision and Pattern Recognition (CVPR) Grand Challenge for their model’s superior on-the-road performance. Their entry bested 400 others while successfully navigating simulated road hazards and scenarios it had not been trained to master.
Soon, autonomous vehicles won’t simply be fodder for news stories about their failures. More autonomous computer vision use cases will be present in our daily lives before we know it. Until then, a brief survey of computer vision up to the present day proves useful to demonstrate how this technology emerged and where we are headed next.
Classical Computer Vision
Classical, or traditional, computer vision in the 1960s and 70s set the stage for human-like artificial intelligence. Sighted humans perceive their environment through their eyes, and the researchers behind computer vision hoped that a computer could acquire similar abilities.
This hope led researchers to a primary concept that still resonates in deep learning research today: breaking down an image into its individual components (or data points) could help a computer recognize the whole.
Traditional computer vision birthed techniques such as edge detection, which applies algorithms to an image to identify where its brightness or color changes drastically. This creates a contrasting edge between the object and the space around it, so the algorithm can identify a shape. However, for this approach to work, the texture and shading of the object under analysis must be relatively uniform; otherwise, the algorithm fails to recognize a pattern, even when it “sees” the same shape twice.
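To make this concrete, here is a minimal sketch of classical edge detection in Python using OpenCV’s Canny detector. The file name and threshold values are illustrative assumptions rather than settings from any particular system.

```python
# A minimal sketch of classical edge detection, assuming OpenCV is installed
# and that "part.png" is a grayscale image of the object under analysis.
import cv2

image = cv2.imread("part.png", cv2.IMREAD_GRAYSCALE)

# Blur first so noise is not mistaken for edges, then apply the Canny detector,
# which flags pixels where intensity changes sharply between neighbors.
blurred = cv2.GaussianBlur(image, (5, 5), 0)
edges = cv2.Canny(blurred, threshold1=100, threshold2=200)

cv2.imwrite("part_edges.png", edges)
```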
From Computer Vision to Machine Learning
Classical computer vision worked well for decades; however, its performance relied heavily on very specific environmental constraints. To overcome this, image data was preprocessed into features that machine learning algorithms such as clustering or decision trees could consume. These algorithms generalized better across image analysis tasks, leading to more robust results.
Although machine learning added a new layer of abstraction and generalization to typical computer vision problems, classic ML algorithms still depended on very specific, hand-crafted feature engineering to perform efficiently, which limited their potential in more complex scenarios.
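A rough sketch of that “hand-crafted features plus classic ML” workflow might look like the following, assuming scikit-learn is available. The features used here (row and column intensity sums) are chosen purely for illustration, not taken from any particular system.

```python
# A minimal sketch of the classic feature-engineering + ML workflow:
# compute simple hand-crafted features from images, then fit a decision tree.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()                      # 8x8 grayscale digit images
images, labels = digits.images, digits.target

# Feature engineering step: summarize each image by its row and column sums.
features = np.hstack([images.sum(axis=2), images.sum(axis=1)])

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0
)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```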
Farewell to Machine Learning Models: The Deep Learning Era
The shift from traditional machine learning models to the deep learning era that followed was founded on the concept that a computer should be able to detect or classify an object in a more human-like way: through the use of neural networks.
Much like the human brain, a neural network does far more than determine gradients or shapes to identify an object. A neural network is trained on massive amounts of data which it can synthesize and interconnect to demonstrate its understanding.
In 1998, French-American scientist Yann LeCun and his team published a landmark paper on convolutional neural networks (CNNs), applied to the identification of handwritten numerals. The CNN LeCun developed, called LeNet-5, processed data in multiple layers (as opposed to the less complex form of object detection used by traditional computer vision) to extract various features of handwritten digits and classify them.
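For illustration, a small LeNet-style network can be sketched in PyTorch as below. The layer sizes are assumptions meant to show the layered convolution, pooling, and classification structure, not LeNet-5’s exact configuration.

```python
# A rough sketch of a small LeNet-style CNN: stacked convolution and pooling
# layers extract features, and fully connected layers classify them.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # learn low-level strokes and edges
            nn.ReLU(),
            nn.MaxPool2d(2),                  # downsample, keep strongest responses
            nn.Conv2d(6, 16, kernel_size=5),  # combine strokes into digit parts
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 4 * 4, 120),
            nn.ReLU(),
            nn.Linear(120, num_classes),      # one score per digit class
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# A batch of 28x28 grayscale digit images produces one score vector per image.
logits = SmallCNN()(torch.randn(8, 1, 28, 28))
print(logits.shape)  # torch.Size([8, 10])
```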
Neural networks are still used in contemporary computer vision applications to effectively detect, classify, or segment parts of images. Deep learning computer vision use cases that depend on neural networks today include the autonomous driving system piloted by NVIDIA, medical imaging tools, anomaly detection for manufacturing processes, and more.
Each step forward, from the first machine learning models to today’s advanced computer vision use cases, draws computers closer to the ways that human beings perceive, decide, and act.
Where We Are Now: Generative AI
Generalist models, or vision language models, represent the most recent step forward in computer vision and in deep learning more broadly. Human beings are not only perceptive but also creative.
Generative AI imbues computers with the ability to create content: video, audio, graphics, or text. These models rely on advanced computer vision to take in data, such as an image or words, and create new data or content in response. For example, a generative AI model could produce synthetic training data for a computer vision model, reducing the need for manually labeled examples.
Computer vision models with generative AI capabilities are increasingly able to understand highly complex visuals and to make decisions, or produce a “response,” based on that environment. Further, generative AI can help train the computer vision models used in autonomous vehicles by creating training scenarios for them to learn from.
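As a loose sketch of that idea, synthetic images from a generative model can simply be mixed into the training set of a downstream vision model. The `generate_image` function below is a hypothetical placeholder for whatever generative model produces the scenarios; it is not a real API.

```python
# A highly simplified sketch: combine real and synthetic images into one
# training dataset for a downstream computer vision model.
import torch
from torch.utils.data import TensorDataset, ConcatDataset, DataLoader

def generate_image(label: int) -> torch.Tensor:
    # Placeholder: a real system would call a trained generative model here.
    return torch.randn(3, 64, 64)

real_images = torch.randn(100, 3, 64, 64)        # stand-in for real training data
real_labels = torch.randint(0, 5, (100,))

synthetic_labels = torch.randint(0, 5, (50,))
synthetic_images = torch.stack([generate_image(int(y)) for y in synthetic_labels])

# The downstream vision model trains on the combined real + synthetic data.
dataset = ConcatDataset([
    TensorDataset(real_images, real_labels),
    TensorDataset(synthetic_images, synthetic_labels),
])
loader = DataLoader(dataset, batch_size=16, shuffle=True)
```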
Computer Vision’s Advancements For Good Mirror Human Intelligence
With the advent of computer vision applications that benefit human beings, such as scientific progress in the medical field, it is striking to note that the closer these models come to performing helpful and objectively “good” tasks, the closer they come to enacting the hallmarks of human intelligence.
Inherent in the trajectory of AI is a desire to come closer to the human and, simultaneously, to provide more helpful applications. The Pew Research Center found in a recent study that, in several areas, respondents believe AI “helps, more than it hurts,” particularly in areas related to finding useful services, safety, personal health, and medical care.
It is comforting to remember that the evolution of machine learning, deep learning, and computer vision reveals more than just the growth of technology. Our advancements reveal that what makes us human is inherently aligned with the desire to benefit others.