The automation industry has long struggled to make industrial robots reliably adapt to different environments and to scale sophisticated robotic behaviors across applications. From warehouses and fulfillment centers to fabrication shops and factory production lines, vision systems in particular struggle with varying lighting, varying workpieces, complex geometries, and difficult-to-detect shiny and translucent materials. Even ML-based vision systems, which typically work well in the specific environments they were trained for, fail to transfer to similar applications in other environments. As a result, robotic skill reusability and scalability in industry remain low. Abstracting away the variables that pose the biggest challenges to vision systems today would remove the need to explicitly train perception models for one-off applications and would generalize the use of robotic skills.
As the computer vision community gathers at SIGGRAPH Asia in Tokyo, the Intrinsic team is excited to unveil a new plenoptic 3D vision system that improves robot efficiency and precision while automatically generalizing across different application types. For industrial automation, manufacturing, and robotics at large, this represents a new frontier in the use of foundation models to develop universally applicable perception skills that can be used to great effect across many industrial applications.
Unlike other AI-enabled vision systems that memorize real training data for very specific conditions, our system uses a proprietary deep learning stereo architecture trained on millions of ‘scenes’ of synthetic data. Because synthetic data, unlike real-world data, is cheap and easy to generate and train with, this learned knowledge can be applied to previously unseen variables in new robotic environments. In short, our system uses synthetic data to let industrial robots efficiently navigate unseen environments and processes as if they had already been trained on them. You can read more about this research and the specifics of our system in our published papers, hosted here.
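To make the idea of training on rendered ‘scenes’ concrete, here is a minimal sketch of supervised training of a stereo-disparity network on synthetic data. The dataset, network, and loss below are illustrative placeholders, not Intrinsic’s actual architecture, which is described in the published papers.

```python
# Minimal sketch: supervised training of a stereo-disparity network on
# synthetic data. The network, dataset, and loss are placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset


class SyntheticStereoScenes(Dataset):
    """Yields (left, right, ground-truth disparity) triples.

    In a real pipeline these would come from a renderer that randomizes
    lighting, materials, and geometry for every scene.
    """

    def __init__(self, num_scenes=1000, height=240, width=320):
        self.num_scenes = num_scenes
        self.shape = (height, width)

    def __len__(self):
        return self.num_scenes

    def __getitem__(self, idx):
        h, w = self.shape
        left = torch.rand(3, h, w)               # stand-in rendered left image
        right = torch.rand(3, h, w)              # stand-in rendered right image
        disparity = torch.rand(1, h, w) * 64.0   # stand-in ground truth
        return left, right, disparity


class TinyStereoNet(nn.Module):
    """Toy network mapping a concatenated stereo pair to a disparity map."""

    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, left, right):
        return self.layers(torch.cat([left, right], dim=1))


def train(epochs=1):
    loader = DataLoader(SyntheticStereoScenes(), batch_size=8, shuffle=True)
    model = TinyStereoNet()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.L1Loss()  # L1 on disparity is a common stereo training loss

    for _ in range(epochs):
        for left, right, gt in loader:
            loss = loss_fn(model(left, right), gt)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()


if __name__ == "__main__":
    train()
```

Because every scene is rendered, ground-truth disparity is free and perfectly accurate, which is what makes training at the scale of millions of scenes practical.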
The real value of higher-quality 3D reconstructions in robotics
Our vision system, including our deep learning stereo architecture, produces higher-quality 3D reconstructions of a robot’s environment. The architecture fuses data from RGB, infrared, and polarization sensors into a single, highly detailed depth map. Combined with a sensor and custom firmware that automatically capture HDR (High Dynamic Range) images, covering a wider range of light levels than standard sensors, the system adapts seamlessly to varying lighting conditions: ambient, dim, spotlit, and even changing lighting are all handled automatically. Compared to traditional stereo camera architectures, our system also reduces the risk of “hallucinations”, the errors algorithms can make when reconstructing 3D images. Together, these advances enable collision-free, more precise, and intelligently adaptive robotic behaviors, particularly in ‘difficult’ environments and with complex objects.
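As a rough illustration of the HDR idea, the sketch below merges bracketed exposures into a single radiance map so that neither bright spotlights nor dark corners saturate the data. It assumes linearized, aligned frames with illustrative exposure times; it is not Intrinsic’s firmware pipeline, which performs this on-device.

```python
# Minimal sketch of HDR merging from bracketed exposures, assuming the
# frames are already linearized and aligned.
import numpy as np


def merge_hdr(images, exposure_times):
    """Merge linear images taken at different exposures into one radiance map.

    Each pixel is weighted with a hat function so well-exposed pixels
    dominate and clipped or noisy pixels contribute little.
    """
    images = [img.astype(np.float64) for img in images]
    radiance = np.zeros_like(images[0])
    weight_sum = np.zeros_like(images[0])

    for img, t in zip(images, exposure_times):
        # Hat weighting: highest confidence at mid-gray, lowest near 0 and 1.
        w = 1.0 - np.abs(img - 0.5) * 2.0
        radiance += w * (img / t)
        weight_sum += w

    return radiance / np.maximum(weight_sum, 1e-6)


if __name__ == "__main__":
    exposures = [0.005, 0.02, 0.08]  # seconds (illustrative bracket)
    frames = [np.clip(np.random.rand(480, 640) * t / 0.08, 0.0, 1.0)
              for t in exposures]
    hdr = merge_hdr(frames, exposures)
    print(hdr.shape, hdr.min(), hdr.max())
```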
For industrial applications and workcells, this approach can unlock new value during both solution development and operations. Autogenerating immediately usable and efficient robot path and grasp planning code can greatly accelerate robotic solution development. For instance, our point cloud generation is purposefully optimized for superior collision avoidance and navigation, guided by a new metric, proposed as an industry standard, that we introduce in our paper, Collision avoidance metric for 3D camera evaluation. The quality of perception data, and how efficiently it can be generated and used, also greatly affects the cost effectiveness, overall performance, and reliability of industrial-grade workcells.
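The exact metric is defined in the paper; as a loose, hypothetical illustration of the underlying idea, the sketch below scores a measured point cloud by how much true scene geometry it fails to report within a safety distance, since geometry the camera misses is what leads to collisions.

```python
# Loose illustration (not the published metric): penalize a measured point
# cloud for ground-truth surface points it fails to report within a safety
# distance, since unreported geometry is what causes collisions.
import numpy as np
from scipy.spatial import cKDTree


def missed_geometry_rate(ground_truth_points, measured_points, safety_dist=0.01):
    """Fraction of ground-truth surface points with no measured point nearby."""
    tree = cKDTree(measured_points)
    distances, _ = tree.query(ground_truth_points, k=1)
    return float(np.mean(distances > safety_dist))


if __name__ == "__main__":
    gt = np.random.rand(5000, 3)                        # stand-in true surface points
    meas = gt + np.random.normal(0.0, 0.002, gt.shape)  # noisy camera measurement
    meas = meas[::2]                                    # simulate dropouts
    print("missed-geometry rate:", missed_geometry_rate(gt, meas))
```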
The system also natively supports multicamera and multidevice setups, with each device equipped with a random dot projector. The exact position and orientation of the cameras in a workcell is determined by an automated algorithm rather than through a manual procedure. Working with Google DeepMind, we have integrated an auto-calibration routine that automates much of the camera calibration process and keeps cameras accurately calibrated as your application runs.
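For readers unfamiliar with extrinsic calibration, the simplified sketch below shows the textbook version of the problem: recover each camera’s pose from a shared calibration target and chain the poses to relate cameras to one another. The actual auto-calibration routine is more involved and runs continuously while the application operates; the OpenCV-based functions here are only a generic illustration.

```python
# Simplified illustration of multi-camera extrinsic calibration against a
# shared target, using standard OpenCV calls.
import cv2
import numpy as np


def camera_pose_from_target(object_points, image_points, camera_matrix, dist_coeffs):
    """Return a 4x4 target-to-camera transform from known 3D-2D correspondences."""
    ok, rvec, tvec = cv2.solvePnP(object_points, image_points,
                                  camera_matrix, dist_coeffs)
    assert ok, "PnP failed"
    T = np.eye(4)
    T[:3, :3], _ = cv2.Rodrigues(rvec)   # rotation vector -> rotation matrix
    T[:3, 3] = tvec.ravel()
    return T


def relative_pose(T_target_to_cam_a, T_target_to_cam_b):
    """Transform from camera A's frame to camera B's frame, via the shared target."""
    return T_target_to_cam_b @ np.linalg.inv(T_target_to_cam_a)
```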
For the objects and parts being handled by the robots, the vision system includes a novel multi-modal synthetic training pipeline that simulates the physics of light interaction. The pipeline models how light becomes polarized when it interacts with an object's surface, making reflective and transparent materials much easier to detect. Simulating polarization is what allows our models to generalize to real-world scenarios.
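The physics being simulated is standard: unpolarized light that reflects specularly off a dielectric surface becomes partially polarized according to the Fresnel equations, which is exactly the signal that makes shiny and transparent parts stand out to a polarization camera. The sketch below computes that degree of linear polarization as a function of incidence angle; it is a textbook formula, not Intrinsic's simulation pipeline, and the refractive index is an illustrative value.

```python
# Standard Fresnel physics a polarization-aware simulator needs to model:
# degree of linear polarization of unpolarized light specularly reflected
# off a dielectric surface.
import numpy as np


def specular_dolp(theta_i, n=1.5):
    """Degree of linear polarization of reflected light vs. incidence angle.

    theta_i : angle of incidence in radians
    n       : refractive index of the surface (1.5 is typical glass/plastic)
    """
    sin_t = np.sin(theta_i) / n                 # Snell's law (air -> surface)
    cos_t = np.sqrt(1.0 - sin_t ** 2)
    cos_i = np.cos(theta_i)
    r_s = ((cos_i - n * cos_t) / (cos_i + n * cos_t)) ** 2  # s-polarized reflectance
    r_p = ((n * cos_i - cos_t) / (n * cos_i + cos_t)) ** 2  # p-polarized reflectance
    return (r_s - r_p) / (r_s + r_p + 1e-12)


if __name__ == "__main__":
    # Polarization peaks near Brewster's angle (~56 degrees for n = 1.5).
    for deg in (10, 30, 56, 80):
        print(deg, "deg ->", round(specular_dolp(np.radians(deg)), 3))
```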
This vision system and its deep learning and perception capabilities are being built directly into Intrinsic’s platform and Flowstate, alongside other easy-to-combine capabilities like motion planning and sensor-guided skills. This will make developing robotic applications faster, more intuitive, and more effective for roboticists deploying industrial applications.