Your algorithms are only as good as your data
Supervised machine learning algorithms are the basis for production-level computer vision solutions. Training these algorithms requires hundreds of thousands, or even millions, of labeled examples.
Currently, these visual datasets must be gathered from production-similar devices in large quantities, annotated through manual processes, and then cleaned, often manually. The journey from defining data needs to obtaining a functional dataset is long, expensive, biased, and vulnerable to uncertainty and error.
Simulated Data is a new type of synthetic data, specifically focused on visual simulations and recreations of the real world. Simulated Data is photorealistic, scalable, and efficient data designed for training and generated with state-of-the-art computer graphics and data generation algorithms. It is highly variable, unbiased, and annotated with perfect consistency and ground truth, shattering the bottlenecks associated with manual gathering and annotation. Our technology combines Latent Space Variation Generation Algorithms (GANs) on 3D Data, Reinforcement Learning Humanoid Motion Algorithms within a Physical Simulator, and Super Rendering Algorithms to generate Simulated Data at scale.
To generate tailored Simulated Data, we begin by understanding the computer vision system that needs to be trained. Together, we define key specifications, such as camera lens parameters, lighting parameters, environmental parameters, demographic distributions, and required annotations/metadata.
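To make this step concrete, here is a minimal sketch of what such a specification might look like in code. All field names and values are illustrative assumptions for the purpose of the example, not an actual API:

```python
from dataclasses import dataclass, field

@dataclass
class CameraSpec:
    # Hypothetical lens/sensor parameters matched to the target device.
    focal_length_mm: float = 4.2
    sensor_width_mm: float = 5.6
    resolution: tuple = (1920, 1080)
    distortion_k1: float = -0.05   # radial distortion coefficient

@dataclass
class DatasetSpec:
    camera: CameraSpec = field(default_factory=CameraSpec)
    lighting: str = "indoor_mixed"   # named lighting preset
    demographics: dict = field(default_factory=lambda: {
        "age_20_40": 0.5, "age_40_60": 0.3, "age_60_plus": 0.2,
    })
    annotations: tuple = ("segmentation", "depth", "keypoints")
    num_frames: int = 100_000
```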
The generation process begins with a range of 3D base models, scanned from the real world or created with 3D computer graphics. With advanced machine learning models, we create a non-linear latent space representation of these 3D base models, with high-resolution meshes, textures, and semantic metadata. Then, using Generative Adversarial Networks (GANs), we expand this latent space and sample from it to create a huge number of unique models, building libraries of millions of photorealistic, high-variance 3D assets.
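As a rough illustration of the sampling step, the toy sketch below maps random latent codes to per-vertex offsets on a template mesh. The architecture and dimensions are placeholder assumptions standing in for the generator half of a full 3D GAN:

```python
import torch
import torch.nn as nn

class AssetGenerator(nn.Module):
    """Toy generator: maps a latent code to per-vertex offsets on a
    fixed template mesh. A stand-in for the trained 3D generator."""
    def __init__(self, latent_dim=128, n_vertices=2048):
        super().__init__()
        self.n_vertices = n_vertices
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, n_vertices * 3),
        )

    def forward(self, z):
        # One row of vertex offsets per latent sample, reshaped to (B, V, 3).
        return self.net(z).view(-1, self.n_vertices, 3)

gen = AssetGenerator()
z = torch.randn(64, 128)    # sample 64 points in the latent space
unique_assets = gen(z)      # 64 distinct mesh variants
print(unique_assets.shape)  # torch.Size([64, 2048, 3])
```

Each draw from the latent space yields a new asset, which is how a modest library of base scans can be expanded into millions of variants.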
To provide simulated environments at scale and with high variance, we use state-of-the-art scene-generation methods. Semantic graphs derived from the spatial relationships between thousands of item classes allow us to generate 3D environments that are spatially coherent, physically valid, and highly varied. Our asset libraries enable the creation of full environments populated with a wide range of objects and people.
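The sketch below illustrates the idea with a hypothetical four-item relation graph; the real graphs span thousands of classes, and the relations and offsets here are invented placeholders:

```python
import random

# Illustrative relation graph: (child, relation, parent).
RELATIONS = [
    ("lamp",    "on_top_of",   "desk"),
    ("monitor", "on_top_of",   "desk"),
    ("chair",   "in_front_of", "desk"),
    ("rug",     "under",       "desk"),
]

def sample_scene(anchor="desk", anchor_pos=(0.0, 0.0, 0.0)):
    """Place each related object with a plausible offset from its parent,
    so every sampled scene is different but spatially coherent."""
    offsets = {
        "on_top_of":   lambda: (random.uniform(-0.4, 0.4), 0.75,
                                random.uniform(-0.2, 0.2)),
        "in_front_of": lambda: (0.0, 0.0, random.uniform(0.6, 1.0)),
        "under":       lambda: (0.0, -0.01, 0.0),
    }
    scene = {anchor: anchor_pos}
    for child, rel, parent in RELATIONS:
        if parent in scene:
            dx, dy, dz = offsets[rel]()
            px, py, pz = scene[parent]
            scene[child] = (px + dx, py + dy, pz + dz)
    return scene

print(sample_scene())  # a new, coherent layout on every call
```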
To simulate the dynamics of the world around us, we’ve developed controllable, physics-based machine learning algorithms. These algorithms combine Robotic Motion Control, Deep Reinforcement Learning, Analytical Geometry, and Physical Simulation to bring our Simulated Data to life. They also enable rich simulation of high-variance motion, with enough control to cover both common cases and edge cases in the data.
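A minimal sketch of such a control loop follows, using a stub simulator and a hand-written PD controller where a learned RL policy would sit in the full pipeline. The dynamics, gains, and joint counts are illustrative assumptions:

```python
import numpy as np

class StubPhysicsSim:
    """Stand-in for the physics simulator: state is joint angles and
    velocities; actions are torques; simple damped integration."""
    def __init__(self, n_joints=12):
        self.q = np.zeros(n_joints)    # joint angles
        self.qd = np.zeros(n_joints)   # joint velocities

    def step(self, torque, dt=0.01):
        self.qd = 0.98 * self.qd + torque * dt   # damped dynamics
        self.q = self.q + self.qd * dt
        return np.concatenate([self.q, self.qd])

def pd_policy(state, target_q, kp=40.0, kd=2.0):
    """PD controller driving joints toward a target pose; a learned
    RL policy would replace this function in practice."""
    n = len(state) // 2
    q, qd = state[:n], state[n:]
    return kp * (target_q - q) - kd * qd

sim = StubPhysicsSim()
target = np.full(12, 0.3)   # e.g. a target pose for the motion clip
state = np.concatenate([sim.q, sim.qd])
for _ in range(200):        # roll out a short motion clip
    state = sim.step(pd_policy(state, target))
```

Because the motion comes from a controller acting inside a simulator, the same machinery can be steered toward rare edge-case motions as easily as common ones.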
An advanced rendering process based on Domain-Specific Neural Rendering and Super Resolution is employed to generate data at scale. This data is created with perfect ground truth and a consistency that is impossible to achieve with human-based manual annotation.
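As a simplified illustration of the super-resolution step, the sketch below upsamples a cheaply rendered frame with bicubic interpolation plus a small learned residual. The architecture is a toy assumption, not the production renderer:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySuperRes(nn.Module):
    """Minimal super-resolution sketch: bicubic upsampling plus a
    learned residual correction."""
    def __init__(self, scale=2):
        super().__init__()
        self.scale = scale
        self.refine = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x):
        up = F.interpolate(x, scale_factor=self.scale, mode="bicubic",
                           align_corners=False)
        return up + self.refine(up)   # learned correction on top of bicubic

low_res = torch.rand(1, 3, 270, 480)   # a cheap, fast render
high_res = ToySuperRes()(low_res)      # upscaled output frame
print(high_res.shape)                  # torch.Size([1, 3, 540, 960])
```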
All of the specifications are integrated, including customized lighting conditions and camera parameters. This allows us not only to replicate environmental conditions, but also to simulate hardware, camera lenses, or situations that do not yet exist, giving teams a way to train their algorithms before their hardware is ready.
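For example, lens behavior can be simulated by distorting ideal image coordinates. The sketch below applies the standard Brown radial distortion model with illustrative coefficients:

```python
import numpy as np

def apply_radial_distortion(points, k1=-0.05, k2=0.0):
    """Apply the Brown radial distortion model to normalized image
    coordinates, simulating a specific (possibly not-yet-built) lens."""
    r2 = np.sum(points**2, axis=1, keepdims=True)
    factor = 1.0 + k1 * r2 + k2 * r2**2
    return points * factor

# Normalized image-plane coordinates of a grid of scene points.
grid = np.stack(np.meshgrid(np.linspace(-1, 1, 5),
                            np.linspace(-1, 1, 5)),
                axis=-1).reshape(-1, 2)
distorted = apply_radial_distortion(grid, k1=-0.1)  # barrel distortion
```

Because the ground-truth annotations pass through the same transform as the pixels, labels stay perfectly aligned with the simulated lens.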
We have developed innovative Domain Adaptation techniques to fine-tune your data to the real world. By creating highly accurate, photorealistic synthetic data and minimizing the domain gap, we’re able to transfer between the simulated and real domains with only a few unlabeled samples of real-world image data. This allows us to iteratively boost performance with minimal additional input and effort.
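One common way to implement such alignment is gradient-reversal (DANN-style) adversarial training, sketched below with toy network shapes and random placeholder features; this is a generic illustration of the technique, not a claim about the exact production method:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward
    pass, so the extractor learns domain-invariant features."""
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad):
        return -grad

feat = nn.Sequential(nn.Linear(512, 256), nn.ReLU())  # shared extractor
disc = nn.Sequential(nn.Linear(256, 1))               # domain discriminator
bce = nn.BCEWithLogitsLoss()
opt = torch.optim.Adam([*feat.parameters(), *disc.parameters()], lr=1e-4)

# Random placeholders for backbone features: many labeled simulated
# images, only a handful of unlabeled real-world images.
sim_feats = torch.randn(32, 512)
real_feats = torch.randn(8, 512)

for _ in range(100):
    f = feat(torch.cat([sim_feats, real_feats]))
    logits = disc(GradReverse.apply(f))  # reversed grad closes the gap
    labels = torch.cat([torch.ones(32, 1), torch.zeros(8, 1)])
    loss = bce(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
```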