What Is Simulated Data?

Your algorithms are only as good as your data

Supervised machine learning algorithms are the basis of production-level computer vision solutions. Training these algorithms requires hundreds of thousands, or even millions, of labeled examples.

Data bottlenecks slow down development

Currently, these visual datasets must be gathered in large quantities from production-similar devices, annotated through manual processes, and then cleaned, often by hand. The journey from defining data needs to obtaining a functional dataset is long, expensive, prone to bias, and vulnerable to uncertainty and error.

Introducing Simulated Data

Simulated Data is a new type of synthetic data, focused specifically on visual simulations and recreations of the real world. It is photorealistic, scalable, and efficient training data, generated with state-of-the-art computer graphics and data generation algorithms. It is highly varied, unbiased, and annotated with perfect consistency and ground truth, shattering the bottlenecks associated with manual gathering and annotation. Our technology combines Latent Space Variation Generation algorithms (GANs) applied to 3D data, Reinforcement Learning humanoid motion algorithms running within a physical simulator, and Super Rendering algorithms to generate Simulated Data at scale.

How Our Technology Works

Gather Requirements

To generate tailored Simulated Data, we begin by understanding the computer vision system that needs to be trained. Together, we define key specifications, such as camera and lens parameters, lighting and environmental conditions, demographic distributions, and required annotations/metadata.
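For illustration, such a specification can be captured as a structured config. The sketch below is a hypothetical Python schema; every field name and value is an illustrative assumption, not our actual interface:

```python
# A minimal sketch of how a data-generation spec might be captured.
# All names and fields here are hypothetical, not an actual schema.
from dataclasses import dataclass, field

@dataclass
class CameraSpec:
    focal_length_mm: float = 4.2      # lens focal length
    sensor_width_mm: float = 5.6      # physical sensor width
    resolution: tuple = (1920, 1080)  # output image size in pixels
    distortion_k1: float = -0.05      # radial lens distortion coefficient

@dataclass
class DatasetSpec:
    camera: CameraSpec = field(default_factory=CameraSpec)
    lighting: list = field(default_factory=lambda: ["daylight", "indoor_tungsten"])
    environments: list = field(default_factory=lambda: ["office", "living_room"])
    demographics: dict = field(default_factory=lambda: {"age_18_30": 0.4, "age_31_60": 0.6})
    annotations: list = field(default_factory=lambda: ["segmentation", "keypoints", "depth"])
    num_images: int = 500_000

spec = DatasetSpec()
print(spec.camera.focal_length_mm, spec.annotations)
```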

3D Asset Creation and Variation

The generation process begins with a range of 3D base models, scanned from the real world or created with 3D computer graphics. With advanced machine learning models, we create a non-linear latent space representation of these 3D base models, with high-resolution meshes, textures, and semantic metadata. Then, using Generative Adversarial Networks (GANs), we expand this latent space and sample from it to create a huge number of unique models, building libraries of millions of photorealistic, high-variance 3D assets.
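A minimal sketch of the sampling step, assuming a trained generator network that decodes latent vectors into mesh parameters (the network, its dimensions, and the mesh encoding are all placeholder assumptions):

```python
# Sketch: sampling a latent space to produce many unique 3D asset variants.
# The generator network and its dimensions are hypothetical placeholders.
import torch
import torch.nn as nn

LATENT_DIM = 128
MESH_PARAMS = 3 * 10_000  # e.g. xyz offsets for a 10k-vertex template mesh

generator = nn.Sequential(       # stand-in for a trained GAN generator
    nn.Linear(LATENT_DIM, 512),
    nn.ReLU(),
    nn.Linear(512, MESH_PARAMS),
)

def sample_assets(n: int) -> torch.Tensor:
    """Draw n latent vectors and decode them into mesh parameter sets."""
    z = torch.randn(n, LATENT_DIM)   # sample the latent space
    with torch.no_grad():
        return generator(z)          # one unique asset per row

variants = sample_assets(1_000)
print(variants.shape)  # (1000, 30000): a thousand distinct mesh variants
```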

Environment Architecture

To provide simulated environments at scale and with high variance, we use state-of-the-art methods. Semantic graphs derived from the spatial relationships between thousands of item classes allow us to generate 3D environments that are spatially coherent, physically valid, and highly varied. Our asset libraries enable the creation of full environments populated with a wide range of objects and people.
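The core idea can be sketched as recursive placement driven by a graph of pairwise spatial relations. The toy graph, relation names, and object classes below are illustrative assumptions, not our production pipeline:

```python
# Toy sketch: generating a spatially coherent scene from a semantic graph.
# Relations and object classes are illustrative placeholders.
import random

# Each edge says: child object is placed relative to its parent with an offset.
SEMANTIC_GRAPH = {
    "floor":  [("table", "on", (0.0, 0.0))],
    "table":  [("laptop", "on_top", (0.0, 0.75)), ("chair", "next_to", (0.7, 0.0))],
    "laptop": [("mug", "beside", (0.3, 0.0))],
}

def place(node="floor", origin=(0.0, 0.0), scene=None):
    """Recursively place objects so the graph's relations are respected."""
    scene = scene if scene is not None else []
    for child, relation, (dx, dz) in SEMANTIC_GRAPH.get(node, []):
        jitter = random.uniform(-0.1, 0.1)           # variance between scenes
        pos = (origin[0] + dx + jitter, origin[1] + dz)
        scene.append((child, relation, node, pos))
        place(child, pos, scene)
    return scene

for obj, rel, parent, pos in place():
    print(f"{obj} {rel} {parent} at {pos}")
```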

Physics-based Motion & Behavior Synthesis

To simulate the dynamics of the world around us, we've developed controllable, physics-based machine learning algorithms. These combine Robotic Motion Control, Deep Reinforcement Learning, Analytical Geometry, and Physical Simulation to bring our Simulated Data to life. They also enable rich simulation of high-variance motion, with enough control to cover both common cases and edge cases in the data.
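A toy sketch of the control loop, with a single damped joint standing in for a full humanoid and an untrained network standing in for a learned RL policy (both are assumptions for illustration only):

```python
# Sketch: a learned policy driving a physics simulation step by step.
# The one-joint "simulator" and random-weight policy are toy stand-ins
# for a humanoid controller trained with deep reinforcement learning.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))

def physics_step(angle, velocity, torque, dt=0.01):
    """Euler-integrate a single damped joint: a minimal physical simulator."""
    accel = torque - 0.5 * velocity        # applied torque minus damping
    velocity = velocity + accel * dt
    angle = angle + velocity * dt
    return angle, velocity

angle, velocity = 0.0, 0.0
trajectory = []
for _ in range(200):
    state = torch.tensor([angle, velocity], dtype=torch.float32)
    torque = policy(state).item()          # policy chooses the next action
    angle, velocity = physics_step(angle, velocity, torque)
    trajectory.append(angle)               # recorded motion drives the character

print(f"simulated {len(trajectory)} frames; final joint angle {angle:.3f}")
```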

Customized Data Rendering with Perfect Annotations

An advanced rendering process based on Domain Specific Neural Rendering and Super Resolution is employed to generate data at scale. This data is created with a level of ground-truth accuracy and consistency that is impossible to achieve with manual human annotation.

All of the specifications are integrated, including customized lighting conditions and camera parameters. This allows us not only to replicate environmental conditions, but also to simulate hardware, camera lenses, or situations that do not yet exist, giving teams a way to train their algorithms before their hardware is ready.
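A toy sketch of why simulated annotations are perfectly consistent: the same code path that draws each object also writes its labels, so image and ground truth cannot disagree. The rasterizer, class IDs, and box layout below are illustrative assumptions:

```python
# Sketch: rendering an RGB frame and its pixel-perfect annotations in one pass.
# The "renderer" is a toy rasterizer; class IDs and layout are illustrative.
import numpy as np

H, W = 240, 320
rgb = np.zeros((H, W, 3), dtype=np.uint8)
seg = np.zeros((H, W), dtype=np.uint8)      # ground-truth class per pixel

objects = [  # (y0, y1, x0, x1), color, class id for two placeholder objects
    ((40, 60, 120, 200), (200, 30, 30), 1),   # "person"
    ((140, 210, 80, 260), (30, 30, 200), 2),  # "table"
]

for (y0, y1, x0, x1), color, cls in objects:
    rgb[y0:y1, x0:x1] = color   # what the camera sees
    seg[y0:y1, x0:x1] = cls     # label written by the same code path,
                                # so image and annotation can never disagree

# Bounding boxes fall out of the mask exactly, with no human annotation step.
for cls in (1, 2):
    ys, xs = np.nonzero(seg == cls)
    print(f"class {cls}: box=({xs.min()},{ys.min()},{xs.max()},{ys.max()})")
```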

Fine Tuning

Innovative Domain Adaptation techniques have been developed to fine-tune your data to the real world. By creating highly accurate, photorealistic synthetic data and minimizing the domain gap, we're able to transfer between the simulated and real domains with only a few unlabeled samples of real-world image data. This allows us to iteratively boost performance with minimal additional input and effort.
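As a rough illustration of the sim-to-real idea, the sketch below aligns feature statistics between a large simulated batch and a handful of unlabeled real images. This simple moment-matching toy is one well-known member of the domain-adaptation family, not the specific techniques described above:

```python
# Sketch: narrowing the sim-to-real gap with a few unlabeled real images by
# aligning feature statistics (a moment-matching illustration; the encoder,
# data, and loss are all toy assumptions).
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 128))  # toy backbone
opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)

sim_batch = torch.rand(64, 3, 32, 32)   # abundant simulated images
real_batch = torch.rand(8, 3, 32, 32)   # only a few unlabeled real images

for step in range(100):
    f_sim = encoder(sim_batch)
    f_real = encoder(real_batch)
    # Penalize the distance between mean features of the two domains,
    # pulling simulated and real representations together.
    gap = (f_sim.mean(dim=0) - f_real.mean(dim=0)).pow(2).sum()
    opt.zero_grad()
    gap.backward()
    opt.step()

print(f"final feature gap: {gap.item():.4f}")
```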
