At Datagen, we’re actively investing in academic collaborations and benchmarking research that can further establish the effectiveness of training computer vision algorithms with simulated or synthetic data. Fortunately, a large body of work already exists that explores the advantages and challenges of this approach. We’ve collected some of these papers here for anyone interested in learning more about this rapidly expanding field of computer vision research.
From researchers at Apple, this was the company’s first published research paper, and it received a Best Paper Award at CVPR 2017. The research shows how labeled synthetic data, domain-adapted with unlabeled real data, can improve eye gaze estimation. One year later, the iPhone X came out with eye gaze estimation embedded. It is highly likely that Apple used some of the synthetic data techniques detailed in the paper to develop this feature.
DeepMind and Oxford University researchers detail how they achieved state-of-the-art results on 3D human mesh estimation by training exclusively with synthetic humans. Their method uses pose estimation and optical flow estimation to help bridge the domain gap between real and simulated data.
NVIDIA shows that photorealistic synthetic data can lead to strong results in object detection, 3D pose estimation, and tracking of objects in real-world scenes. This powerful approach was demonstrated on a small set of items. Datagen is attacking the next challenge in this area: scaling the approach to thousands of objects while integrating humans and dynamic motion.
Here, NVIDIA presents a purely simulation-based approach to generating realistic grasping positions. The grasp distribution is diverse enough to find successful grasps even after discarding those that violate robot kinematics and collision constraints. The method’s diverse grasp sampling leads to higher success rates than a state-of-the-art 6-DOF grasp planner. More here about NVIDIA’s research efforts in this field.
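To make the "sample widely, then discard infeasible grasps" idea concrete, here is a minimal Python sketch. It is not the paper's method: the `Grasp` fields, the reach limit, and the table-clearance check are hypothetical stand-ins for real kinematics and collision checks, chosen only to illustrate the filtering step.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Grasp:
    """A candidate grasp position relative to the robot base (hypothetical)."""
    x: float
    y: float
    z: float


def sample_grasps(n: int, rng: random.Random) -> list[Grasp]:
    """Draw a deliberately diverse set of candidate grasps."""
    return [
        Grasp(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), rng.uniform(-0.1, 1.0))
        for _ in range(n)
    ]


def is_reachable(g: Grasp, max_reach_m: float = 0.8) -> bool:
    """Stand-in kinematics check: the grasp must lie within the arm's reach."""
    return math.sqrt(g.x ** 2 + g.y ** 2 + g.z ** 2) <= max_reach_m


def is_collision_free(g: Grasp, table_height_m: float = 0.0,
                      clearance_m: float = 0.02) -> bool:
    """Stand-in collision check: the grasp must sit above the table surface."""
    return g.z >= table_height_m + clearance_m


def feasible_grasps(candidates: list[Grasp]) -> list[Grasp]:
    """Keep only candidates that pass both checks."""
    return [g for g in candidates if is_reachable(g) and is_collision_free(g)]


rng = random.Random(0)  # fixed seed so the run is repeatable
candidates = sample_grasps(500, rng)
feasible = feasible_grasps(candidates)
print(f"{len(feasible)} of {len(candidates)} candidates remain after filtering")
```

The point of the sketch is the one made above: if the initial sample is diverse enough, plenty of usable grasps remain even after the constraint checks reject many of them.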
A Technion research group, based in Israel, lays out how they achieved impressive results in reconstructing 3D face meshes from single face images by using synthetic data.
NVIDIA is clearly a research leader in this field. Here, a team of researchers trains an object detection network with synthetic data and shows that synthetic data does not necessarily need to be photorealistic to work effectively when it is generated with a technique called Domain Randomization. Domain Randomization trains the network on images from many kinds of simulated environments at the same time. While this approach requires a large amount of data (in this case, 100,000 data points), it can be valuable when the domain is hard to mimic. We recommend this paper for its insights into the potential of the Domain Randomization methodology.
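As a rough illustration of Domain Randomization, the Python sketch below samples a different random scene configuration for every training image. The parameter names, value ranges, and texture list are assumptions made for this example and are not taken from the paper. A real pipeline would pass each configuration to a renderer to produce the image and its labels.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneParams:
    """Randomized settings for one synthetic image (all fields hypothetical)."""
    light_intensity: float
    camera_distance_m: float
    camera_yaw_deg: float
    object_texture: str
    num_distractors: int


# Illustrative, non-photorealistic texture choices
TEXTURES = ["checker", "noise", "stripes", "flat_color"]


def sample_scene(rng: random.Random) -> SceneParams:
    """Sample every scene property independently from a wide range."""
    return SceneParams(
        light_intensity=rng.uniform(0.2, 2.0),
        camera_distance_m=rng.uniform(0.5, 3.0),
        camera_yaw_deg=rng.uniform(0.0, 360.0),
        object_texture=rng.choice(TEXTURES),
        num_distractors=rng.randint(0, 10),
    )


def generate_dataset(n: int, seed: int = 0) -> list[SceneParams]:
    """Build n scene configurations; a fixed seed makes the set reproducible."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]


scenes = generate_dataset(1000)
print(scenes[0])
```

Because no two images share the same lighting, viewpoint, or textures, the network cannot rely on any one simulated look and must learn features of the object itself.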
In this paper, a Stanford University research team achieved impressive results in hand pose estimation by training a network on synthetic hand data. The synthetic hands are infused with textures from real-world hands using a style-transfer algorithm built on a GAN-based architecture. The implications are particularly interesting for the VR/AR industry, where hand pose estimation may be able to eliminate the need for handheld controllers. For more on state-of-the-art hand pose estimation data, please check out Datagen’s hands generation capabilities.
University of Cambridge researchers, working with a corporate team, teach a car to drive in a cartoon-like simulator. The novel idea was to teach the car to translate real-world data into its simulation-based understanding (real2sim) instead of attempting the reverse (sim2real).
We will continue to update and add to this post as new research becomes available. If there are relevant publications that you’d like to share with us, send them our way.