Hi, we’re Datagen

April 13, 2020, Datagen

Welcome to our new blog and website. We’re excited to use this platform to share how we’re transforming data acquisition for computer vision development teams.

What’s the Problem?

Nowadays, there’s a lot of talk about computer vision and its potential impact on a wide range of fields and applications. The ability of cameras and computers to not just see but understand the world around them can transform fields from IoT to smart cars and smart cities to manufacturing. Imagine smart stores that understand what you’re grabbing from the shelf and calculate your purchases in the background, or VR and AR technologies that respond intuitively to your body without clumsy hand-held controllers. Imagine systems that understand emotions and facial expressions, safety mechanisms that stop a car when someone is crossing the street, and smart security systems that recognize when something’s amiss.


But despite enormous advances in computer vision research and GPU power (two of the three ingredients for an effective computer vision system), transformative technology seems to be arriving slowly and with plenty of serious challenges.

When we started Datagen, we began by asking, “Why? What is holding back progress in this field?” After conversations with engineers and researchers working on human-centered smart computer applications, a theme emerged. Again and again, we heard that a lack of access to data is slowing the pace of development.

Teams are spending too much time, money, and energy on obtaining annotated datasets that, despite the effort required, are still deeply flawed. We call these datasets Manual Data; datapoints are manually captured from the real world and real people and then, for the most part, manually annotated. Here are some of Manual Data’s main problems:

  • Manual Data is slow. You’re reliant on event frequency in the real world or have to collect existing data from disparate sources with complicated privacy or access considerations. It can take a long time and lots of resources to build a sufficiently large and representative dataset, especially if you’re looking for specific edge cases.
  • Manual Data is biased. High-level bias – over- or under-representation of key situations, items, or demographic groups – can limit the effectiveness of systems and is extremely hard to control when you are subject to resource and access limitations. Often you don’t know about these biases until you’ve trained your system.
  • Manual Data is limited. Because annotations have to be added after data collection, usually by hand and at scale, they are often inconsistent and fall short of 100% ground truth. There are certain data layers – for instance, detailed depth maps – that are impossible to add by hand.

Datagen was founded to provide an alternative to the status quo. Let us explain.

What We Do

Datagen creates high-quality, human-focused Simulated Data to fuel smart computer vision learning algorithms and solve your data bottlenecks. Our data is photorealistic, scalable, high-variance, unbiased, and has all kinds of superpowers. Simulated Data is a subset of Synthetic Data, focused on realistically simulating real-world environments and interactions. You can learn more about why we use the term Simulated Data here.

We invite you to explore our site to learn more about the capabilities and solutions offered by our unique technology. In the coming months, we’re excited to share more about our products, detail how we build our technology, explain how we can work together, publish datasets for you to explore, and present the research we’re doing about the efficacy of Simulated Data.

Reach out if you’re interested in learning more or working with us at Datagen. We look forward to hearing from you!